space and games

August 31, 2009

Summer Research, Singularity Summit

Filed under: Decision Theory, General — Peter de Blanc @ 3:43 pm

This summer, I took part in a research program at the Singularity Institute.

While I was there, I wrote a follow-up to my old Expected Utility paper. The new paper says basically the same thing as the old paper, but for repeated decisions rather than one-off decisions.

Roko Mijic and I have also started a paper about the problem of generalizing utility functions to new models – the sort of problem I call an “ontological crisis.” These situations arise for humans when we discover that the goals and values which we ascribe to ourselves do not correspond to objects in reality. Obvious examples include god, souls, and free will, but we’re just as interested in how AIs can deal with more mundane problems such as generalizing the notion of “temperature” from a classical to a quantum model. Unfortunately, we didn’t have time to finish the paper this summer, but you can expect to see it soon.

Towards the end of the summer, I made a few resolutions for the new year (as a grad student, my year starts in late August). In particular, I’ve resolved to write a popular blog. In the short term this will mean reducing the quality of my posts in exchange for much greater quantity, but in the long term I expect quality to rise again as I gain more experience writing. I’ll probably write about some of my less serious projects, such as the computer game I’ve been developing on and off for the past year.

In other news, the Singularity Summit will be in New York this year, on October 3-4. Anyone who wants to chat me up can do so at the summit, if you can find me among the horde of attendees. See you there!

March 4, 2009

A Stronger Type of Equilibrium

Filed under: Decision Theory, Voting — Peter de Blanc @ 5:57 pm

In game theory, a Nash equilibrium is an assignment of strategies to players, such that no player can benefit from changing vis own strategy (assuming the other players’ strategies remain the same). For example, in the one-shot Prisoner’s Dilemma, mutual defection is the only Nash equilibrium. There are several other ideas for how to define an “equilibrium” of a game, many of which are “refinements” of Nash Equilibrium, in the sense that the set of equilibria is smaller.

Here’s another one: a “cabal equilibrium” is an assignment of strategies to players, such that no subset (or “cabal”) K of players can simultaneously change strategies in such a way that some players in K benefit and no players in K are worse off. In other words, each cabal of players is playing a strategy which is Pareto optimal for the players within that cabal. Note that any subset of players is considered a cabal, so if there are N players, then there are 2^N cabals.

Any cabal equilibrium is also a Nash equilibrium. A single player is just a cabal of size one, so if no cabal of players wishes to change strategies, then no single player wishes to change strategy. So this definition of equilibrium is “stronger” than the Nash equilibrium, in the sense that it is a more stringent test.

For example, the Prisoner’s Dilemma has no cabal equilibrium. First observe that mutual defection is the only Nash equilibrium, so there is at most one cabal equilibrium. But mutual defection is not a cabal equilibrium, because both players would prefer to simultaneously switch their strategies to cooperation.

Stag hunt has two Nash equilibria but only one cabal equilibrium. Battle of the Sexes has two Nash equilibria, both of which are cabal equilibria.
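
For two-player games with a handful of pure strategies, both definitions can be checked by brute force. A minimal sketch (the payoff numbers are my own illustrative choices, and only pure strategies are considered):

```python
from itertools import product

def pure_nash(payoffs, n_strategies):
    """Pure-strategy Nash equilibria of a two-player game.
    payoffs[(i, j)] = (row player's payoff, column player's payoff)."""
    eqs = []
    for i, j in product(range(n_strategies), repeat=2):
        r, c = payoffs[(i, j)]
        if all(payoffs[(i2, j)][0] <= r for i2 in range(n_strategies)) and \
           all(payoffs[(i, j2)][1] <= c for j2 in range(n_strategies)):
            eqs.append((i, j))
    return eqs

def cabal_equilibria(payoffs, n_strategies):
    """Profiles where no cabal can change strategies so that some member
    benefits and no member is worse off. For two players the nonempty
    cabals are {row}, {column}, and {row, column}."""
    eqs = []
    for i, j in product(range(n_strategies), repeat=2):
        blocked = False
        for in_row, in_col in [(True, False), (False, True), (True, True)]:
            for i2, j2 in product(range(n_strategies), repeat=2):
                # Players outside the cabal keep their strategies fixed.
                alt = (i2 if in_row else i, j2 if in_col else j)
                old, new = payoffs[(i, j)], payoffs[alt]
                members = [k for k in (0, 1) if (in_row, in_col)[k]]
                if all(new[k] >= old[k] for k in members) and \
                   any(new[k] > old[k] for k in members):
                    blocked = True
        if not blocked:
            eqs.append((i, j))
    return eqs

# Prisoner's Dilemma (0 = cooperate, 1 = defect) and Stag Hunt (0 = stag, 1 = hare).
pd = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}
stag = {(0, 0): (4, 4), (0, 1): (0, 3), (1, 0): (3, 0), (1, 1): (3, 3)}

print(pure_nash(pd, 2), cabal_equilibria(pd, 2))      # [(1, 1)] []
print(pure_nash(stag, 2), cabal_equilibria(stag, 2))  # [(0, 0), (1, 1)] [(0, 0)]
```

As expected, mutual defection is a Nash equilibrium of the Prisoner’s Dilemma but not a cabal equilibrium, and only the stag-stag profile survives in Stag Hunt.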

Mike Dobbins and I talked about cabal equilibria in voting. Because the outcome of an election depends on the voters in such a coarse way, the Nash equilibrium does not seem like a strong enough definition to apply to voting situations. For example, any election in which the winner beats the first loser by more than one vote is a Nash equilibrium, because no single voter can change the outcome by changing vis strategy. So not only are there multiple Nash equilibria, but in fact almost any election is a Nash equilibrium.

So let’s consider cabal equilibria in a plurality or approval vote. Suppose there is an equilibrium, and this equilibrium elects candidate A.

Suppose that more than half of the voters prefer B over A. Then they could all vote for B (and disapprove of A in an approval vote), resulting in a better outcome from their point of view. Then the original election was not a cabal equilibrium: contradiction! So there is no candidate that is preferred over A by a majority.

So if we ignore the possibility that exactly half the voters prefer some other candidate over A, this would mean that A is a Condorcet winner!

Now let’s consider the converse. Suppose that there is no cabal equilibrium. Then, in particular, it is not a cabal equilibrium for everyone to vote for the same candidate A. Then there exists some cabal K of voters which could make a Pareto improvement by changing votes. In order for the new strategy to be a Pareto improvement, it must affect the outcome of the election somehow; thus it must cause some other candidate B to be elected instead of A. In order to change the outcome of the election, more than half of the votes must have changed (since originally everyone was voting for A), so K must contain more than half of the voters. Since no member of K prefers A over B, we know A is not preferred over B by a majority, and thus is not a Condorcet winner.

So if there is no cabal equilibrium, then A is not a Condorcet winner. Since the same argument works for any other candidate (not just candidate A), this means that no cabal equilibrium implies no Condorcet winner. Taking the contrapositive, this means that in an election with a Condorcet winner, there is a cabal equilibrium!

So we’ve just proven a theorem: a plurality or approval vote with an odd number of voters (oddness rules out the exact-tie case we ignored above) has a cabal equilibrium iff there is a Condorcet winner, and a cabal equilibrium always elects the Condorcet winner.
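
The Condorcet condition in this theorem is easy to check mechanically. A small sketch (the candidate names and preference profile are made up for illustration):

```python
def condorcet_winner(ballots):
    """Find the Condorcet winner given ranked ballots (each ballot lists
    candidates from most to least preferred), or None if there isn't one."""
    candidates = ballots[0]

    def majority_prefers(a, b):
        # Voters ranking a above b form a strict majority.
        return sum(1 for bal in ballots if bal.index(a) < bal.index(b)) > len(ballots) / 2

    for c in candidates:
        if all(majority_prefers(c, d) for d in candidates if d != c):
            return c
    return None

# B beats both A and C pairwise, so B is the Condorcet winner.
ballots = [["A", "B", "C"], ["B", "C", "A"], ["C", "B", "A"]]
print(condorcet_winner(ballots))  # B

# A Condorcet cycle has no winner -- and hence, by the theorem,
# no cabal equilibrium.
cycle = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
print(condorcet_winner(cycle))  # None
```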

March 24, 2008

Ordinal Performance Games

Filed under: Decision Theory — Peter de Blanc @ 9:56 pm

There are games (such as racing or DDR) in which players compete with one or more opponents, but the opponents do not interact. Each player performs individually, and receives some score for vis performance, and the player with the best score wins. I’ll call these “ordinal performance games.”

One might naively think that the best strategy is to ignore one’s opponents, and think only of maximizing one’s own score. After all, the players never interact, so your score depends only on your performance, right? That’s true, but the object is not to maximize your own score; it is to have a higher score than your opponents, and this depends on their scores as well.

For example, suppose we have a game with three player types, and possible scores of 1 through 5.

  • Alf is consistently mediocre, always receiving a score of 3.
  • Beth has adopted a risky strategy; she receives a score of 4 with probability 0.6 and a score of 1 with probability 0.4.
  • Gam may succeed spectacularly, but usually plays poorly. He receives a score of 5 with probability 0.4 and a score of 2 with probability 0.6.

Suppose Alf and Beth play. Alf will always score 3. Beth has a probability of 0.6 of scoring 4, and thus winning, and a probability of 0.4 of scoring a 1, and thus losing. So p(Alf < Beth) = 0.6.

Suppose Beth and Gam play. Beth has a probability of 0.4 of scoring 1, in which case Gam is guaranteed to win. The other way for Gam to win is for Beth to score a 4 while Gam scores a 5; the probability of this is 0.6 * 0.4 = 0.24. Thus p(Beth < Gam) = 0.4 + 0.24 = 0.64.

Suppose Gam and Alf play. Gam can only win by scoring a 5, with probability 0.4. So p(Gam < Alf) = 0.6.

So Alf usually loses to Beth, who usually loses to Gam, who usually loses to Alf. This shows that your optimal strategy in an ordinal performance game can depend on your opponents.
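
The win probabilities above can be computed directly from the score distributions. A quick sketch:

```python
def p_first_loses(dist_a, dist_b):
    """P(A's score < B's score) for independent score distributions,
    each given as a {score: probability} dict."""
    return sum(pa * pb
               for a, pa in dist_a.items()
               for b, pb in dist_b.items()
               if a < b)

alf = {3: 1.0}
beth = {4: 0.6, 1: 0.4}
gam = {5: 0.4, 2: 0.6}

print(p_first_loses(alf, beth))  # ≈ 0.6
print(p_first_loses(beth, gam))  # ≈ 0.64
print(p_first_loses(gam, alf))   # ≈ 0.6
```

Each player beats the next in the cycle, confirming that no single best strategy exists independent of the opponent.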

January 9, 2008

Expected Utilities Paper

Filed under: Decision Theory — Peter de Blanc @ 8:58 am

I still haven’t published it, but it’s now on the arXiv.

October 20, 2007

Convergence of Expected Utilities

Filed under: Decision Theory — Peter de Blanc @ 8:44 am

I’m working on a paper on this topic, but I figured I would post about it now and clear up some confusion.

Let’s say we have an agent interacting with an environment. The agent sends the environment some number (an action), and the environment sends the agent some number in response (a perception).

When we have an infinite set of hypotheses, the expected utility of some action is determined by an infinite series. It’s important for that series to converge, or else we won’t (in general) be able to compare the expected utilities of two different actions.

Given the following assumptions:

  • We have some finite set of data about the environment.
  • We assign a nonzero probability to all computable hypotheses consistent with our data.
  • Either this probability distribution is computable, or it is bounded below by some positive computable function.
  • Our utility function is determined by our perceptions, and is unbounded.
  • Either this utility function is computable, or it is bounded below in absolute value by some unbounded computable function.

…then it follows that our expected utilities do not converge.
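
A toy illustration of the flavor of this result (not the construction in the paper): if prior probabilities shrink geometrically but utilities grow faster, the partial sums of the expected-utility series grow without bound.

```python
def partial_expected_utility(n_terms):
    # Hypothesis n gets prior p_n = 2**-n (these sum to 1 over all n >= 1),
    # but utility u_n = 3**n, so each term p_n * u_n = 1.5**n grows.
    return sum(2.0 ** -n * 3 ** n for n in range(1, n_terms + 1))

for n in (5, 10, 20):
    print(n, partial_expected_utility(n))  # partial sums keep growing
```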

This seems like a pretty strong reason to stick with bounded utility functions. It certainly seems much more plausible to me now than it did before that my utility function is bounded (to the extent that I can be said to have a utility function).

But we shouldn’t jump to conclusions. Here are some reasons why the above might not apply:

  • You might have an uncomputable utility function.
  • You might have a utility function which is not determined solely by your perceptions.
  • You might assign a probability of zero to some hypotheses which are consistent with your data.

August 8, 2007

Board Evaluators and Move Suggesters

Filed under: Decision Theory — Peter de Blanc @ 9:10 pm

In Decision Theory, agents make decisions which maximize the expected value of a utility function. A utility function for a Chess or Go player might be: “1 if I win, -1 if I lose, 0 if I draw.”

Because we have to work with limited computational resources, we can’t search the entire game tree when we play games like Chess and Go. In this case, we would like a way to approximate the expected utility of a node in the game tree, using local properties of that node. I’m calling such an approximation a “board evaluator.” Board evaluators have been used with some success in Computer Chess.

A move suggester is something that looks at a situation and returns a list of moves, possibly with numerical values attached; higher values are preferred. We can sum the values returned by several different move suggesters, and select the move with the highest value. When a mosquito lands on your arm and you twitch, you may be using something like a move suggester.

More formally, we can define a move suggester as something which takes two adjacent nodes in the game tree and estimates the change in expected utility, compared to a board evaluator, which estimates the expected utility of a single node. So a board evaluator takes a node as its argument, while a move suggester takes an edge in the game tree.

Board evaluators have a major advantage over move suggesters: they allow us to use search algorithms such as minimax.

So it would be nice to have a way to build a board evaluator out of move suggesters. I know of three ways to do this:

  • The naive method, as I’ll call it, considers the sequence of moves used to reach the position. We evaluate each move, and sum these values to estimate the value of the current position. Because we are adding up many terms, we will usually accumulate a significant error.
  • Playout Analysis uses move suggesters to continue play from a given position until the game is finished. Then we determine who won the game. We can perform many randomized iterations of this analysis, and take some sort of average; this average is then used as the value of the original position.
  • Tewari, developed by Honinbo Dosaku in the 17th century, works by permuting the sequence of moves used to reach a board position. The naive method is applied to each such permutation, and we average these values.
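
As a toy example of Playout Analysis (the game and the suggester are my own illustrative choices, not from the post): in one-pile Nim, a move suggester that grabs a winning move whenever one exists can be turned into a board evaluator by playing the game out from a given position.

```python
import random

def suggest(pile):
    """Move suggester for one-pile Nim: take 1-3 stones, and taking the
    last stone wins. Prefer the winning move (pile % 4) when it exists."""
    if pile % 4 in (1, 2, 3):
        return pile % 4
    return random.randint(1, 3)  # lost position: any move is as good

def playout_value(pile, n_playouts=200):
    """Board evaluator built from the suggester: finish the game many
    times and average +1 (root player wins) / -1 (root player loses)."""
    total = 0
    for _ in range(n_playouts):
        p, mover = pile, 0  # mover 0 is the player to move at the root
        while p > 0:
            p -= suggest(p)
            if p == 0:
                total += 1 if mover == 0 else -1
            mover = 1 - mover
    return total / n_playouts

print(playout_value(3))  # 1.0: the mover just takes all three stones
print(playout_value(4))  # -1.0: whatever the mover takes, the opponent wins
```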

I’d be interested in examples of these methods (particularly Tewari) being used to analyze real-world decision problems.

September 12, 2006

Hay Voting

Filed under: Decision Theory, Voting — Peter de Blanc @ 2:10 am

Marcello Herreshoff and I talked a bit about voting theory during the Summer of AI, and we came up with our own voting method. In compliance with Stigler’s Law, I’ve decided to call it Hay Voting (after Nick Hay). Hay Voting is, I fear, somewhat impractical, but it has some interesting properties.

Hay Voting is a stochastic voting method; it does not always elect the same candidate given the same votes. Hay Voting is based on the idea of expected utility. It assumes that each voter possesses a utility function which assigns a numeric value to each candidate, and that the voter will vote in such a way as to maximize their expected value for this number. Our intention was to design a voting method such that, when voters follow their optimal strategy, we can deduce their utility functions. You might think of this as a voting method which encourages voters to be “honest.”

A Voting Method that Rewards Honesty

Before I get to Hay Voting, I’m going to talk a bit more about honesty and stochastic methods. Suppose we only wish to know the identity of each voter’s favorite candidate. Here we will give each voter a single vote, which they can cast in favor of any candidate. If we simply count the number of votes for each candidate, and elect the candidate with the most votes (this is called Plurality Voting, and is used by many countries), then we will fail to achieve our goal of discovering which candidate each voter most favors.

That’s because plurality voting introduces voting strategies. Let’s suppose that there is an election between three candidates: A, B, and C. A is really your favorite, but you know that A has no real chance of winning, and the race between B and C is very close. Of those, you prefer B, so voting for B would be a good strategy. So that’s what you do.

Suppose instead that I selected one vote at random. I put all the votes in a hat, pull one out, and elect whichever candidate it lists. If I select a vote other than yours, then your vote had no effect, and if I select your vote, then I elect that candidate. So your optimal strategy is to vote for your favorite candidate – A. So this voting method encourages you to be honest. Of course, all I found out about you was who your favorite candidate was. I don’t know anything else about your preferences.

A Ranked Voting Method that Rewards Honesty

Most voting methods that I know of (e.g. Condorcet Voting, the Borda Count, Instant Runoff Voting (which is awful)) ask each voter to rank each candidate from most to least favorite. All of the methods listed above encourage some strategy; that is, there exist situations in which a voter can achieve a better result by voting something other than vis true preferences. Such situations are supposed to be rare in Condorcet voting methods, but they exist.

It is easy to construct a ranked voting method which encourages voters to be honest – if we assume that the voters are ignorant of the value of a certain variable. Let’s call it X, and let’s say that there are N candidates. We need a probability distribution for X such that p(X=1) > p(X=2) > … > p(X=N).

Now we select a ballot from our hat, and then elect the candidate listed in place X on that ballot. Since that’s most likely to be the first candidate listed, and less likely to be each successive candidate after that, your best bet is to rank the candidates by preference.
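
A sketch of this rule, using p(X=k) proportional to 2^-k (my own choice of a distribution satisfying the decreasing condition):

```python
import random

def win_probabilities(ballots):
    """Exact election probabilities: draw a uniformly random ballot, then
    pick the candidate in place X, where p(X=k) is proportional to 2**-k."""
    n = len(ballots[0])
    weights = [2.0 ** -(k + 1) for k in range(n)]
    z = sum(weights) * len(ballots)
    probs = {}
    for ballot in ballots:
        for place, cand in enumerate(ballot):
            probs[cand] = probs.get(cand, 0.0) + weights[place] / z
    return probs

def elect(ballots):
    """Sample an actual winner under the same rule."""
    n = len(ballots[0])
    weights = [2.0 ** -(k + 1) for k in range(n)]
    return random.choices(random.choice(ballots), weights=weights)[0]

ballots = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
print(win_probabilities(ballots))  # A is most likely to win
```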

The N-Substance Problem

Okay, now we’re almost ready to talk about Hay Voting, but there’s one more preliminary. It’s what I call the “N-substance problem.” Let’s say I’m a merchant, and I have a collection of N different substances which I can sell to you. You have a preference function over these substances, which you are going to try to maximize. I’m making a special assumption about this preference function: it’s a linear combination of the quantities of each substance that you have.

Let’s say Vk is the volume of substance k that you have, for k = 1…N. Then your preference function U can be written as:

U = u1V1 + u2V2 + ... + uNVN

We can call uk your utility density for substance k.

Let’s say you have 1.00 credit to spend on these substances, but this credit can be divided into arbitrarily small quantities. The N-substance problem is this: how can I devise a pricing scheme such that, based on what you buy, I can determine (up to a positive affine transformation) what your utility function is (that is, the values u1, …, uN)?

Not just any pricing scheme will work. For example, consider the obvious scheme in which you can buy one liter of any substance for 1 credit. If the substances are green tea, lemonade, and maple syrup, with utility densities 2, 3, and 20 respectively, your optimal strategy is to spend your entire credit on maple syrup, without buying any green tea or lemonade. But then I can’t find out the relative values you assign to green tea and lemonade, so this pricing scheme fails.

So now let’s specify our pricing scheme. We want a function f which determines the volume of substance you get for a given number of credits. And for any u1, u2, we want U = u1f(x1) + u2f(x2) to be maximized (subject to a fixed total x1 + x2) whenever f(x1) / f(x2) = u1 / u2. Furthermore, we will arbitrarily set f(1) = 1.

So let’s say we are spending a total of C credits on x1 and x2. Then we can let x2 = C−x1. Now if we differentiate U with respect to x1 (conveniently assuming f is differentiable), we get:

dU/dx1 = u1f'(x1) − u2f'(C−x1)

If U is at a maximum, then this derivative will be 0, so:

u1f'(x1) − u2f'(C−x1) = 0

u1f'(x1) = u2f'(C−x1)

u1 / u2 = f'(C−x1) / f'(x1)

We also wanted the maximum to occur where f(x1) / f(x2) = u1 / u2, that is:

f(x1) / f(C−x1) = u1 / u2

Combining these two equations, we get:

f(x1) / f(C−x1) = f'(C−x1) / f'(x1)

We set x1 = 1. Recall that we arbitrarily set f(1) = 1, so:

1 / f(C−1) = f'(C−1) / f'(1)

Letting x = C−1 and letting k = f'(1):

1 / f(x) = f'(x) / k

f'(x) = k / f(x)

This gives f(x)f'(x) = k, so f(x)² = 2kx + c for some constant c. Since spending nothing should buy nothing, f(0) = 0, so c = 0; and f(1) = 1 forces 2k = 1. So f(x) = √x.
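
We can check numerically that with f(x) = √x, optimal purchases are proportional to squared utility densities, so the volume ratios f(x1)/f(x2) recover u1/u2 as required. A sketch using the tea/lemonade/syrup utility densities from above:

```python
import math

def optimal_spending(u, budget=1.0):
    """Maximize sum(u_k * sqrt(x_k)) subject to sum(x_k) = budget.
    Setting the derivatives equal (Lagrange multipliers) gives
    x_k proportional to u_k**2."""
    total = sum(uk ** 2 for uk in u)
    return [budget * uk ** 2 / total for uk in u]

u = [2, 3, 20]  # green tea, lemonade, maple syrup
x = optimal_spending(u)
volumes = [math.sqrt(xk) for xk in x]

print(volumes[1] / volumes[0])  # ≈ 1.5, i.e. u2/u1 = 3/2
print(volumes[2] / volumes[0])  # ≈ 10,  i.e. u3/u1 = 20/2
```

Unlike the one-liter-per-credit scheme, every substance with a positive utility density gets some spending, so the merchant learns the full ratio structure of the utility function.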

Hay Voting

Now we have a pricing scheme which encourages you to purchase quantities of each substance which are proportional to your utility density for that substance – assuming it makes sense to talk about “utility density” (that is, assuming your preferences are linear). When does that assumption hold?

If you’re trying to maximize the expected value of some function, then your preferences are linear over the probabilities assigned to each outcome. So if I sell you probabilities using this pricing scheme, I can discover your utility function!

Under Hay Voting, you receive a ballot which has √(N−1) units of “voting mass” (called votes) allocated to each candidate, where N is the number of candidates. You get 1.00 voting credit, which you may spend to reallocate some of this voting mass. If you spend x credits to transfer mass from candidate A to candidate B, then A loses √x votes and B gains √x votes. The √(N−1) value was selected so that you have just enough voting credit to completely eliminate one candidate from your ballot, or double the voting mass assigned to one candidate.

To elect a candidate, we gather all the ballots together and place them into a hat. One ballot is drawn at random, and then one point of voting mass is selected from that ballot, so that the probability of any candidate being selected is proportional to the amount of voting mass allocated to that candidate.
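
A sketch of the ballot bookkeeping (the transfer representation is my own illustrative choice):

```python
import math

def hay_ballot(n_candidates, transfers):
    """Build a Hay ballot. Each candidate starts with sqrt(N-1) votes;
    spending x credits on a (src, dst) transfer moves sqrt(x) votes.
    transfers is a list of (src, dst, credits); total credits <= 1."""
    assert sum(c for _, _, c in transfers) <= 1 + 1e-9
    mass = [math.sqrt(n_candidates - 1)] * n_candidates
    for src, dst, credits in transfers:
        moved = math.sqrt(credits)
        mass[src] -= moved
        mass[dst] += moved
    return mass

# With N = 4, spending 1/3 credit on each of three transfers out of
# candidate 0 removes 3 * sqrt(1/3) = sqrt(3) votes -- exactly
# candidate 0's starting mass, so one candidate can be fully eliminated.
ballot = hay_ballot(4, [(0, d, 1 / 3) for d in (1, 2, 3)])
print(ballot[0])  # ≈ 0.0
```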

Now it makes sense to talk about the utility density of a transfer of voting mass from one candidate to another. This density is simply the difference between the utilities you assign to the two candidates. So if you buy these transfers in such a way as to maximize your expected utility, I can see the relative sizes of the intervals between the utilities you assign to any two candidates – and that tells me everything there is to know about your utility function over candidates.

Deterministic Hay Voting…?

In Deterministic Hay Voting, the election procedure is as above, but the candidate with the most voting mass is always elected. It may still be possible (at least with idealized agents) to determine the utility functions of the voters by looking at their votes.

Why do we vote? In real life one may vote for any number of reasons: to make a statement, to do one’s patriotic duty, etc. In designing Hay Voting I made the unrealistic assumption that one votes only in order to affect the outcome of the election. I will continue to use that assumption to talk about deterministic voting.

Thus, one’s utility function over votes is dependent upon two things: one’s utility function over candidates, and one’s probability distribution for the effect of one’s vote on the result of the election. The latter can be written in terms of a probability distribution for the actions of the other voters.

As long as it makes sense to talk about the utility density of votes (as long as the size of your own vote is small relative to your uncertainty about how others will vote), we can discover your utility function over votes by using the square-root pricing rule. Then if we knew your probability distribution for the actions of the other voters, we would be able to find the set of all possible utility functions over candidates that would suggest the voting strategy you used.

To discover your probability distribution for the actions of the other voters, we can play a prediction game using a strictly proper scoring rule. Your reward in this game is based on the probability you assigned to the correct outcome; a strictly proper scoring rule is one such that your optimal strategy is to report your actual beliefs. In this case we would probably prefer a squared-error scoring rule because it is bounded (unlike the logarithmic rule). The reward could be additional votes in the next election.
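
A sketch of the squared-error rule in question (a Brier-style score; the numbers are illustrative). It is bounded, and truthful reporting maximizes the expected score:

```python
def quadratic_score(report, outcome):
    """Bounded strictly proper scoring rule:
    score = 2 * q[outcome] - sum(q_i**2) for reported probabilities q."""
    return 2 * report[outcome] - sum(q * q for q in report)

def expected_score(belief, report):
    """Expected score of `report` under true beliefs `belief`."""
    return sum(p * quadratic_score(report, i) for i, p in enumerate(belief))

belief = [0.7, 0.2, 0.1]
honest = expected_score(belief, belief)
print(honest)                                   # ≈ 0.54
print(expected_score(belief, [0.5, 0.3, 0.2]))  # lower than honest
print(expected_score(belief, [1.0, 0.0, 0.0]))  # lower still
```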

Deterministic Hay Voting is still a fairly sketchy idea. I’m not sure exactly what conditions the voters’ probability distributions must satisfy in order for it to work.

I feel pretty uncertain about how clear my explanations were here, so please give me feedback and ask questions about anything unclear.

Thanks for reading!
