space and games

March 4, 2009

A Stronger Type of Equilibrium

Filed under: Decision Theory, Voting — Peter de Blanc @ 5:57 pm

In game theory, a Nash equilibrium is an assignment of strategies to players, such that no player can benefit from changing vis own strategy (assuming the other players’ strategies remain the same). For example, in the one-shot Prisoner’s Dilemma, mutual defection is the only Nash equilibrium. There are several other ideas for how to define an “equilibrium” of a game, many of which are “refinements” of Nash Equilibrium, in the sense that the set of equilibria is smaller.

Here’s another one: a “cabal equilibrium” is an assignment of strategies to players, such that no subset (or “cabal”) K of players can simultaneously change strategies in such a way that some players in K benefit and no players in K are worse off. In other words, each cabal of players is playing a strategy which is Pareto optimal for the players within that cabal. Note that any subset of players is considered a cabal, so if there are N players, then there are 2N cabals.

Any cabal equilibrium is also a Nash equilibrium. A single player is just a cabal of size one, so if no cabal of players wishes to change strategies, then single no player wishes to change strategy. So this definition of equilibrium is “stronger” than the Nash equilibrium, in the sense that it is a more stringent test.

For example, the Prisoner’s Dilemma has no cabal equilibrium. First observe that mutual defection is the only Nash equilibrium, so there is at most one cabal equilibrium. But mutual defection is not an cabal equilibrium, because both players would prefer to simultaneously switch their strategy to cooperation.

Stag hunt has two Nash equilibria but only one cabal equilibrium. Battle of the Sexes has two Nash equilibria, both of which are cabal equilibria.

Mike Dobbins and I talked about cabal equilibria in voting. Because the outcome of an election depends on the voters in such a coarse way, the Nash equilibrium does not seem like a strong enough definition to apply to voting situations. For example, any election in which the winner beats the first loser by more than one vote is a Nash equilibrium, because no single voter can change the outcome by changing vis strategy. So not only are there multiple Nash equilibria, but in fact almost any election is a Nash equilibrium.

So let’s consider cabal equilibria in a plurality or approval vote. Suppose there is an equilibrium, and this equilibrium elects candidate A.

Suppose that more than half of the voters preferred B over A. Then they could all vote for B (and disapprove of A in an approval vote), resulting in a better outcome from their point of view. Then the original election was not a cabal equilibrium: contradiction! So there is no candidate that is preferred over A by a majority.

So if we ignore the possibility that exactly half the voters prefer some other candidate over A, this would mean that A is a Condorcet winner!

Now let’s consider the converse. Suppose that there is no cabal equilibrium. Then, in particular, it is not a cabal equilibrium for everyone to vote for the same candidate A. Then there exists some cabal K of voters which could make a Pareto improvement by changing votes. In order for the new strategy to be a Pareto improvement, it must affect the outcome of the election somehow; thus it must cause some other candidate B to be elected instead of A. In order to change the outcome of the election, we must have changed at least half of the votes (since originally everyone was voting for A), so K must contain at least half of the voters. Since no member of K prefers B over A, we know A is not preferred over B by a majority, and thus is not a Condorcet winner.

So if there is no cabal equilibrium, then A is not a Condorcet winner. Since the same argument works for any other candidate (not just candidate A), this means that no cabal equilibrium implies no Condorcet winner. Taking the contrapositive, this means that in an election with a Condorcet winner, there is a cabal equilibrium!

So we’ve just proven a theorem: A plurality or approval vote with an odd number of voters has a cabal equilibrium iff there is a Condorcet winner, and a cabal equilibrium always elects the Condorcet winner.

January 20, 2009

Intensional vs. Extensional Goals

Filed under: General — Peter de Blanc @ 12:39 am

Two types of goals for an agent are intensional and extensional goals. An intensional goal can be defined in purely mathematical terms, while an extensional goal depends on the universe in which the agent finds itself.

Some examples of intensional goals:

  • Find a prime number at least 200 bits long.
  • Prove Fermat’s Last Theorem.
  • Unscramble a Rubik’s Cube.
  • Fill in a Sudoku puzzle.

Some examples of extensional goals:

  • Predict the orbit of Mercury.
  • Drive a car across the Mojave Desert.
  • Win a trivia game.
  • Earn at least $500.

If we were coding a Go AI, we could try to build it to achieve either an extensional or an intensional goal. The obvious extensional goal is “win the game.” One possible intensional goal is “output a move that a minimax player would output.” In both cases we would probably include some sort of time limit.

“Output a move that a minimax player would output” stands out from the other examples of intensional goals listed above. In all of the other examples, the agent can be sure that its output is correct before it returns, but if I tell you to “output a move that a minimax player would output,” I haven’t given you an implementable procedure for checking whether you’ve achieved the goal.

It’s not so hard to think of other intensional goals with this property. Instead of asking an agent for a proof of Fermat’s Last Theorem, I could ask it to output 1 if a proof exists, and 0 otherwise.

Let’s say that these two are examples of “the hard kind of intensional goal,” and the four listed at the top of the page are “the easy kind of intensional goal.” The “easy” one are not necessarily easier individually than the “hard” ones; it’s easier for me to output a 1 than to output a proof of Fermat’s Last Theorem. But the “easy” ones are easier to think about, and as a class they’re easier.

In fact, the “hard” intensional goals are so hard that, from a certain point of view, they include the extensional goals! Take any extensional goal, and replace all the unknown parts with a probability distribution (perhaps based on algorithmic complexity), and you have an intensional goal.

Well, that’s a bit of a cop-out, because I don’t actually know how to do that for any of the extensional goals I listed at the top of the page. But we can do it for any goal for which we have already coded a success-detector - a piece of code which can determine, after the fact, if we have achieved our goal. In the Go example, we can do this. The Go AI interacts with its opponent through a a predefined protocol in which the AI outputs its moves and the opponent inputs vis moves to the AI. The board state can be built up from the list of moves, and so after any hypothetical series of plays, the AI can determine whether the game is over, and who won.

So to reformulate “win the game” as an intensional goal, we can suppose that our opponent’s moves are generated by some unknown Turing machine drawn from a given probability distribution. We use the list of moves which have occurred so far to do a Bayesian update on this distribution. Then any possible policy for generating moves has a probability of winning, and we output the move recommended by the policy with the greatest winning probability.

This way we can specify a program (called AIXI) which, if run on a big enough computer, would output what we want to output. And then our goal can be intensionally defined as “output whatever AIXI would output.”

A minimax player is a tall order. AIXI is an even taller order. We can’t actually run these programs, but we want to output, in a reasonable amount of time, whatever they would output. This may require uncertain reasoning about mathematics.

December 9, 2008

What Makes a Hint Good?

Filed under: General — Peter de Blanc @ 1:40 am

Nick Hay, Marcello and I discussed this question a while ago: if you had a halting oracle, how could you use it to help you prove a theorem, such as the Riemann Hypothesis? Let’s say you are only allowed to ask one question; you get one bit of information.

Of course, you might simply ask if the Riemann Hypothesis is true. If you trust your oracle, this may be good enough for your purposes. But let’s suppose what we really care about is the proof. Maybe we want to win a prize for our proof, and they won’t accept a proof by oracle.

Clearly, a complete proof of the Riemann Hypothesis is longer than one bit, so we can’t ask the oracle to output the proof for us. So what can we do? Well, we could ask it for a hint. Maybe we have some conjecture (which we can call Conjecture A) that could be useful in proving the RH. But we don’t know if A is true, nor do we know if proving A would actually help us prove the RH. So we could ask the oracle if it’s worth our time to try to prove A.

What we really want to know is if we would save time by trying to prove A, and then trying to use A to prove the RH, compared to some other strategy. To figure out how much time it would take, we need some sort of model of how we write proofs. Here’s a really simple model: “mathematicians write proofs by brute force; they look at every possible proof, from short to long, until finding one which proves their goal hypothesis, at which point they halt.”

Let’s suppose this model is true (lol). Let’s define the “difficulty” of a theorem as the length of its shortest proof. If our proofs are written in binary, then to prove a theorem of difficulty D, it takes time t = O(2D).

Let’s say D(RH) is the difficulty of the Riemann Hypothesis, D(A) is the difficulty of Conjecture A, and D(RH | A) is the difficulty of proving the RH once we have proved A. Then a mathematician who attempted to prove A before proving the RH would take time tA,RH ~ 2D(A) + 2D(RH | A), while a mathematician who attempted to prove the RH directly would take time tRH ~ 2D(RH).

So we could ask the oracle:

True or False? 2D(A) + 2D(RH | A) < 2D(RH)

If the oracle returns True, then we would start thinking about how to prove A.

March 24, 2008

Ordinal Performance Games

Filed under: Decision Theory — Peter de Blanc @ 9:56 pm

There are games (such as racing or DDR) in which players compete with one or more opponents, but the opponents do not interact. Each player performs individually, and receives some score for vis performance, and the player with the best score wins. I’ll call these “ordinal performance games.”

One might naively think that the best strategy is to ignore one’s opponents, and think only of maximizing one’s own score. After all, the players never interact, so your score depends only on your performance, right? That’s true, but the object is not to maximize your own score; it is to have a higher score than your opponents, and this depends on your opponent’s score as well.

For example, suppose we have a game with three player types, and possible scores of 1 through 5.

  • Alf is consistently mediocre, always receiving a score of 3.
  • Beth has adopted a risky strategy; she receives a score of 4 with probability 0.6 and a score of 1 with probability 0.4.
  • Gam may succeed spectacularly, but usually plays poorly. He receives a score of 5 with probability 0.4 and a score of 2 with probability 0.6.

Suppose Alf and Beth play. Alf will always score 3. Beth has a probability of 0.6 of scoring 4, and thus winning, and a probability of 0.4 of scoring a 1, and thus losing. So p(Alf < Beth) = 0.6.

Suppose Beth and Gam play. Beth has a probability of 0.4 of scoring 1, in which case Gam is guaranteed to win. The other way for Gam to win is for Beth to score a 4 while Gam scores a 5; the probability of this is 0.4 * 0.6 = 0.24. Thus p(Beth < Gam) = 0.4 + 0.24 = 0.64.

Suppose Gam and Alf play. Gam can only win by scoring a 5, with probability 0.4. So p(Gam < Alf) = 0.6.

So Alf usually loses to Beth, who usually loses to Gam, who usually loses to Alf. This shows that your optimal strategy in an ordinal performance game can depend on your opponents.

January 9, 2008

Infinite Certainty

Filed under: General — Peter de Blanc @ 9:26 pm

On Overcoming Bias, Eli says:

I don’t think you could get up to 99.99% confidence for assertions like “53 is a prime number”. Yes, it seems likely, but by the time you tried to set up protocols that would let you assert 10,000 independent statements of this sort - that is, not just a set of statements about prime numbers, but a new protocol each time - you would fail more than once. Peter de Blanc has an amusing anecdote on this point, which he is welcome to retell in the comments.

Here’s the anecdote:

Conversation with squallmage at 2006-07-28 23:46:54 on OneTrueCalculus (aim)
(23:47:19) OneTrueCalculus: oi
(23:47:23) SquallMage: oi
(23:47:33) OneTrueCalculus: how likely would you rate it that 81,241 is prime?
(23:47:59) SquallMage: very
(23:48:03) OneTrueCalculus: you can go ahead and calculate it if you want, or just give me a probability
(23:49:03) OneTrueCalculus: obviously this would be the subjective sort of probability
(23:49:11) SquallMage: yes.
(23:50:30) OneTrueCalculus: well?
(23:50:35) OneTrueCalculus: or are you busy calculating?
(23:50:38) SquallMage: I said ‘very’.
(23:50:42) OneTrueCalculus: ah
(23:50:46) SquallMage: Quantify that if you must, I’m too tired to.
(23:50:55) OneTrueCalculus: ok
(23:51:10) OneTrueCalculus: I won’t quantify it for you, since I don’t know how you are calibrated
(23:51:22) OneTrueCalculus: okay, how about 7?
(23:51:46) SquallMage: the probability of it’s primacy, or as a quantification of ‘very’?
(23:51:55) OneTrueCalculus: the probability of 7 being prime
(23:52:02) SquallMage: 7 is prime.
(23:52:08) OneTrueCalculus: with probability 1?
(23:52:18) SquallMage: Yes.
(23:52:41) OneTrueCalculus: so will you accept this deal? If you ever find out that 7 is not prime, you will give me $100.
(23:53:10) SquallMage: Only if you explain to me in detail what brought you to propose that deal to me.
(23:53:37) OneTrueCalculus: I am trying to swindle you out of your cash and/or teach you a valuable lesson
(23:54:09) OneTrueCalculus: also, I am trying to figure out how overconfident people are
(23:55:42) SquallMage: Well. I would make a deal with you that I would give you $100 if it were ever proven to me that it was possible in base-ten to generate the quantity 7 by multiplying together any two integers other than 1 and 7.
(23:55:55) OneTrueCalculus: okay
(23:56:05) OneTrueCalculus: I am only talking about the standard natural numbers. No weird groups or anything
(23:56:20) OneTrueCalculus: base 10
(23:56:24) SquallMage: No, ‘7 and 1′ is not different than ‘1 and 7′ also.
(23:56:32) OneTrueCalculus: of course not
(23:56:39) SquallMage: Just checking.
(23:56:42) OneTrueCalculus: I wouldn’t use a cheap technicality like that
(23:56:44) SquallMage: Since I’m a language bastard.
(23:56:49) SquallMage: And I completely would.
(23:56:54) OneTrueCalculus: okay
(23:57:13) OneTrueCalculus: If you even honestly feel that it was a cheap technicality, I wouldn’t expect you to pay me.
(23:57:22) SquallMage: Anyways.
(23:57:23) OneTrueCalculus: Under those conditions, will you accept my offer?
(23:57:33) SquallMage: Yes.
(23:57:37) OneTrueCalculus: Okay.
(23:57:44) SquallMage: Tell me how I owe you $100 now
(23:57:48) OneTrueCalculus: you don’t
(23:57:52) SquallMage: Good.
(23:57:56) OneTrueCalculus: now
(23:58:03) OneTrueCalculus: will you accept the same offer, but for 11 this time?
(23:58:05) SquallMage: Not for 81241
(23:58:34) OneTrueCalculus: i.e. if you ever find out that 11 is not prime, you will give me $100
(23:58:49) SquallMage: I would make that deal under equivalent conditions
(23:59:00) OneTrueCalculus: okay. I’m asking you to make that deal.
(23:59:44) SquallMage: I did say that I would.
(00:00:00) OneTrueCalculus: okay, thanks
(00:00:04) OneTrueCalculus: how about 13?
(00:00:20) SquallMage: Yes.
(00:00:48) SquallMage: 17 also, and 19, and 23.
(00:00:54) OneTrueCalculus: thanks.
(00:00:57) OneTrueCalculus: What about 27?
(00:01:02) SquallMage: No thanks.
(00:01:05) OneTrueCalculus: 29?
(00:01:11) SquallMage: sure.
(00:01:13) OneTrueCalculus: 31?
(00:01:17) SquallMage: Yep.
(00:01:20) OneTrueCalculus: 33?
(00:01:24) SquallMage: Nah.
(00:01:26) OneTrueCalculus: 37?
(00:01:29) SquallMage: Yah.
(00:01:31) OneTrueCalculus: 39?
(00:01:35) SquallMage: Nah.
(00:01:37) OneTrueCalculus: 41?
(00:01:42) SquallMage: Yah.
(00:01:45) OneTrueCalculus: 43?
(00:01:51) SquallMage: Yah.
(00:01:54) OneTrueCalculus: 47?
(00:02:06) SquallMage: Yah.
(00:02:09) OneTrueCalculus: 49?
(00:02:17) SquallMage: Nah.
(00:02:20) OneTrueCalculus: 51?
(00:02:28) SquallMage: Yah.
(00:02:36) OneTrueCalculus: Thank you. I win.
(00:02:48) SquallMage: You know I’ve been up since this time yesterday.
(00:02:51) OneTrueCalculus: You can donate your money to the Singularity Institute for Artificial Intelligence
(00:03:06) SquallMage: I’ll forward you their message of receipt.

Expected Utilities Paper

Filed under: Decision Theory — Peter de Blanc @ 8:58 am

I still haven’t published it, but it’s now on the arXiv.

October 29, 2007

Athena’s Theorem

Filed under: General — Peter de Blanc @ 2:10 pm

In the Odyssey, Athena says to Telemachus:

It’s true few men are like their fathers. Most of them are worse. Only very few of them are better.

Athena was pretty sharp! Of course, “better” and “worse” are rather vague terms, but we can talk more precisely if we consider reproductive fitness, defined as the number of offspring one has. Then we can restate Athena’s theorem as:

The expected fitness of a random organism is ≤ the expected fitness of its parent.

Intuitively, one can reason as follows: the expected fitness of a random organism is the average fitness of the population. But the expected fitness of its parent is above average, because we already know that the parent has at least one child.

A rigorous proof is left as an exercise to the reader.

October 24, 2007

Colonel Blotto’s Game

Filed under: General — Peter de Blanc @ 4:54 am

Zhan Shen, a fellow student, mentioned this really cool game last week in his practice thesis defense on insurance theory. Zhan talked a bit about applications of an asymmetric version of the game (with different army sizes). His defense is today.

October 20, 2007

Convergence of Expected Utilities

Filed under: Decision Theory — Peter de Blanc @ 8:44 am

I’m working on a paper on this topic, but I figured I would post about it now and clear up some confusion.

Let’s say we have an agent interacting with an environment. The agent sends the environment some number (an action), and the environment sends the agent some number in response (a perception).

When we have an infinite set of hypotheses, the expected utility of some action is determined by an infinite series. It’s important for that series to converge, or else we won’t (in general) be able to compare the expected utilities of two different actions.

Given the following assumptions:

  • We have some finite set of data about the environment.
  • We assign a nonzero probability to all computable hypotheses consistent with our data.
  • Either this probability distribution is computable, or it is bounded below by some positive computable function.
  • Our utility function is determined by our perceptions, and is unbounded.
  • Either this utility function is computable, or it is bounded below in absolute value by some unbounded computable function.

…then it follows that our expected utilities do not converge.

This seems like a pretty strong reason to stick with bounded utility functions. It certainly seems much more plausible to me now than it did before that my utility function is bounded (to the extent that I can be said to have a utility function).

But we shouldn’t jump to conclusions. Here are some reasons why the above might not apply:

  • You might have an uncomputable utility function.
  • You might have a utility function which is not determined solely by your perceptions.
  • You might assign a probability of zero to some hypotheses which are consistent with your data.

August 8, 2007

Board Evaluators and Move Suggesters

Filed under: Decision Theory — Peter de Blanc @ 9:10 pm

In Decision Theory, agents make decisions which maximize the expected value of a utility function. A utility function for a Chess or Go player might be: “1 if I win, -1 if I lose, 0 if I draw.”

Because we have to work with limited computational resources, we can’t search the entire game tree when we play games like Chess and Go. In this case, we would like a way to approximate the expected utility of a node in the game tree, using local properties of that node. I’m calling such an approximation a “board evaluator.” Board evaluators have been used with some success in Computer Chess.

A move suggester is something that looks at a situation and returns a list of moves, possibly with numerical values attached; higher values are preferred. We can sum the values returned by several different move suggesters, and select the move with the highest value. When a mosquito lands on your arm and you twitch, you may be using something like a move suggester.

More formally, we can define a move suggester as something which takes two adjacent nodes in the game tree and estimates the change in expected utility, compared to a board evaluator, which estimates the expected utility of a single node. So a board evaluator takes a node as its argument, while a move suggester takes an edge in the game tree.

Board evaluators have a major advantage over move suggesters: they allow us to use search algorithms such as minimax.

So it would be nice to have a way to build a board evaluator out of move suggesters. I know of three ways to do this:

  • The naive method, as I’ll call it, considers the sequence of moves used to reach the position. We evaluate each move, and sum these values to estimate the value of the current position. Because we are adding up many terms, we will usually accumulate a significant error.
  • Playout Analysis uses move suggesters to continue play from a given position until the game is finished. Then we determine who won the game. We can perform many randomized iterations of this analysis, and take some sort of average; this average is then used as the value of the original position.
  • Tewari, developed by Honinbo Dosaku in the 17th century, works by permuting the sequence of moves used to reach a board position. The naive method is applied to each such permutation, and we average these values.

I’d be interested in examples of these methods (particularly Tewari) being used to analyze real-world decision problems.

Next Page »

Powered by WordPress