### Hay Voting

Marcello Herreshoff and I talked a bit about voting theory during the Summer of AI, and we came up with our own voting method. In compliance with Stigler’s Law, I’ve decided to call it Hay Voting (after Nick Hay). Hay Voting is, I fear, somewhat impractical, but it has some interesting properties.

Hay Voting is a stochastic voting method; it does not always elect the same candidate given the same votes. Hay Voting is based on the idea of expected utility. It assumes that each voter possesses a *utility function* which assigns a numeric value to each candidate, and that the voter will vote in such a way as to maximize their expected value for this number. Our intention was to design a voting method such that, when voters follow their optimal strategy, we can deduce their utility functions. You might think of this as a voting method which encourages voters to be “honest.”

## A Voting Method that Rewards Honesty

Before I get to Hay Voting, I’m going to talk a bit more about honesty and stochastic methods. Suppose we only wish to know the identity of each voter’s favorite candidate. Here we will give each voter a single vote, which they can cast in favor of any candidate. If we simply count the number of votes for each candidate, and elect the candidate with the most votes (this is called Plurality Voting, and is used by many countries), then we will fail to achieve our goal of discovering which candidate each voter most favors.

That’s because plurality voting introduces voting strategies. Let’s suppose that there is an election between three candidates: A, B, and C. A is really your favorite, but you know that A has no real chance of winning, and the race between B and C is very close. Of those, you prefer B, so voting for B would be a good strategy. So that’s what you do.

Suppose instead that I selected one vote at random. I put all the votes in a hat, pull one out, and elect whichever candidate is lists. If I select a vote other than yours, then your vote had no effect, and if I select your vote, then I elect that candidate. So your optimal strategy is to vote for your favorite candidate – A. So this voting method encourages you to be honest. Of course, all I found out about you was who your favorite candidate was. I don’t know anything else about your preferences.

## A Ranked Voting Method that Rewards Honesty

Most voting methods that I know of (e.g. Condorcet Voting, the Borda Count, Instant Runoff Voting (which is awful)) ask each voter to rank each candidate from most to least favorite. All of the methods listed above encourage some strategy; that is, there exist situations in which a voter can achieve a better result by voting something other than vis true preferences. Such situations are supposed to be rare in Condorcet voting methods, but they exist.

It is easy to construct a ranked voting method which encourages voters to be honest – *if* we assume that the voters are ignorant of the value of a certain variable. Let’s call it X, and let’s say that there are N candidates. We need a probability distribution for X such that p(X=1) > p(X=2) > … > p(X=N).

Now we select a ballot from our hat, and then elect the candidate listed in place X on that ballot. Since that’s most likely to be the first candidate listed, and less likely to be each successive candidate after that, your best bet is to rank the candidates by preference.

## The N-Substance Problem

Okay, now we’re almost ready to talk about Hay Voting, but there’s one more preliminary. It’s what I call the “N-substance problem.” Let’s say I’m a merchant, and I have a collection of N different substances which I can sell to you. You have a preference function over these substances, which you are going to try to maximize. I’m making a special assumption about this preference function: it’s a linear combination of the quantities of each substance that you have.

Let’s say V_{k} is the volume of substance *k* that you have, for *k*=1…*n*. Then your preference function U can be written as:

`U =`

*u*_{1}V_{1}+*u*_{2}V_{2}+ ... +*u*_{n}V_{n}We can call u_{k} your *utility density* for substance *k*.

Let’s say you have 1.00 credit to spend on these substances, but this credit can be divided into arbitrarily small quantities. The N-substance problem is this: how can I devise a pricing scheme such that, based on what you buy, I can determine (up to a positive Affine Transformation) what your utility function is (that is, the values *u*_{1}…*u*_{n})?

Not just any pricing scheme will work. For example, consider the obvious scheme in which you can buy one liter of any substance for 1 credit. If the substances are green tea, lemonade, and maple syrup, with utility densities 2, 3, and 20 respectively, your optimal strategy is to spend your entire credit on maple syrup, without buying any green tea or lemonade. But then I can’t find out the relative values you assign to green tea and lemonade, so this pricing scheme fails.

So now let’s specify our pricing scheme. We want a function *f* which determines the volume of substance you get for a given number of credits. And for any *u*_{1}, *u*_{2}, we want U = *u*_{1}*f*(*x*_{1}) + *u*_{2}*f*(*x*_{2}) to be maximized whennever *f*(*x*_{1}) / *f*(*x*_{2}) = *u*_{1} / *u*_{2}. Furthermore, we will arbitrarily set *f*(1) = 1.

So let’s say we are spending a total of C credits on *x*_{1} and *x*_{2}. Then we can let *x*_{2} = C−*x*_{1}. Now if we differentiate U with respect to *x*_{1} (conveniently assuming *f* is differentiable), we get:

`dU/d`

*x*_{1}=*u*_{1}*f'*(*x*_{1}) −*u*_{2}*f'*(C−*x*_{1})If U is at a maximum, then this derivative will be 0, so:

*u*_{1}*f'*(*x*_{1}) −*u*_{2}*f'*(C−*x*_{1}) = 0

*u*

_{1}

*f'*(

*x*

_{1}) =

*u*

_{2}

*f'*(C−

*x*

_{1})

*u*

_{1}/

*u*

_{2}=

*f*'(C−

*x*

_{1}) /

*f*'(

*x*

_{1})

whennever:

*f*(*x*_{1}) /*f*(C−*x*_{1}) =*u*_{1}/*u*_{2}Combining these two equations, we get:

*f*(*x*_{1}) /*f*(C−*x*_{1}) =*f*'(C−*x*_{1}) /*f*'(*x*_{1})We set *x*_{1} = 1. Recall that we arbitrarily set *f*(1) = 1, so:

`1 /`

*f*(C−1) =*f*'(C−1) /*f*'(1)Letting *x* = C−1 and letting *k* = *f ‘*(1):

`1 /`

*f*(*x*) =*f*'(*x*) /*k*

*f*'(

*x*) = k /

*f*(

*x*)

So *f*(*x*) = √*x*.

## Hay Voting

Now we have a pricing scheme which encourages you to purchase quantities of each substance which are proportional to your utility density for that substance – assuming it makes sense to talk about “utility density” (that is, assuming your preferences are linear). When does that assumption hold?

If you’re trying to maximize the expected value of some function, then your preferences are linear over the probabilities assigned to each outcome. So if I sell you probabilities using this pricing scheme, I can discover your utility function!

Under Hay Voting, you receive a ballot which has a √(N-1) units of “voting mass” (called *votes*) allocated to each candidate, where N is the number of candidates. You get 1.00 voting credit, which you may spend to reallocate some of this voting mass. If you spend *x* credits to trasnfer mass from candidate A to candidate B, then A loses √*x* votes and B gains √*x* votes. The √(N-1) value was selected so that you have just enough voting credit to completely eliminate one candidate from your ballot, or double the voting mass assigned to one candidate.

To elect a candidate, we gather all the ballots together and place them into a hat. One ballot is drawn at random, and then one point of voting mass is selected from that ballot, so that the probability of any candidate being selected is proportional to the amount of voting mass allocated to that candidate.

Now it makes sense to talk about the *utility density* of a transfer of voting mass from one candidate to another. This density is simply the difference between the utilities you assign to the two candidates. So if you buy these transfers in such a way as to maximize your expected utility, I can see relative sizes of the intervals between the utilities you assign to any two candidates – and that tells me everything there is to know about your utility function over candidates.

## Deterministic Hay Voting…?

In Deterministic Hay Voting, the election procedure is as above, but the candidate with the most voting mass is always elected. It may still be possible (at least with idealized agents) to determine the utility functions of the voters by looking at their votes.

Why do we vote? In real life one may vote for any number of reasons: to make a statement, to do one’s patriotic duty, etc. In designing Hay Voting I made the unrealistic assumption that one votes only in order to affect the outcome of the election. I will continue to use that assumption to talk about deterministic voting.

Thus, one’s utility function over *votes* is dependent upon two things: one’s utility function over *candidates*, and one’s probability distribution for the effect of one’s vote on the result of the election. The latter can be written in terms of a probability distribution for the actions of the other voters.

As long as it makes sense to talk about the utility *density* of votes (as long as the size of your own vote is small relative to your uncertainty about how others will vote), we can discover your utility function over votes by using the square-root pricing rule. Then if we knew your probability distribution for the actions of the other voters, we would be able to find the set of all possible utility functions over *candidates* that would suggest the voting strategy you used.

To discover your probability distribution for the actions of the other voters, we can play a prediction game using a strictly proper scoring rule (you can read about strictly proper scoring rules here). Your reward in this game is based on the probability you assigned to the correct outcome; a strictly proper scoring rule is one such that your optimal strategy is to report your actual beliefs. In this case we would probably prefer a squared-error scoring rule because it is bounded (unlike the logarithm). The reward could be additional votes in the next election.

Deterministic Hay Voting is still a fairly sketchy idea. I’m not sure exactly what conditions the voters’ probability distributions must satisfy in order for it to work.

Again, I feel pretty uncertain about how clear my explanations were here, so **please give me feedback** and **ask questions about anything unclear.**

Thanks for reading!