In decision theory, agents make the decisions that maximize the expected value of a utility function. A utility function for a Chess or Go player might be: “1 if I win, -1 if I lose, 0 if I draw.”
Because we have limited computational resources, we can’t search the entire game tree when we play games like Chess and Go. Instead, we would like a way to approximate the expected utility of a node in the game tree using only local properties of that node. I’m calling such an approximation a “board evaluator.” Board evaluators, usually called evaluation functions, are a standard component of computer Chess programs.
A move suggester is something that looks at a situation and returns a list of moves, possibly with numerical values attached; higher values are preferred. We can sum the values that several different move suggesters assign to each move, and select the move with the highest total. When a mosquito lands on your arm and you twitch, you may be using something like a move suggester.
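To make the summing step concrete, here is a minimal sketch. The two suggesters, their move names, and their values are made up for illustration; a position is passed in only because a real suggester would inspect it.

```python
from collections import defaultdict

def combine(suggesters, position):
    """Add up the values that every suggester assigns to each move."""
    totals = defaultdict(float)
    for suggester in suggesters:
        for move, value in suggester(position).items():
            totals[move] += value
    return dict(totals)

def best_move(suggesters, position):
    """Return the move with the highest summed value, or None if nothing was suggested."""
    totals = combine(suggesters, position)
    return max(totals, key=totals.get) if totals else None

# Two hypothetical suggesters; they ignore the position and return fixed values.
def capture_suggester(position):
    return {"capture": 3.0, "retreat": 0.5}

def safety_suggester(position):
    return {"retreat": 2.0, "defend": 1.0}

suggesters = [capture_suggester, safety_suggester]
totals = combine(suggesters, position=None)
choice = best_move(suggesters, position=None)
```

Here "retreat" is suggested twice and collects 2.5 in total, but "capture" still wins with 3.0. A suggester that has no opinion about a move simply leaves it out.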
More formally, we can define a move suggester as something which takes two adjacent nodes in the game tree and estimates the change in expected utility between them, whereas a board evaluator estimates the expected utility of a single node. So a board evaluator takes a node as its argument, while a move suggester takes an edge in the game tree.
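The node-versus-edge distinction can be written down as two function types. The sketch below is only an illustration: `Node` is a stand-in integer and `toy_evaluate` is an arbitrary made-up evaluator. It also shows the easy direction, turning a board evaluator into a move suggester by taking a difference.

```python
from typing import Callable

Node = int  # hypothetical stand-in for a node in the game tree

BoardEvaluator = Callable[[Node], float]       # takes a node
MoveSuggester = Callable[[Node, Node], float]  # takes an edge (parent, child)

def suggester_from_evaluator(evaluate: BoardEvaluator) -> MoveSuggester:
    """Build a move suggester that reports the change in estimated utility."""
    def suggest(parent: Node, child: Node) -> float:
        return evaluate(child) - evaluate(parent)
    return suggest

# Arbitrary toy evaluator, used only to exercise the types.
def toy_evaluate(node: Node) -> float:
    return node * 0.5

toy_suggest = suggester_from_evaluator(toy_evaluate)
delta = toy_suggest(2, 5)  # estimated change along the edge 2 -> 5
```

Going the other way, from move suggesters to a board evaluator, is the harder problem.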
Board evaluators have a major advantage over move suggesters: they allow us to use search algorithms such as minimax, which need a value for each position where the search stops.
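As a rough sketch of how minimax uses a board evaluator, here is a depth-limited search on a toy game that I am assuming purely for illustration: players alternately take 1 or 2 stones from a pile, and whoever takes the last stone wins. The evaluator knows the exact utility of finished games and otherwise guesses 0.

```python
def children(state):
    """Positions reachable in one move from (stones, player to move)."""
    stones, player = state
    return [(stones - take, -player) for take in (1, 2) if take <= stones]

def evaluate(state):
    """Toy board evaluator: utility from player +1's point of view."""
    stones, player = state
    if stones == 0:
        # The player to move has nothing to take, so the other player won.
        return float(-player)
    return 0.0  # crude guess for unfinished games

def minimax(state, depth):
    """Depth-limited minimax; calls the evaluator where the search stops."""
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)
    values = [minimax(kid, depth - 1) for kid in kids]
    # Player +1 maximizes the utility, player -1 minimizes it.
    return max(values) if state[1] == 1 else min(values)

deep = minimax((4, 1), depth=4)     # search reaches the end of the game
shallow = minimax((4, 1), depth=1)  # search stops early and relies on the guess
```

With enough depth the search finds that a pile of 4 is a win for the player to move; with depth 1 it can only report the evaluator's guess of 0. A move suggester has no obvious place in this loop, since the search asks about nodes rather than edges.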
So it would be nice to have a way to build a board evaluator out of move suggesters. I know of three ways to do this:
- The naive method, as I’ll call it, considers the sequence of moves used to reach the position. We evaluate each move, and sum these values to estimate the value of the current position. Because we are adding up many estimates, each with its own error, the total will usually carry a significant accumulated error.
- Playout Analysis uses move suggesters to continue play from a given position until the game is finished. Then we determine who won the game. We can perform many randomized iterations of this analysis, and take some sort of average; this average is then used as the value of the original position.
- Tewari, developed by Honinbo Dosaku in the 17th century, works by permuting the sequence of moves used to reach a board position. The naive method is applied to each such permutation, and we average these values.
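The three methods above can be sketched on a toy game. Everything specific here is an assumption made for illustration: the eight-point board, the "crowding" heuristic in `suggest`, the winning rule in `winner`, and all the function names. The Tewari function follows the description in this post (average the naive value over all orderings) and makes no claim about how the technique is applied in actual Go analysis.

```python
import itertools
import random

BOARD = range(8)  # hypothetical toy board: eight points on a line

def suggest(stones, move):
    """Toy move suggester: estimated utility change for player +1.

    A stone placed next to one of the mover's own stones counts as inefficient.
    """
    player, point = move
    crowded = (player, point - 1) in stones or (player, point + 1) in stones
    return player * (0.5 if crowded else 1.0)

def naive_value(moves):
    """Sum the suggester's estimates in the order the moves were played."""
    stones, total = set(), 0.0
    for move in moves:
        total += suggest(stones, move)
        stones.add(move)
    return total

def tewari_value(moves):
    """Average the naive value over every ordering of the same moves."""
    values = [naive_value(order) for order in itertools.permutations(moves)]
    return sum(values) / len(values)

def winner(history):
    """Hypothetical rule: whoever has more stones without a friendly neighbor wins."""
    stones = set(history)
    def score(player):
        return sum(1 for who, p in stones
                   if who == player
                   and (player, p - 1) not in stones
                   and (player, p + 1) not in stones)
    a, b = score(1), score(-1)
    return (a > b) - (a < b)  # 1, -1, or 0 for a draw

def playout_value(moves, n_playouts=200, seed=0):
    """Finish the game many times with suggester-weighted random play; average the results."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_playouts):
        history = list(moves)
        player = -moves[-1][0] if moves else 1
        empty = [p for p in BOARD if all(pt != p for _, pt in history)]
        while empty:
            stones = set(history)
            # Moves the suggester rates more highly are chosen more often.
            weights = [abs(suggest(stones, (player, p))) for p in empty]
            point = rng.choices(empty, weights=weights)[0]
            empty.remove(point)
            history.append((player, point))
            player = -player
        results.append(winner(history))
    return sum(results) / len(results)

# Moves are (player, point) pairs; player +1 moved first.
moves = [(1, 1), (-1, 5), (1, 3), (-1, 7), (1, 2)]
naive = naive_value(moves)
tewari = tewari_value(moves)
playout = playout_value(moves)
```

In the played order, the stone at point 2 is the only one judged inefficient, so the naive value is 0.5. Other orderings make two of player +1's stones look crowded, which pulls the Tewari average down to 1/6. Note that permuting is cheap here only because the example has five moves; the number of orderings grows factorially, so a longer sequence would need sampling instead of full enumeration.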
I’d be interested in examples of these methods (particularly Tewari) being used to analyze real-world decision problems.