# space and games

## February 8, 2012

### Large-Middle-Small: A Kanji Word Block

Filed under: General — Peter de Blanc @ 9:02 am

I’m investigating the Japanese language.

A Kanji word block is a 3 x 3 block of Kanji characters in which each row and each column forms a word, and no character appears more than once. Thus the word block contains six words in all. Using Jim Breen’s WWWJDIC as my reference, I found that there are 12 possible word blocks. Here is one:

| | | | Row word |
|---|---|---|---|
| 大 Large (ダイ / おお) | 中 Middle (チュウ) | 小 Small (ショウ / こ) | 大中小 (だいちゅうしょう): L/M/S; clothing sizes |
| 人 Person (ニン) | 間 Space (ケン / カン) | 型 Model (がた) | 人間型 (にんげんがた): humanoid |
| 数 Number (スウ) | 値 Value (チ) | 化 Change (カ) | 数値化 (すうちか): numeric conversion |
| 大人数 (おおにんずう): large number of people | 中間値 (ちゅうかんち): median | 小型化 (こがたか): miniaturization | Column words |

So a person-space-model is a humanoid, a number-value-change is a numeric conversion, and so on.

## February 3, 2012

### The Item and the Thing

Filed under: General — Peter de Blanc @ 2:56 am

We show Sam the item.
“I don’t know,” says Sam.
We say, “It is the item.”
Again, we ask Sam, “what is this?”
“The item,” Sam repeats.
Sam has received his first lesson.

We show Sam the item.
“I’m not sure,” says Sam, “but I think it is the item.”
“Of course it is the item! Why are you not sure?”

The thing is very similar to the item.
We show Sam the thing.
“It is the item!” Sam shouts confidently.
“It is not the item! It is the thing!”
Was Sam’s earlier hesitation justified?

We show Lisa the item.
“I don’t know,” says Lisa.
We say, “It is the item.”
Again, we ask Lisa, “what is this?”
“The item,” Lisa repeats.
We show Lisa the thing.
“I don’t know,” says Lisa.
We say, “It is the thing.”
Again, we ask Lisa, “what is this?”
“The thing,” Lisa repeats.

## December 20, 2011

Filed under: General — Peter de Blanc @ 11:53 pm

Google has some very cool n-gram data sets available for download. Alas, the files are quite large. For example, the 3-grams are split into 200 zip files, each weighing in at 440 MB. That’s about 88 GB total.

The 1-grams are much lighter, totaling 2 GB. I was able to reduce this to about 35 MB by throwing away the time information (the original files indicate the year each data point came from). That’s smaller than the original files by a factor of 57.
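Collapsing the year dimension is a one-pass aggregation. Here is a minimal sketch of the idea in Python, assuming the tab-separated ngram / year / match_count / volume_count layout of the downloadable files (an assumption; check the column order against the version you fetch):

```python
from collections import Counter

def collapse_years(lines):
    """Sum match counts over all years for each 1-gram.

    Assumes each line is tab-separated as:
    ngram, year, match_count, volume_count.
    """
    totals = Counter()
    for line in lines:
        ngram, _year, match_count, _volumes = line.rstrip("\n").split("\t")
        totals[ngram] += int(match_count)
    return totals
```

Streaming the zipped files through a function like this, one line at a time, keeps memory proportional to the number of distinct 1-grams rather than the number of (ngram, year) rows.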

The 2-grams, 3-grams, and so on could be compressed in the same way by anyone who can get the files. I’d be happy to help if anyone wants to do this.

## November 19, 2011

### Kanji with Combinatorial Power

Filed under: General — Peter de Blanc @ 10:38 am

The Japanese writing system includes hiragana, katakana, and kanji. While hiragana and katakana represent sounds, kanji represent meanings. There are a lot of good places to start learning kanji. The JLPT N5 kanji list is the set of kanji that appear in the first of five standardized tests. Here is a list of the 100 most commonly used kanji on the internet.

Kanji may represent words by themselves, or they may be combined into larger units of meaning called compounds. I was interested in finding a set of kanji with a lot of combinatorial power — the ability to represent many distinct compounds without using any kanji not in the set.

For this project I used Jim Breen, jisho.org, Python, and a greedy heuristic. This solution is unlikely to be optimal. I’d like to share a truly optimal solution (with proof) later, but I don’t know how long that will take. Here’s what I’ve got now:
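A greedy step of this kind can be sketched as follows. This is one plausible rule (repeatedly add the character that brings the most compounds fully inside the chosen set), not necessarily the exact heuristic I ran; toy ASCII strings stand in for real compounds:

```python
def greedy_pick(words, k):
    """Greedily choose k characters, each time adding the character
    that maximizes the number of words written entirely with chosen
    characters. Ties go to the lexicographically smallest candidate."""
    chosen = set()
    alphabet = sorted({ch for w in words for ch in w})
    for _ in range(k):
        best, best_score = None, -1
        for cand in alphabet:
            if cand in chosen:
                continue
            trial = chosen | {cand}
            score = sum(all(ch in trial for ch in w) for w in words)
            if score > best_score:
                best, best_score = cand, score
        chosen.add(best)
    return chosen
```

Like any greedy rule for a coverage problem, this can get stuck in a local optimum, which is why the resulting set is unlikely to be the best possible one.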

1. 日: counter for days; day; Japan; sun
2. 直: fix; frankness; honesty; repair; straightaway
3. 生: birth; genuine; life
4. 中: center; in; inside; mean; middle
5. 人: person
6. 後: back; behind; later
7. 前: before; in front
8. 一: one
9. 年: counter for years; year
10. 月: month; moon
11. 半: half; middle; odd number; part-; semi-
12. 夜: evening; night
13. 長: leader; long
14. 間: interval; space
15. 数: fate; figures; law; number; strength
16. 時: hour; time
17. 分: 1%; chances; degree; duty; know; minute of time; one’s lot; part; rate; segment; shaku/100; share; understand
18. 何: what
19. 々: symbol for kanji repetition
20. 本: book; counter for long cylindrical things; main; origin; present; real; true
21. 御: govern; honorable; manipulate
22. 手: hand
23. 大: big; large
24. 家: expert; family; home; house; performer; professional
25. 山: mountain
26. 小: little; small
27. 国: country
28. 三: three
29. 下: below; descend; down; give; inferior; low
30. 上: above; up
31. 身: one’s station in life; person; somebody
32. 足: be sufficient; counter for pairs of footwear; foot; leg
33. 代: age; change; convert; counter for decades of ages, eras, etc.; fee; generation; period; rate; replace; substitute
34. 名: distinguished; name; noted; reputation
35. 物: matter; object; thing
36. の: used to indicate possession (actually a hiragana)
37. 子: 11PM-1AM; child; first sign of Chinese zodiac; sign of the rat
38. 女: female; woman
39. 学: learning; science; study
40. 部: bureau; class; copy; counter for copies of a newspaper or magazine; dept.; part; portion; section
41. 文: art; decoration; figures; literature; plan; sentence; style
42. 天: heavens; imperial; sky
43. 体: body; counter for images; object; reality; substance
44. 気: air; atmosphere; mind; mood; spirit
45. 方: alternative; direction; person
46. 地: earth; ground
47. 心: heart; mind; spirit
48. 目: care; class; experience; eye; favor; insight; look
49. 二: two
50. 十: ten
51. 色: color
52. 道: course; district; journey; moral; road-way; street; teachings
53. 水: water
54. 神: gods; mind; soul
55. 主: chief; lord; main thing; master; principal
56. 面: face; features; mask; surface
57. 正: 10^40; correct; justice; righteous
58. 字: character; letter; section of a village; word
59. 金: gold
60. 和: harmony; Japan; Japanese style; peace; soften
61. 元: beginning; former time; origin

Here are the compounds (1309 items):

## November 11, 2011

### Locality in Go

Filed under: Go — Peter de Blanc @ 7:43 am

In June I said this on Twitter:

@luqui The board is a graph. A node for each empty point, a node for each group. Legality of a move depends only on its radius 2 neighborhood.

@luqui The distance between two points may shrink by at most 2 when a move is played (because groups can be joined). Speed of light is 2.

@luqui Let R be a region of the board. The set of possible states of R after T moves is not affected by anything farther than cT = 2T.

@luqui Note: 2 points may move away from each other FTL (if a group dies), but not towards each other. Moving away is not communicating.

@luqui Information about legal possibilities can’t go FTL, but info about good moves can; strategy now depends on far future possibilities.

I’ll explain this again in a different way. Minimax search is an algorithm for perfect play in any combinatorial game. To solve Go by minimax search would, however, take a very long time, and so it is not a practical method by itself.

One technique that has been useful for Chess is to apply minimax search to a similar game with a new rule. The new rule is this: after n moves, for some particular n, the game will end and the final score will be determined using a board evaluation function. The optimal move in this modified game may then be assumed to be a decent move in the original chess position.
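The depth-cutoff idea looks like this in a minimal sketch (the game interface here is a toy for illustration, not taken from any chess engine):

```python
def minimax(state, depth, maximizing, moves, child, evaluate):
    """Depth-limited minimax: after `depth` plies, score the position
    with `evaluate` instead of searching further."""
    ms = moves(state)
    if depth == 0 or not ms:
        return evaluate(state)
    values = [minimax(child(state, m), depth - 1, not maximizing,
                      moves, child, evaluate)
              for m in ms]
    return max(values) if maximizing else min(values)

# A toy game tree: internal nodes are lists, leaves are scores.
tree = [[3, 5], [2, 9]]
value = minimax(tree, 2, True,
                moves=lambda s: range(len(s)) if isinstance(s, list) else [],
                child=lambda s, m: s[m],
                evaluate=lambda s: s)
```

The cost of the search is roughly the branching factor raised to the power n, which is exactly why Go's large branching factor forces n to stay small.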

Since Go has a large number of moves available from most game positions, the values of n for which such calculations are practical are quite small. If we could throw out most possible moves (that is, if we could know that they are suboptimal without search), then we could search more deeply.

Enter locality – the fact that it takes a long time for effects on a Go board to propagate over a long distance.

(The shading in this diagram shows the distance to the highlighted region, as defined by the metric discussed in the quoted material).
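The metric from the tweets can be sketched in code: build the graph with a node per empty point and a node per chain, then run a breadth-first search from the region. The board encoding below (a dict mapping every point of a square grid to 'b', 'w', or '.') and the helper names are my own assumptions, not from the post:

```python
from collections import deque

def neighbors(p, size):
    x, y = p
    for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= q[0] < size and 0 <= q[1] < size:
            yield q

def chain_reps(board, size):
    """Contract each chain of same-colored stones to one representative
    node; empty points ('.') are their own nodes."""
    rep = {}
    for p, color in board.items():
        if p in rep:
            continue
        if color == '.':
            rep[p] = p
            continue
        stack, rep[p] = [p], p          # flood-fill the chain from p
        while stack:
            q = stack.pop()
            for r in neighbors(q, size):
                if board[r] == color and r not in rep:
                    rep[r] = p
                    stack.append(r)
    return rep

def graph_distances(region, board, size):
    """BFS distances from a set of points, on the contracted graph."""
    rep = chain_reps(board, size)
    adj = {}
    for p in board:
        for q in neighbors(p, size):
            a, b = rep[p], rep[q]
            if a != b:
                adj.setdefault(a, set()).add(b)
                adj.setdefault(b, set()).add(a)
    dist = {rep[p]: 0 for p in region}
    queue = deque(dist)
    while queue:
        a = queue.popleft()
        for b in adj.get(a, ()):
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return {p: dist.get(rep[p]) for p in board}
```

Because a whole chain is a single node, stones at opposite ends of a long chain are at the same distance from the region, which is what makes the metric shrink when groups are joined.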

Suppose that our board evaluation function only cares about a small region R of the board, such as the highlighted region shown above. This means we’re now dealing with a local goal instead of the overall goal of winning the game. Let’s just press on, and at a later date we can figure out how local goals relate to winning the game.

If n=1, then the value within a region depends on the set of states that R can be in after one move. To determine which states are possible, we would need the following information:

• If a chain has only one liberty within R, we wish to know if it has an external liberty. This requires looking at points at distance 1 from R.
• If a chain has no liberties within R, then it must have an external liberty; but whether it can be captured depends on whether it has two external liberties. This again requires looking at points at distance 1.
• If an empty point in R is bordered only by (WLOG) white stones, then the legality of a black play there may depend on whether it captures a white chain that is itself at distance 1 from R. To answer this question involves looking at points at distance 2 from R.

This is the extent of the information we may need about the rest of the board. So the set of reachable states in R depends only on a radius-2 neighborhood of R.

Now we can proceed by induction. Suppose that the depth-(n-1) value of R depends only on a neighborhood of radius 2(n-1). Then to get the depth-n value, we would simply look at all the possible states that the radius-2(n-1) neighborhood can be placed into by one move, and take the max (or min) of their depth-(n-1) values. The set of possible states depends only on a radius-2 neighborhood of the radius-2(n-1) neighborhood – that is, it depends only on a radius-2n neighborhood of R. Thus the depth-n value depends only on the radius-2n neighborhood.

Of course, this bound is not very tight; it would be possible to achieve much tighter bounds.

## November 10, 2011

### Papers on Go

Filed under: Go — Peter de Blanc @ 12:59 am

## October 10, 2011

### Directions for this Blog

Filed under: General — Peter de Blanc @ 8:36 am

It’s been nearly a year since my last post here. I’d like to get back into blogging, so I’ll start by listing various things I might blog. Here’s your chance to affect what I write about.

The list:

1. My new organization, It Builds Itself.
2. Books, articles, and papers I’m reading.
3. Programs, programming, and programming languages.
4. Go, programs that play Go, math about Go, and statistics about Go.
5. Details, possibly excessive, about my personal life.
6. The singularity and the future of humanity and of the universe (topics about which I know very little).
7. Mathematics that is already widely understood, but which may be new to some folks.
8. Imaginary places, people, and things.

Since this ended up being a pretty short list, I’ll be accepting further suggestions as well.

## October 19, 2010

### Space and Games is on Twitter

Filed under: General — Peter de Blanc @ 4:20 pm

I haven’t been posting on this blog much lately, but I have been tweeting. If you want to read more of my stuff, follow @spaceandgames.

## June 14, 2010

### Some material from my upcoming book

Filed under: Go — Peter de Blanc @ 9:15 pm

I’m writing a book about Go strategy. I’ll be comparing moves by professionals to moves by amateurs in the 1-3k range. My hope is that amateur players may improve by trying to imitate the professionals.

## April 6, 2010

### Breaking the Butterfly Effect: The “Dobbins Selector” Algorithm

Filed under: Useful Statistics — Peter de Blanc @ 3:13 pm

Challenge: Design an algorithm that:

• takes a discrete probability distribution and a source of random bits, and returns a sample from the probability distribution.
• if running the algorithm on distribution P1 with random bits S yields X1, and running it on (P2, S) yields X2, and X2 ≠ X1, then P2(X2) / P2(X1) > P1(X2) / P1(X1)

I ran into this problem in the context of a computer game I’m developing. Rather than storing every detail of the game world in the map file, I store a random seed that is used to generate most map features. I found that every time I tweaked certain parameters in my code, old map files became unrecognizable.

Thus, we would like a random sampling algorithm that is unlikely to change its output when we make a small change to the underlying probability distribution (assuming we keep our original random seed). In particular, if we increase the probability of one outcome and renormalize the others, then the only way the outcome should change is if it changes to the outcome whose probability we increased.

If you want to solve the problem yourself, this would be a good time to stop reading.

I shared this problem with Michael Dobbins and he solved it for me.

Mike’s method is to associate each face of a simplex with one of the outcomes. We pick a random point inside the simplex, and measure the distance from that point to each of the faces, then divide that distance by the probability of the associated outcome. The lowest number wins.
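Here is a sketch of one way to implement the construction (my reading of the method, not Dobbins’s own code). For the standard simplex, the distance from an interior point to the face opposite vertex i is proportional to the i-th barycentric coordinate, so we can sample those coordinates directly:

```python
import random

def dobbins_select(probs, seed):
    """Sample an outcome index from `probs` with a fixed seed.

    Pick a uniform point in the simplex (normalized i.i.d.
    exponentials), divide each barycentric coordinate -- the scaled
    distance to the opposite face -- by that outcome's probability,
    and return the index with the lowest ratio.
    """
    rng = random.Random(seed)
    e = [rng.expovariate(1.0) for _ in probs]
    total = sum(e)
    x = [v / total for v in e]
    return min(range(len(probs)), key=lambda i: x[i] / probs[i])
```

Two things make this work. First, e[i] / probs[i] is exponential with rate probs[i], and the minimum of independent exponentials lands on index i with probability proportional to its rate, so the sampler is unbiased. Second, if we increase one probability and renormalize the rest, every other ratio is scaled up by the same factor while the boosted outcome’s ratio falls, so with the same seed the output either stays put or switches to the boosted outcome – exactly the property in the challenge.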
