space and games

December 20, 2011

Google Books 1-grams

Filed under: General — Peter de Blanc @ 11:53 pm

Google has some very cool n-gram data sets available for download. Alas, the files are quite large. For example, the 3-grams are split into 200 zip files, each weighing in at 440 MB. That’s about 88 GB total.

The 1-grams are much lighter, totaling 2 GB. I was able to reduce this to about 35 MB by throwing away the time information (the original files indicate the year each data point came from). That’s smaller than the original files by a factor of about 57.
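For anyone curious what "throwing away the time information" might look like in practice, here is a minimal sketch. It assumes each line of a 1-gram file is tab-separated, with the 1-gram first, the year second, and the match count third; check the layout of the files you actually download. The function name `collapse_years` and the sample rows are made up for illustration, and this is not necessarily how the reduced files were produced.

```python
from collections import Counter
from typing import Iterable


def collapse_years(lines: Iterable[str]) -> Counter:
    """Sum the match counts for each 1-gram, ignoring the year column.

    Assumed line layout (an assumption, not confirmed by the post):
    ngram <TAB> year <TAB> match_count [<TAB> further columns...]
    """
    totals = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ngram, match_count = fields[0], fields[2]
        totals[ngram] += int(match_count)
    return totals


# Hypothetical sample rows, not real data from the corpus.
sample = [
    "apple\t1990\t3\t2\t1\n",
    "apple\t1991\t5\t4\t2\n",
    "zebra\t1990\t1\t1\t1\n",
]

totals = collapse_years(sample)

# Write one "ngram<TAB>total" line per 1-gram.
for ngram, count in sorted(totals.items()):
    print(f"{ngram}\t{count}")
```

With one output line per 1-gram instead of one per 1-gram per year, most of the original rows disappear, which is where the size reduction would come from.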

1-grams in lexicographical order | 1-grams by count | 1-grams by length
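The three orderings can be sketched from a single table of totals. This is only an illustration: the `counts` dictionary is hypothetical, and the sort directions (most frequent first, shortest first) and the alphabetical tie-breaking are assumptions rather than a description of the linked files. Python compares strings by code point, which may differ from other notions of lexicographical order.

```python
# Hypothetical totals: 1-gram -> count summed over all years.
counts = {"the": 100, "a": 90, "zebra": 2, "aardvark": 2}

# Lexicographical order (by code point).
lexicographic = sorted(counts)

# By count, most frequent first; ties broken alphabetically.
by_count = sorted(counts, key=lambda w: (-counts[w], w))

# By length, shortest first; ties broken alphabetically.
by_length = sorted(counts, key=lambda w: (len(w), w))

for name, order in [
    ("lexicographic", lexicographic),
    ("by count", by_count),
    ("by length", by_length),
]:
    print(name, order)
```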

The 2-grams, 3-grams, and so on could be compressed the same way by anyone who can get the files. I’d be happy to help if anyone wants to do this.


