“…[T]he Library of Congress is now stockpiling the entire Twitterverse, or Tweetosphere, or whatever we’ll end up calling it—anyway, the corpus of all public tweets. There are a lot. The library embarked on this project in April 2010, when Jack Dorsey’s microblogging service was four years old, and four years of tweeting had produced 21 billion messages. Since then Twitter has grown, as these things do, and 21 billion tweets represents not much more than a month’s worth. As of December, the library had received 170 billion—each one a 140-character capsule garbed in metadata with the who-when-where.
The library has attached itself to the firehose. A stream of information flows from 500 million registered twitterers (counting duplicates, dead people, parodies, imaginary friends, and bots) who thumb their hurried epistles into phones and tablets and PCs, and the tweets pour into Twitter’s servers at a rate of thousands per second—tens of thousands at peak times: World Cup matches, presidential elections, Beyonce’s pregnancy—and make their way in ‘real time’ to a company called Gnip, a social-media data provider in Boulder, Colorado. Gnip organizes them into one-hour batches on a secure server for download, where they are counted and checked and finally copied to reels of magnetic tape, to be stored in a couple of filing cabinets. In different locations, for safety. If you have ever tweeted, rest assured that each of your little gems is there for posterity.”
-James Gleick considers “Librarians of the Twitterverse” on NYRBlog