Ever wondered how news breaks on Twitter?
When an event happens and people rush to Twitter to search for it, how do Twitter’s systems work out what those queries mean without any context, and recognize them as a trend before the search spike is gone?
Twitter’s official Engineering blog just released an in-depth look at how Twitter search works.
The results are pretty impressive – and enlightening.
Twitter has built a real-time human computation engine to help identify search queries as soon as they’re trending, send the queries to real humans to be judged, and then incorporate the human annotations into Twitter’s back-end models.
Here’s a step-by-step summary of how Twitter search really works:
1. First, Twitter constantly monitors which search queries are currently popular. It runs a Storm topology (Storm is a distributed system for processing streams of data in real time) that tracks statistics on search queries. This is the stage at which, for example, “Big Bird” or “#bindersfullofwomen” is recognized as trending.
2. As soon as a new popular search query is discovered, it is sent to human evaluators (via Amazon’s Mechanical Turk crowdsourcing marketplace), who are asked a variety of questions about the query. For example, as soon as a spike in “Big Bird” searches arises, Twitter may ask judges whether there are likely to be interesting pictures related to the query, or whether the query is about a person or an event.
3. After a response from an evaluator is received, Twitter pushes the information to its back-end systems, so that the next time a user searches for that query, machine learning models can make use of the additional information.
For example, suppose the evaluators tell Twitter that “Big Bird” is related to politics. That means that the next time someone performs this search, Twitter knows to surface ads by @barackobama or @mittromney, not ads about, say, Cookie Monster. Alternatively, “Stanford” may typically be an education-related query, but perhaps there’s a football game between Stanford and Berkeley at the moment. That would mean that, during that time, relevant content would need to be sports-related.
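To make the three steps concrete, here is a minimal Python sketch of the same flow: count queries per time window, flag the ones that spike, ask a judge about them, and keep the answers only for a limited time (so that a query like “Stanford” can drift back to its usual meaning). This is an illustration, not Twitter’s implementation. The class and function names, the spike rule, the thresholds, and the one-hour expiry are all assumptions, and the Storm topology and Mechanical Turk are replaced by simple in-process stand-ins.

```python
import time
from collections import Counter, deque


class SpikeDetector:
    """Flags a query as trending when its count in the current window
    far exceeds its average over recent windows (hypothetical rule)."""

    def __init__(self, history=5, ratio=3.0, min_count=10):
        self.history = deque(maxlen=history)  # counts from past windows
        self.current = Counter()              # counts for the open window
        self.ratio = ratio
        self.min_count = min_count

    def record(self, query):
        self.current[query.lower()] += 1

    def close_window(self):
        """End the current window and return the queries that spiked."""
        trending = []
        for query, count in self.current.items():
            past = sum(window[query] for window in self.history)
            baseline = past / max(len(self.history), 1)
            if count >= self.min_count and count > self.ratio * max(baseline, 1.0):
                trending.append(query)
        self.history.append(self.current)
        self.current = Counter()
        return sorted(trending)


def ask_judges(query):
    """Stand-in for the crowdsourcing step; returns canned answers."""
    canned = {"big bird": {"category": "politics", "has_pictures": True}}
    return canned.get(query, {"category": "unknown", "has_pictures": False})


class AnnotationStore:
    """Keeps human annotations per query and forgets them after a TTL."""

    def __init__(self, ttl_seconds=3600.0):
        self.ttl = ttl_seconds
        self._data = {}  # query -> (labels, expiry timestamp)

    def put(self, query, labels, now=None):
        now = time.time() if now is None else now
        self._data[query.lower()] = (dict(labels), now + self.ttl)

    def get(self, query, now=None):
        now = time.time() if now is None else now
        entry = self._data.get(query.lower())
        if entry is None or now >= entry[1]:
            return None  # never annotated, or annotation has expired
        return entry[0]


def run_window(detector, store, queries, now):
    """Process one window of search traffic end to end."""
    for query in queries:
        detector.record(query)
    for query in detector.close_window():
        store.put(query, ask_judges(query), now=now)


detector = SpikeDetector()
store = AnnotationStore(ttl_seconds=3600.0)
run_window(detector, store, ["stanford"] * 2, now=0.0)
run_window(detector, store, ["Big Bird"] * 20 + ["stanford"] * 2, now=100.0)
print(store.get("Big Bird", now=200.0))
# prints {'category': 'politics', 'has_pictures': True}
```

In this sketch, a search-time ranking or ad-selection model would call `store.get(query)` and treat a `None` result as “no human judgment available”, falling back to its usual behavior.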
It’s interesting to learn how crucial the human element is to Twitter’s functionality. It really does take a human brain to tell Harry of Harry Potter from Harry the royal prince.
Read the full article here. (Warning: it’s not exactly light reading.)