Some interesting updates from Twitter yesterday in direct response to some downtime we saw on Monday and, let’s face it, a good couple of months of very poor network performance on Twitter.com and their API.
On the main blog, Twitter “PR guy” Matt Graves (@mgrooves) writes about reliability: specifically, how Twitter has been seriously lacking it of late, and how hard they’re trying to improve.
When you can’t update your profile photo, send a Tweet, or even sign on to Twitter, it’s frustrating. We know that, and we’ve had too many of these issues recently.
As we said last month, we are working on long-term solutions to make Twitter a more reliable and stable platform. It’s our number one priority. The bulk of our engineering efforts are currently focused on this issue, and we have moved resources from other projects to focus on it.
In two posts over on the Twitter engineering blog, Twitter engineer Jean-Paul Cozzatti (@jeanpaul) writes about Twitter’s plans to move their technical infrastructure to a new, custom-built data center in the Salt Lake City area.
Twitter’s user base has continued to grow steadily in 2010, with over 300,000 people a day signing up for new accounts on an average day. Keeping pace with these users and their Twitter activity presents some unique and complex engineering challenges (as John Adams, our lead engineer for application services, noted in a speech last month at the O’Reilly Velocity conference). Having dedicated data centers will give us more capacity to accommodate this growth in users and activity on Twitter.
Cozzatti also compares Twitter’s growth to ‘riding a rocket’, adding:
As we said last month, keeping pace with record growth in Twitter’s user base and activity presents some unique and complex engineering challenges. We frequently compare the tasks of scaling, maintaining, and tweaking Twitter to building a rocket in mid-flight.
During the World Cup, Twitter set records for usage. While the event was happening, our operations and infrastructure engineers worked to improve the performance and stability of the service. We have made more than 50 optimizations and improvements to the platform, including:
- Doubling the capacity of our internal network;
- Improving the monitoring of our internal network;
- Rebalancing the traffic on our internal network to redistribute the load;
- Doubling the throughput to the database that stores tweets;
- Making a number of improvements to the way we use memcache, improving the speed of Twitter while reducing internal network traffic; and,
- Improving page caching of the front and profile pages, reducing page load time by 80 percent for some of our most popular pages.
Cozzatti also updates us on Twitter’s current user count – 125 million. That’s up over 20 million since April of this year, which is a pretty amazing jump.
And it’s one that is clearly bringing a ton of issues. I’m hopeful that this move to a richer infrastructure this Autumn will improve performance – you know, once we’re past the 1-2 months of new problems that this transition will inevitably bring – but as Cozzatti himself notes, Twitter is a “relatively small crew maintaining a comparatively large (rocket) ship.”
Making these improvements to Twitter’s technology is an essential step, but to scale properly, the organisation clearly needs more of everything – money, of course, but especially people. And they need them now. And it’s not just engineering – it’s everywhere across the company.
All the equipment in the world won’t make a lick of difference if there aren’t enough people around to fix it all the next time something goes wrong. In fact, it’ll just compound the problem. And if you think performance is mediocre now that we’ve moved above 100 million users, then just imagine what it will be like when we hit a billion.