SongPop Scales to 1 Million Active Users on GAE, Showing PaaS is not Passé

Should you use PaaS for your next project? Often the answer is no because you want control, but here's an example from SongPop showing why the promise of PaaS is not passé. SongPop was able to autoscale to 60 million users and 1 million daily active users, deliver 17 terabytes/day of songs and images worldwide, and handle 10k+ queries/second, all with a 6-person engineering team and only one engineer working full-time on the backend.

Unfortunately there aren't a lot of details, but what there is can be found in "Scaling SongPop to 60 million users with App Engine and Google Cloud Storage". The outline follows the familiar script. You start small. You let PaaS do the heavy lifting. And when you need to scale, you just buy more resources and tune a little (maybe a lot). The payoff is that you get to focus on feature development and can get by with a small team.

Here's a diagram of their architecture:

Some lessons learned:

  • Premier Support. This one sounds a bit like a sales pitch, but once they reached 100K daily active users they opened up a Premier Support account, which allowed them to talk to a real life person and solve downtime problems quickly.
  • Denormalize. To reduce read latency they collected data spread across several models into one entity. An oldie, but still a big win.
  • Cache. To reduce queries, a user's opponent list was cached in memcache, which is a feature of GAE. This and the denormalization change took one engineer 4 days.
  • Deadlines. Once an operation takes longer than a set threshold, it's time to fall back to a different, more predictable strategy.
  • Composite indexes. Queries were slow and the cause was traced to queries that used many indexed properties. The solution was to use a composite index or combine the data into a single entity. This problem was traced with the help of Premier Support, which also shows a weakness of PaaS: shouldn't programmers be able to find these kinds of issues themselves? Maybe with a slow query log?
  • Easy integration with other services. One advantage companies like Amazon and Google have is that they can create a powerful suite of cooperating services. Since SongPop needs to consume and distribute 17 terabytes/day of songs and images worldwide, they've found Google Cloud Storage affordable and very easy to use from GAE. And when you want to do some big data analysis, Google BigQuery is already built in. This is a key design point.
  • Location headers. GAE requests automatically include headers which contain location information based on the IP address of the client request. SongPop uses this information to select opponents and build profiles.
  • Synchronous Simulated Gameplay. An innovative strategy SongPop uses is Synchronous Simulated Gameplay. Scalable, consistent, low latency game play is hard, so why do it at all? What SongPop does is record games and then play them back against you as you play. You appear to be playing against a real person, but there are none of the pesky engineering challenges. You only have to store sound snippets and game results, match players to games, and then replay the games. Quite clever.
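
The Cache and Deadlines lessons fit together naturally, so here's a rough sketch of both in Python. This is not SongPop's code and it doesn't use the GAE memcache API: `TTLCache` is an in-memory stand-in, and `get_opponents`, `with_deadline`, the key format, and the 300 second TTL are all made-up names and values for illustration. The idea is simply to read the opponent list from the cache first, and to give up on a slow operation after a deadline in favor of something more predictable.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


class TTLCache:
    """In-memory stand-in for a memcache-style cache (hypothetical, not the GAE API)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries count as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


def get_opponents(user_id, cache, query_opponents, ttl_seconds=300):
    """Cache-aside read: only run the (expensive) query on a cache miss."""
    key = "opponents:%s" % user_id  # assumed key format
    cached = cache.get(key)
    if cached is not None:
        return cached
    opponents = query_opponents(user_id)
    cache.set(key, opponents, ttl_seconds)
    return opponents


_pool = ThreadPoolExecutor(max_workers=4)


def with_deadline(primary, fallback, deadline_seconds):
    """Run primary(); if it misses the deadline, return fallback() instead."""
    future = _pool.submit(primary)
    try:
        return future.result(timeout=deadline_seconds)
    except FutureTimeout:
        future.cancel()  # best effort; an already running call is not interrupted
        return fallback()
```

A caller might wrap the datastore query as `with_deadline(lambda: get_opponents(uid, cache, run_query), lambda: [], 0.2)`, where `run_query` and the 0.2 second deadline are again placeholders. Note that the slow call keeps running in the background after the deadline passes; the caller just stops waiting for it.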

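To make the simulated gameplay idea a bit more concrete, here's a minimal sketch in Python of matching a live player to a stored game and replaying it. The source doesn't describe SongPop's data model or scoring, so everything here is an assumption: the `Answer` and `RecordedGame` shapes, the "fastest correct answer wins the question" rule, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Answer:
    question: int    # index of the song clip within the round
    seconds: float   # how long the player took to answer
    correct: bool


@dataclass(frozen=True)
class RecordedGame:
    player: str
    playlist: str
    answers: List[Answer]


def pick_recording(recordings, playlist, live_player) -> Optional[RecordedGame]:
    """Return the first stored game on this playlist played by someone else."""
    for game in recordings:
        if game.playlist == playlist and game.player != live_player:
            return game
    return None


def round_winner(live, recorded):
    """Assumed rule: the fastest correct answer wins; otherwise it's a tie."""
    live_time = live.seconds if live.correct else float("inf")
    rec_time = recorded.seconds if (recorded and recorded.correct) else float("inf")
    if live_time < rec_time:
        return "live"
    if rec_time < live_time:
        return "recorded"
    return "tie"


def replay(live_answers, recording):
    """Compare live answers with the recorded opponent, question by question."""
    recorded_by_question = {a.question: a for a in recording.answers}
    return [
        (live.question, round_winner(live, recorded_by_question.get(live.question)))
        for live in live_answers
    ]
```

Nothing in this flow needs two players online at once: the recorded opponent is just data read from storage, which is what makes the approach so much easier to scale than real-time play.
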
Clearly this is your canonical Facebook-style game, so it's not a complicated application, but it is a good existence proof to include in your architecture decision matrix.