Stuff The Internet Says On Scalability For October 28, 2011

You deserve a HighScalability today:

  • S3: 566 Billion Objects, 370K requests/sec; Titan: a 38,400-processor, 20-petaflop supercomputer
  • 1,000,000 daily users and no cache. Wooga's Flash game handles 50K DB updates/second with a Ruby backend. They hit an IO wall with MySQL at 1,000 DB updates/sec, needed more, and went with Redis. It's not quite honest to say no cache was used, since everything is in RAM, but maybe that's the point. They use a lot of automation, archive inactive users, and moved away from EBS.
  • Making dynamic sites scale like static sites by Wim Godden. Use Varnish, Nginx, and memcached.
  • The Lifecycle of a Web Page on StumbleUpon infographic. 2.2 million web pages are added to StumbleUpon each month. A nice discussion of bounce rate.
  • James Hamilton with an excellent overview of the Storage Infrastructure Behind Facebook Messages. That's 6B+ messages a day.
  • Scaling Twilio. Twilio has scaled traffic by more than 100x over the past year and expanded its server infrastructure from a few servers to hundreds running in the cloud. Core technologies: PHP, Python, Twisted/gevent, Java, Asterisk/FreeSwitch/JSR289, MySQL, and Redis. Core principles: Simplicity, Automation, Shipping, Empiricism, and Humbleness.
  • Daniel Abadi with an equally excellent Overview of the Oracle NoSQL Database. The sweet spot for the Oracle NoSQL database seems to be in single-rack deployments (e.g. the Oracle Big Data appliance) with a low-latency network, so that the system can be set up to use synchronous replication while keeping the latency cost of this type of replication small (and the probability of network partitions is small).
  • Instagram handles 25 photos and 90 likes every second. To make sure their data fits in RAM, they shard it. They run on Django with PostgreSQL. Identifiers are created using PL/PGSQL, Postgres' internal programming language, and Postgres' existing auto-increment functionality.
  • Introduction to Parallel Programming and MapReduce on Google Code University.
  • Storage Mojo on RAMCloud is the new flash. The goal is enterprise-class availability with every bit of active data stored in DRAM, not disk or flash, for maximum performance. It is a key-value object store today, though as pure software that could change.
  • Sean Hull explains Why generalists are better at scaling the web. This ability to see the big picture cannot be overstated, especially during times of crisis or under pressure to meet targets. For a team to scale the web effectively, you're going to need a good mix of both generalists and specialists.
  • SSDAlloc is a hybrid SSD/RAM allocation tool that provides the best of both worlds. It helps programmers build high-performance SSD applications (with performance comparable to that of an application rewrite) while requiring very few modifications to application code. It exposes a slab-based memory allocation interface to the programmer and transparently manages the flow of data in and out of the SSD using a backend runtime system.
  • Ben Stopford with a nice mix of links for October.
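The Instagram ID scheme packs several fields into one 64-bit integer. Below is a minimal Python sketch of that packing, not Instagram's code (they did it in PL/PGSQL). It assumes the layout Instagram described in its sharding write-up: 41 bits of milliseconds since a custom epoch, 13 bits of logical shard ID, and 10 bits taken from the per-shard auto-increment sequence. The epoch constant and the function names are hypothetical.

```python
import time
from typing import Tuple

# Hypothetical custom epoch in ms since the Unix epoch; any fixed past instant works.
CUSTOM_EPOCH_MS = 1314220021721
TIME_BITS = 41   # about 69 years of milliseconds
SHARD_BITS = 13  # up to 8192 logical shards
SEQ_BITS = 10    # up to 1024 IDs per shard per millisecond


def current_ms() -> int:
    """Current wall-clock time in milliseconds."""
    return time.time_ns() // 1_000_000


def make_id(now_ms: int, shard_id: int, sequence: int) -> int:
    """Pack a timestamp, a logical shard ID and a sequence value into one ID."""
    elapsed = now_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < (1 << TIME_BITS):
        raise ValueError("timestamp outside the representable range")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id does not fit in 13 bits")
    seq = sequence % (1 << SEQ_BITS)  # fold the auto-increment value into 10 bits
    return (elapsed << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq


def split_id(packed: int) -> Tuple[int, int, int]:
    """Recover (timestamp_ms, shard_id, sequence) from a packed ID."""
    seq = packed & ((1 << SEQ_BITS) - 1)
    shard_id = (packed >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    now_ms = (packed >> (SHARD_BITS + SEQ_BITS)) + CUSTOM_EPOCH_MS
    return now_ms, shard_id, seq


new_id = make_id(current_ms(), shard_id=5, sequence=5001)
```

Because the timestamp occupies the high bits, IDs sort roughly by creation time, and any ID can be routed back to its shard without a lookup table.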
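To make the SSDAlloc idea more concrete, here is a toy two-tier slab store in Python. It is a hypothetical sketch, not SSDAlloc's interface: the class and method names are invented, the SSD is simulated with a dict, and eviction is plain LRU. Every object occupies one fixed-size slot; the most recently used slots stay in RAM, and the rest are paged out and faulted back in on access without the caller doing anything.

```python
from collections import OrderedDict


class TieredSlab:
    """Toy fixed-size-slot allocator with a bounded RAM tier and a simulated SSD tier.

    Hypothetical illustration only; names are invented, not taken from SSDAlloc.
    """

    def __init__(self, slot_size: int, ram_slots: int) -> None:
        self.slot_size = slot_size  # maximum bytes per object (one slot each)
        self.ram_slots = ram_slots  # how many slots may live in RAM at once
        self.ram = OrderedDict()    # slot -> bytes, least recently used first
        self.ssd = {}               # slot -> bytes, stand-in for flash storage
        self.free_slots = []        # slot numbers available for reuse
        self.next_slot = 0
        self.ssd_reads = 0          # number of faults served from the SSD tier

    def alloc(self, data: bytes) -> int:
        """Store data in a free slot and return the slot number as a handle."""
        self._check_size(data)
        if self.free_slots:
            slot = self.free_slots.pop()
        else:
            slot = self.next_slot
            self.next_slot += 1
        self._put_in_ram(slot, data)
        return slot

    def read(self, slot: int) -> bytes:
        """Return the slot's data, faulting it in from the SSD tier if needed."""
        self._check_live(slot)
        if slot in self.ram:
            self.ram.move_to_end(slot)  # mark as most recently used
            return self.ram[slot]
        data = self.ssd.pop(slot)       # transparent fault-in
        self.ssd_reads += 1
        self._put_in_ram(slot, data)
        return data

    def write(self, slot: int, data: bytes) -> None:
        """Overwrite an allocated slot; the new value lands in RAM."""
        self._check_live(slot)
        self._check_size(data)
        self.ssd.pop(slot, None)        # drop any stale SSD copy
        self._put_in_ram(slot, data)

    def free(self, slot: int) -> None:
        """Release a slot from whichever tier holds it."""
        self._check_live(slot)
        self.ram.pop(slot, None)
        self.ssd.pop(slot, None)
        self.free_slots.append(slot)

    def _put_in_ram(self, slot: int, data: bytes) -> None:
        self.ram[slot] = data
        self.ram.move_to_end(slot)
        while len(self.ram) > self.ram_slots:
            old_slot, old_data = self.ram.popitem(last=False)  # evict LRU slot
            self.ssd[old_slot] = old_data

    def _check_size(self, data: bytes) -> None:
        if len(data) > self.slot_size:
            raise ValueError("object larger than slot size")

    def _check_live(self, slot: int) -> None:
        if slot not in self.ram and slot not in self.ssd:
            raise KeyError(slot)
```

The caller only ever holds slot handles, which is what lets the runtime move data between tiers behind the application's back.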