Scaling Django Web Apps by Mike Malone

Todd Hoff

17 May 2009

Film buffs will recognize Django as a classic 1966 spaghetti western that spawned hundreds of imitators. Web heads will certainly first think of Django as the classic Python-based Web framework that has also spawned hundreds of imitators and has become the gold standard framework for the web.

Mike Malone, who worked on Pownce, a blogging tool now owned by Six Apart, tells in this very informative EuroDjangoCon presentation how Pownce scaled using Django in the real world.

I was surprised to learn how large Pownce was: hundreds of requests/sec, thousands of DB operations/sec, millions of user relationships, millions of notes, and terabytes of static data. Django has a lot of functionality in the box to help you scale, but if you want to scale large it turns out Django has some limitations. Mike tells you what these are and also provides some code to get around them.

Mike's talk, although Django specific, will really help anyone creating applications on the web. There's a lot of useful Django-specific advice and a lot of generally good design ideas as well.

The topics covered in the talk are:

Django uses a shared nothing architecture.
* The database is responsible for scaling state.
* Application servers are horizontally scalable because they are stateless.

Scalability vs Performance. Performance is not the same as scalability. A scalable system doesn’t need to change when the size of the problem changes.

Types of scalability:
* Vertical - buy bigger hardware
* Horizontal - the ability to increase a system’s capacity by adding more processing units (servers)

Cache to remove load from the database server.

Built-in Django Caching: Per-site caching, per-view cache, template fragment cache - not so effective on heavily personalized pages

Low-level Cache API is used to cache at any level of granularity.

Pownce cached individual objects and lists of object IDs.
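
As a rough illustration of caching individual objects plus lists of object IDs, the sketch below uses a tiny in-memory stand-in for a memcached client. The `DictCache` class, the `note:<id>` key format, and the `fetch_ids` / `fetch_notes` callbacks are all hypothetical names chosen for this example, not code from the talk.

```python
class DictCache:
    """In-memory stand-in for a memcached client (hypothetical)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def get_many(self, keys):
        return {k: self._data[k] for k in keys if k in self._data}


def load_notes(cache, list_key, fetch_ids, fetch_notes):
    """Return notes for a cached ID list, hitting the database only on misses."""
    ids = cache.get(list_key)
    if ids is None:
        ids = fetch_ids()              # one cheap query for IDs only
        cache.set(list_key, ids)

    keys = {f"note:{i}": i for i in ids}
    found = cache.get_many(list(keys))  # one round trip for all objects

    missing = [i for k, i in keys.items() if k not in found]
    if missing:
        # fetch_notes is assumed to return {id: note} for the given IDs
        for note_id, note in fetch_notes(missing).items():
            cache.set(f"note:{note_id}", note)
            found[f"note:{note_id}"] = note

    # Preserve the list order; skip IDs the database no longer has
    return [found[k] for k in keys if k in found]
```

One appeal of this layout is that a single note appearing in many lists is stored once, so invalidating it means touching one key rather than every list that contains it.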

The hard part of caching is invalidation. How do you know when a value has changed, so that the cache can be updated and readers see valid values?
* Invalidate when a model is saved or deleted.
* Invalidate post_save, not pre_save.
* This leaves a small race condition so:
** Instead of deleting, set the cache key to None for a short period of time
** Instead of using set to cache objects, use add, which fails if there’s already something stored for the key
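
The two race-condition tricks above can be sketched as follows. This is a minimal sketch, not Pownce's code: `TTLCache` is a hypothetical in-memory cache that mimics the `get` / `set` / `add` semantics described, and `TOMBSTONE_SECONDS` is an assumed value.

```python
import time


class TTLCache:
    """Hypothetical in-memory cache with memcached-like get/set/add."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def _live(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        if entry[1] is not None and self._clock() >= entry[1]:
            del self._data[key]  # expired
            return None
        return entry

    def get(self, key):
        entry = self._live(key)
        return entry[0] if entry else None

    def set(self, key, value, ttl=None):
        expires_at = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def add(self, key, value, ttl=None):
        """Store only if the key is absent; report whether it was stored."""
        if self._live(key) is not None:
            return False
        self.set(key, value, ttl)
        return True


TOMBSTONE_SECONDS = 5  # assumed; tune to cover the race window


def invalidate(cache, key):
    # Called after a save or delete: park a short-lived None on the key
    # instead of deleting it, so racing readers cannot re-add stale data.
    cache.set(key, None, ttl=TOMBSTONE_SECONDS)


def read_through(cache, key, load):
    value = cache.get(key)
    if value is None:
        value = load()
        cache.add(key, value)  # silently fails while the tombstone is live
    return value
```

While the tombstone is live, readers fall through to the database on every request, which trades a brief burst of extra load for never caching a value read before the write landed.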

Pownce ran memcached on their web servers
* Their servers were not CPU bound; they were IO and memory bound, so they compressed objects before caching.
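
Compressing before caching spends spare CPU to save memory and network IO. A minimal sketch using the standard library follows; the `pack` / `unpack` names are made up here, and the talk summary does not say which serializer or compressor Pownce used.

```python
import pickle
import zlib


def pack(obj, level=6):
    """Serialize and compress an object before storing it in the cache."""
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return zlib.compress(raw, level)


def unpack(blob):
    """Reverse of pack(). Only unpickle data your own application wrote."""
    return pickle.loads(zlib.decompress(blob))


note = {"id": 1, "body": "hello " * 500}
blob = pack(note)
restored = unpack(blob)
```

Text-heavy objects like notes tend to compress well, but very small values can come out larger after compression, so it may be worth skipping compression below some size threshold.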

Work is spread between multiple application servers using a load balancer.

Best way to reduce load on your app servers: don’t use them to do hard stuff.

Pownce used software load balancing
* Hardware load balancers are expensive ($35K) and you need two for redundancy.
* Software load balancers are cheap and easy.
* Some options: Perlbal, Pound, HAProxy, Varnish, Nginx
* Chose a single Perlbal server. This was a Single Point of Failure but they didn't have the money for hardware. Liked Perlbal's reproxying feature.

Used a ghetto queuing solution (MySQL + cron) to process work asynchronously in the background.
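
A database-backed queue of this kind can be very small. The sketch below uses `sqlite3` as a stand-in for MySQL so that it runs anywhere; the `jobs` table, its columns, and the function names are hypothetical. A cron entry would run `process_pending` every minute or so.

```python
import sqlite3


def make_queue(conn):
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " payload TEXT NOT NULL,"
            " done INTEGER NOT NULL DEFAULT 0)"
        )


def enqueue(conn, payload):
    """Called from the web request: record the work and return immediately."""
    with conn:
        conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))


def process_pending(conn, handler, batch_size=100):
    """Called from cron: handle unfinished jobs in insertion order."""
    rows = conn.execute(
        "SELECT id, payload FROM jobs WHERE done = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for job_id, payload in rows:
        handler(payload)
        # Mark done only after the handler succeeds, so failures are retried
        with conn:
            conn.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
    return len(rows)
```

This version assumes a single worker. Running several workers at once would need some form of row claiming so that two workers do not pick up the same job.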

At scale their system needed to have high availability and be partitionable.
* The RDBMS’s consistency requirements get in our way
* Most sharding / federation schemes are kludges that trade consistency
* There are many non relational databases (CouchDB, Cassandra, Tokyo Cabinet) but they aren't easy to use with Django.

Rules for denormalization:
* Start with a normalized database
* Selectively denormalize things as they become bottlenecks
* Denormalized counts, copied fields, etc. can be updated in signal handlers
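
To show the shape of a denormalized count kept up to date by signal handlers without requiring a Django project, the sketch below uses a tiny stand-in `Signal` class. It only imitates the `connect` / `send` pattern and the `instance` and `created` arguments of Django's `post_save`; the `User` and `Note` classes and the `note_count` field are hypothetical.

```python
class Signal:
    """Minimal stand-in for a framework signal (not Django's implementation)."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send(self, sender, **kwargs):
        for receiver in self._receivers:
            receiver(sender=sender, **kwargs)


post_save = Signal()
post_delete = Signal()


class User:
    def __init__(self, name):
        self.name = name
        self.note_count = 0  # denormalized: avoids COUNT(*) on every page view


class Note:
    def __init__(self, author, body):
        self.author = author
        self.body = body
        self._saved = False

    def save(self):
        created = not self._saved
        self._saved = True
        post_save.send(sender=Note, instance=self, created=created)

    def delete(self):
        post_delete.send(sender=Note, instance=self)


def bump_note_count(sender, instance, created, **kwargs):
    if created:  # edits to an existing note must not change the count
        instance.author.note_count += 1


def drop_note_count(sender, instance, **kwargs):
    instance.author.note_count -= 1


post_save.connect(bump_note_count)
post_delete.connect(drop_note_count)
```

In a real application the increment would typically be done as a single atomic database update rather than on an in-memory attribute, so that concurrent saves do not lose counts.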

Joins are evil and Django makes it really easy to do joins.

Database Read Performance
* Since your typical web app is about 80% reads, adding MySQL master-slave replication can solve a lot of problems.
* Django doesn't support multiple database connections, but there's a library, linked to at the end of this document, to help.
* A big problem is slave lag. When you write to the primary it takes time for the state to be transferred to the read slaves so readers may see an old value on the read.
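
One common way to soften slave lag, offered here as an assumption rather than as what Pownce did, is to send a user's reads to the primary for a short window after that user writes. The `ReplicaRouter` class and the `pin_seconds` value below are hypothetical.

```python
import time


class ReplicaRouter:
    """Pick a database for reads, pinning recent writers to the primary."""

    def __init__(self, pin_seconds=2.0, clock=time.monotonic):
        self._pin_seconds = pin_seconds  # assumed to exceed typical lag
        self._clock = clock
        self._last_write = {}            # user_id -> time of last write

    def record_write(self, user_id):
        self._last_write[user_id] = self._clock()

    def db_for_read(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and self._clock() - last < self._pin_seconds:
            return "primary"  # the replica may not have this user's write yet
        return "replica"
```

With several application servers, the last-write timestamp would need to live somewhere shared, such as a cookie or the cache, since a per-process dictionary only sees writes handled by that process.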

Database Write Performance
* Federate. Split tables across different servers. Not well supported by Django.
* Vertical Partitioning: split tables that aren’t joined across database servers.
* Horizontal Partitioning: split a single table across databases (e.g., user table). The problem is that autoincrement no longer works, and Django uses autoincrement for primary keys.
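
One way around the autoincrement problem, shown here as an illustrative sketch rather than the approach from the talk, is to give each shard its own interleaved ID sequence so that IDs never collide and each ID reveals which shard owns it. The function names are hypothetical.

```python
import itertools


def id_sequence(shard_index, shard_count):
    """IDs for one shard: shard 0 of 3 yields 1, 4, 7, ...; shard 1 yields 2, 5, 8, ..."""
    return itertools.count(start=shard_index + 1, step=shard_count)


def shard_for(object_id, shard_count):
    """Recover the owning shard from an ID produced by id_sequence()."""
    return (object_id - 1) % shard_count


shard_count = 3
sequences = [id_sequence(k, shard_count) for k in range(shard_count)]
first_ids = [[next(seq) for _ in range(4)] for seq in sequences]
```

MySQL can produce sequences like this natively through its `auto_increment_increment` and `auto_increment_offset` settings. The catch with any modulo scheme is that changing the number of shards changes the mapping, so it suits a shard count that is fixed up front.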

Monitoring - You can't improve what you don't measure
* Products: Ganglia and Munin

Measure
* Server load, CPU usage, I/O
* Database QPS
* Memcache QPS, hit rate, evictions
* Queue lengths
* Anything else interesting
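
As a small example of turning raw counters into something worth graphing, the sketch below computes a memcached hit rate over an interval. memcached's `stats` output reports cumulative `get_hits` and `get_misses` counters, so the rate for a period comes from the difference between two snapshots. The `hit_rate` function and the sample numbers are made up for illustration.

```python
def hit_rate(before, after):
    """Hit rate between two snapshots of memcached stats, or None if idle."""
    hits = after["get_hits"] - before["get_hits"]
    misses = after["get_misses"] - before["get_misses"]
    total = hits + misses
    return hits / total if total else None


# Illustrative snapshots taken some minutes apart
earlier = {"get_hits": 100, "get_misses": 50}
later = {"get_hits": 190, "get_misses": 60}
rate = hit_rate(earlier, later)
```

Using deltas rather than lifetime totals matters because a sudden drop in hit rate, for example after a deploy that changes cache keys, would otherwise be hidden by weeks of accumulated history.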

Interview with Leah Culver: The Making of Pownce

Django Caching Code

Django Multidb Code

EuroDjangoCon Presentations