Evernote Architecture - 9 Million Users and 150 Million Requests a Day

The folks at Evernote were kind enough to write up an overview of their architecture in a post titled Architectural Digest. Dave Engberg describes their approach to networking, sharding, user storage, search, and some other custom services.

Evernote is a cool application, partially realizing Vannevar Bush's amazing vision of a memex. Wikipedia describes Evernote's features succinctly:

Evernote is a suite of software and services designed for notetaking and archiving. A "note" can be a piece of formattable text, a full webpage or webpage excerpt, a photograph, a voice memo, or a handwritten "ink" note. Notes can also have file attachments. Notes can then be sorted into folders, tagged, annotated, edited, given comments, and searched. Evernote supports a number of operating system platforms (including Android, Mac OS X, iOS, Microsoft Windows and WebOS), and also offers online synchronization and backup services.

The key here is that Evernote stores a lot of data that must be searched and synced through its cloud to any device you use.

Another key is the effect of Evernote's business model and cost structure. Evernote is notable for pioneering the freemium model, based on an idea from its CEO: the easiest way to get 1 million people paying is to get 1 billion people using. Evernote is designed to become profitable at a 1% conversion rate. The free online service limits users to a hefty 60 MB/month, while premium users pay $45 per year for 1,000 MB/month. To be profitable, Evernote must store a lot of data without spending a lot of money. There's not much room for extras, which accounts for the simple practicality of their architecture.
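
The freemium numbers above can be turned into a quick back-of-the-envelope model. This is a minimal sketch, not Evernote's own accounting: the function names are hypothetical, and it assumes revenue comes only from the $45/year premium plan at the 1% target conversion rate. Averaged over every user, free and paid, that works out to about $0.45 per user per year, which (ignoring every other cost, such as staff and development) is the ceiling on what storing and serving the average user can cost.

```python
"""Back-of-the-envelope freemium model (illustrative assumptions only)."""

PREMIUM_PRICE_PER_YEAR = 45.00  # USD per premium user per year, as quoted in the post
TARGET_CONVERSION = 0.01        # fraction of all users who pay (the 1% target)


def revenue_per_user(conversion: float, price: float) -> float:
    """Average yearly revenue per user, averaged over free AND paying users."""
    return conversion * price


def yearly_revenue(total_users: int, conversion: float, price: float) -> float:
    """Total yearly revenue if `conversion` of `total_users` pay `price`."""
    return total_users * revenue_per_user(conversion, price)


if __name__ == "__main__":
    per_user = revenue_per_user(TARGET_CONVERSION, PREMIUM_PRICE_PER_YEAR)
    # To break even on serving costs alone, the average user must cost less than this.
    print(f"Break-even serving cost per user per year: ${per_user:.2f}")
    # Hypothetical: apply the 1% target to the 9.5 million users mentioned below.
    total = yearly_revenue(9_500_000, TARGET_CONVERSION, PREMIUM_PRICE_PER_YEAR)
    print(f"Yearly revenue at 9.5M users: ${total:,.0f}")
```

The point of the exercise is the per-user figure: because 99% of users pay nothing, every design choice has to keep the blended cost per account well under a dollar a year.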

The article is short and succinct, so definitely read it for details. Some takeaways:

  • Controlling costs. Evernote runs out of a pair of dedicated cages in a data center in Santa Clara, California. Using a cloud wouldn't provide enough processing power and storage at a cheap enough cost to make Evernote's business model work. As their load doesn't appear to be spiky, using their own colo site makes a lot of sense, especially given how they make use of VMs for reliability.
  • Architecture based on the nature of the data. User notes are independent of each other, which makes it very practical for Evernote to shard their 9.5 million total users across 90 shards. Each shard is a pair of SuperMicro boxes, each with two quad-core Intel processors, lots of RAM, and a full chassis of Seagate enterprise drives in mirrored RAID configurations. All storage and API processing for a user is handled by that user's shard. They've found directly attached storage to have the best price/performance ratio. Using a remote storage tier with the same level of redundancy would cost substantially more. Adding drives to a server and replicating with DRBD is low in both overhead and cost.
  • Application redundancy. Each box runs two VMs. A primary VM runs the core stack: Debian + Java 6 + Tomcat + Hibernate + Ehcache + Stripes + GWT + MySQL (for metadata) + hierarchical local file systems (for file data). DRBD is used to replicate a primary VM to a secondary VM on another box. Heartbeat is used to fail over to the secondary VM if the primary VM dies. It's a smart way to use those powerful machines and make a reliable system with fewer resources.
  • Data reliability. User data is stored on four different enterprise drives across two different physical servers. Nightly backups copy data over a dedicated 1 Gbps link to a secondary data center.
  • Fast request routing. User account information (username, MD5 password, and user shard ID) is stored in an in-memory MySQL database. Reliability comes from RAID mirroring, DRBD replication to a secondary, and nightly backups. This approach makes routing users to their data a simple and fast in-memory lookup, while still being highly available.
  • A separate pool of 28 8-core servers processes images for search, handwriting recognition, and other services. This is custom software and a powerful value-add that is not easily replicated by anyone else.
  • Puppet is used for configuration management.
  • Monitoring is done with Zabbix, Opsview, and AlertSite.
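
The fast request routing described above amounts to a key-value lookup from username to shard. The sketch below models it with a plain Python dict standing in for the in-memory MySQL table; the class name, field names, and shard host names are hypothetical, and Evernote's real service runs on Java and Tomcat, not Python.

```python
"""Hypothetical sketch: route a login to the user's shard via an in-memory table."""
from __future__ import annotations

import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    username: str
    password_md5: str  # hex MD5 digest, mirroring the post's "MD5 password"
    shard_id: int


def md5_hex(text: str) -> str:
    """Hex-encoded MD5 digest of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class ShardRouter:
    """In-memory username -> Account map, standing in for the in-memory MySQL table."""

    def __init__(self, shard_hosts: dict[int, str]) -> None:
        self._shard_hosts = dict(shard_hosts)  # shard ID -> host name (hypothetical)
        self._accounts: dict[str, Account] = {}

    def add_account(self, username: str, password: str, shard_id: int) -> None:
        """Register a user on an existing shard."""
        if shard_id not in self._shard_hosts:
            raise ValueError(f"unknown shard {shard_id}")
        self._accounts[username] = Account(username, md5_hex(password), shard_id)

    def route(self, username: str, password: str) -> str | None:
        """Return the host of the user's shard if the credentials match, else None."""
        account = self._accounts.get(username)  # a single in-memory lookup
        if account is None:
            return None
        # Constant-time comparison of the two hex digests.
        if not hmac.compare_digest(md5_hex(password), account.password_md5):
            return None
        return self._shard_hosts[account.shard_id]


if __name__ == "__main__":
    router = ShardRouter({1: "shard-01.internal", 2: "shard-02.internal"})
    router.add_account("alice", "correct horse", shard_id=2)
    print(router.route("alice", "correct horse"))   # shard-02.internal
    print(router.route("alice", "wrong password"))  # None
```

With 9.5 million users spread over 90 shards, each shard serves roughly 105,600 users, yet routing stays a single hash-map access no matter how many users there are. MD5 is used here only to mirror the post; a new design would store a salted, deliberately slow hash such as bcrypt.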

There's a promise of future articles focusing more on individual subsystems. I look forward to them, as you have to appreciate the elegance of the system they've created for their business model. It's a good example to learn from.