The Design of 99designs - A Clean Tens of Millions Pageviews Architecture

99designs is a crowdsourced design contest marketplace based in Melbourne, Australia. The idea is that if you need a design created, you run a contest and designers compete to give you the best design within your budget.

If you run a medium-sized commerce site, this is a clean example of an architecture that reliably supports a lot of users and a complex workflow in the cloud. Lars Yencken wrote a clear overview of the architecture behind 99designs in Infrastructure at 99designs. Here's a gloss on their architecture:

Stats

  • Team has 8 devs, 2 dev ops, 2 ux/designers
  • Hundreds of thousands of unique visitors a month
  • Tens of millions of pageviews a month

Stack

  • Largely an Amazon based stack
  • Elastic Load Balancer (ELB)
  • Varnish
  • PHP with Apache/mod_php
  • S3
  • Beanstalk for in-memory queuing using Pheanstalk bindings
  • Amazon's RDS (MySQL)
  • Memcached
  • MongoDB
  • Redis
  • Rightscale/Chef
  • NewRelic, CloudWatch, Statsd

Infrastructure

  • Layered architecture: load balancing, acceleration, application, asynchronous tasks, storage and transient data.
  • ELB is reliable and handles load balancing and terminating SSL connections so that traffic is unencrypted below the ELB. A separate ELB is used for each domain.
  • Varnish is used to serve file-based long-tail media.
  • Varnish is fast, configurable, has a DSL, and has useful command line tools for debugging live traffic.
  • Dynamic and uncached requests are served from a PHP application.
  • Designs are stored on S3.
  • S3 latencies are poor so designs are cached locally after each request.
  • Requests that may take a long time are queued asynchronously to an in-memory queue implemented using Beanstalk, which is lightweight and performs well.
  • PHP workers read work off the queue and execute the required functionality.
  • Scheduled jobs are queued using cron at the appropriate time.
  • Amazon's RDS is used as the database and uses master-master replication across multiple availability zones for redundancy.
  • Rolling RDS backups are used for disaster recovery.
  • As load increases requests are load balanced across the read slaves.
  • S3 stores media files and data files.
  • Backups are made to Rackspace Cloud Files for disaster recovery.
  • Memcached is run on every server and is used to cache queries.
  • Capped collections in MongoDB are used to log errors and statistics.
  • Redis stores per-user information about which features are enabled for a user.
  • Per-user configuration is used for dark launches, soft launches and incremental feature rollouts.
  • Amazon allows them not to own any hardware and remain flexible.
  • Emphasis on automation using "software as infrastructure" ethos.
  • Rightscale manages server configurations using Chef. Servers are disposable.
  • Monitoring is implemented using NewRelic, CloudWatch, Statsd. Two large monitoring screens display a dashboard of site behavior.
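
The asynchronous task layer described above (slow requests pushed onto a queue, workers pulling jobs off and executing them) can be sketched roughly as follows. This is a minimal illustration in Python, not 99designs' PHP code: the standard library `queue.Queue` stands in for Beanstalk, and the job types, handler functions, and payload fields are all hypothetical names chosen for the example.

```python
import json
import queue

# Stand-in for a Beanstalk tube (assumption: the real system talks to a
# queue server through a client library rather than an in-process queue).
jobs = queue.Queue()


# Hypothetical job handlers, keyed by job type below.
def send_welcome_email(payload):
    return f"emailed {payload['user']}"


def resize_design(payload):
    return f"resized design {payload['design_id']} to {payload['width']}px"


HANDLERS = {
    "welcome_email": send_welcome_email,
    "resize_design": resize_design,
}


def enqueue(job_type, payload):
    """Called from the web request: serialize the job and return at once."""
    jobs.put(json.dumps({"type": job_type, "payload": payload}))


def run_worker():
    """Drain the queue, dispatching each job to its handler.

    A long-running worker would block waiting for the next job instead of
    returning when the queue is empty; draining keeps the sketch testable.
    """
    results = []
    while True:
        try:
            raw = jobs.get_nowait()
        except queue.Empty:
            break
        job = json.loads(raw)
        handler = HANDLERS[job["type"]]
        results.append(handler(job["payload"]))
        jobs.task_done()
    return results
```

The web request only pays the cost of serializing and enqueuing the job; the slow work happens later in `run_worker`. A scheduled job fits the same shape: cron simply calls `enqueue` at the appropriate time.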

Lessons Learned

  • Test to scale down. Highly variable load means they make heavy use of the cloud's scaling down capability, which requires a lot of testing to make work.
  • International customers need a CDN. They have a lot of international customers, and since they serve out of US-east, these customers don't always get a quality experience. They are looking at various CDNs to give better service to international customers.
  • Maintaining stability while growing requires testing and automation. To support frequent releases they are implementing acceptance testing and more automation. The ability to turn features on and off on a per-user basis allows testing new features against a subset of users.
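
The per-user feature switching mentioned above can be sketched as a small lookup. This is an illustration under stated assumptions, not the 99designs implementation: a plain dictionary stands in for Redis, the feature name and user ids are made up, and the hash-based percentage rollout is one common way to do incremental rollouts that the source does not describe.

```python
import hashlib

# Stand-in for Redis (assumption): feature name -> user ids explicitly enabled.
enabled_users = {"new_checkout": {"user-42"}}

# Hypothetical rollout percentage per feature, from 0 to 100.
rollout_percent = {"new_checkout": 10}


def bucket(feature, user_id):
    """Map a (feature, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature, user_id):
    """Return True if the feature is on for this user.

    Explicitly enabled users always see the feature (useful for dark
    launches to staff); everyone else is admitted by rollout percentage.
    """
    if user_id in enabled_users.get(feature, set()):
        return True
    return bucket(feature, user_id) < rollout_percent.get(feature, 0)
```

Because the bucket is derived from a hash of the feature and user id, a given user gets the same answer on every request, and raising the percentage only ever adds users to the enabled group.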