The Design of 99designs - A Clean Tens of Millions Pageviews Architecture
99designs is a crowdsourced design contest marketplace based in Melbourne, Australia. The idea is that if you need a design created, you run a contest and designers compete to give you the best design within your budget.
If you run a medium-sized commerce site, this is a clean example of an architecture that reliably supports a lot of users and a complex workflow in the cloud. Lars Yencken wrote a good overview of the architecture behind 99designs in Infrastructure at 99designs. Here's a gloss on their architecture:
Stats
- Team has 8 devs, 2 dev ops, 2 ux/designers
- Hundreds of thousands of unique visitors a month
- Tens of millions of pageviews a month
Stack
- Largely an Amazon based stack
- Elastic Load Balancer (ELB)
- Varnish
- PHP with Apache/mod_php
- S3
- Beanstalk for in-memory queuing using Pheanstalk bindings
- Amazon's RDS (MySQL)
- Memcached
- MongoDB
- Redis
- Rightscale/Chef
- NewRelic, CloudWatch, Statsd
Infrastructure
- Layered architecture: load balancing, acceleration, application, asynchronous tasks, storage and transient data.
- ELB is reliable and handles load balancing and terminating SSL connections so that traffic is unencrypted below the ELB. A separate ELB is used for each domain.
- Varnish is used to serve file-based long-tail media.
- Varnish is fast, configurable, has a DSL, and has useful command line tools for debugging live traffic.
- Dynamic and uncached requests are served from a PHP application.
- Designs are stored on S3.
- S3 latencies are poor so designs are cached locally after each request.
- Requests that may take a long time are asynchronously queued to an in-memory queue implemented using Beanstalk, which is lightweight and performs well.
- PHP workers read work off the queue and execute the required functionality.
- Scheduled jobs are queued using cron at the appropriate time.
- Amazon's RDS is used as the database and uses master-master replication across multiple availability zones for redundancy.
- Rolling RDS backups are used for disaster recovery.
- As load increases requests are load balanced across the read slaves.
- S3 stores media files and data files.
- Backups are made to Rackspace Cloud Files for disaster recovery.
- Memcached is run on every server and is used to cache queries.
- Capped collections in MongoDB are used to log errors and statistics.
- Redis stores per-user information about which features are enabled for a user.
- Per-user configuration is used for dark launches, soft launches and incremental feature rollouts.
- Amazon allows them not to own any hardware and remain flexible.
- Emphasis on automation using a "software as infrastructure" ethos.
- Rightscale manages server configurations using Chef. Servers are disposable.
- Monitoring is implemented using NewRelic, CloudWatch and Statsd. Two large monitoring screens display a dashboard of site behavior.
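The local caching of S3-hosted designs mentioned above is a read-through cache. Here is a minimal sketch of that pattern in Python, not the 99designs implementation (their application is PHP). The names `cached_fetch` and `fake_s3_get`, the object key, and the choice of a hashed filename on local disk are all assumptions made for illustration. The slow store is passed in as a plain function so the example runs without any AWS dependency.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cached_fetch(key: str, cache_dir: Path,
                 fetch_remote: Callable[[str], bytes]) -> bytes:
    """Return the bytes for `key`, calling the slow store only on a miss."""
    # Hash the key so arbitrary object names map to safe local filenames.
    local_path = cache_dir / hashlib.sha256(key.encode("utf-8")).hexdigest()
    if local_path.exists():
        return local_path.read_bytes()  # cache hit: no remote call

    data = fetch_remote(key)  # cache miss: one slow remote read
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name, then rename, so readers never see a
    # half-written file.
    tmp_path = local_path.with_suffix(".tmp")
    tmp_path.write_bytes(data)
    tmp_path.replace(local_path)
    return data


# Hypothetical stand-in for an S3 GET; it records each call it receives.
remote_calls = []


def fake_s3_get(key: str) -> bytes:
    remote_calls.append(key)
    return b"design:" + key.encode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "designs"
    first = cached_fetch("contests/1/entry-7.png", cache, fake_s3_get)
    second = cached_fetch("contests/1/entry-7.png", cache, fake_s3_get)
```

In this sketch the second call is served from local disk, so `fake_s3_get` is invoked only once. A real deployment would also need an eviction policy and a plan for cache invalidation, which the summary above does not describe.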
Lessons Learned
- Test to scale down. Highly variable load means they make heavy use of the cloud's scaling down capability, which requires a lot of testing to make work.
- International customers need a CDN. They have a lot of international customers and since they serve out of US-east these customers don't always get a quality experience. They are looking at various CDNs to give better service to international customers.
- Maintaining stability while growing requires testing and automation. To support frequent releases they are implementing acceptance testing and more automation. The ability to turn features on and off on a per-user basis allows testing new features against a subset of users.
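Per-user feature switches like the ones described above can be sketched in a few lines. The following Python example is illustrative only: the `FeatureFlags` class, its method names, and the hash-based percentage rollout are assumptions, not details from the 99designs write-up, and an in-memory dictionary stands in for the Redis store they use.

```python
import hashlib


class FeatureFlags:
    """Per-user feature switches with an optional percentage rollout."""

    def __init__(self) -> None:
        self._overrides = {}  # (feature, user_id) -> bool, explicit per-user switch
        self._rollout = {}    # feature -> percentage of users (0 to 100)

    def set_override(self, feature: str, user_id: int, enabled: bool) -> None:
        self._overrides[(feature, user_id)] = enabled

    def set_rollout(self, feature: str, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[feature] = percent

    @staticmethod
    def _bucket(feature: str, user_id: int) -> int:
        # A stable hash puts each user in the same bucket (0 to 99) on
        # every request, so a user does not flip in and out of a feature.
        digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % 100

    def is_enabled(self, feature: str, user_id: int) -> bool:
        override = self._overrides.get((feature, user_id))
        if override is not None:
            return override  # explicit per-user setting wins
        return self._bucket(feature, user_id) < self._rollout.get(feature, 0)


flags = FeatureFlags()
flags.set_override("new_checkout", user_id=42, enabled=True)  # dark launch to one user
flags.set_rollout("new_gallery", 100)                         # fully launched
dark_user = flags.is_enabled("new_checkout", 42)
other_user = flags.is_enabled("new_checkout", 43)
```

A dark launch then amounts to enabling a feature for staff accounts only, and an incremental rollout to raising the percentage step by step while watching the monitoring dashboards.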