The Dollar Shave Club Architecture Unilever Bought for $1 Billion

This is a guest post by Jason Bosco, Dollar Shave Club’s Director of Engineering, Core Platform & Infrastructure, on the infrastructure behind the company’s ecommerce technology.
With more than 3 million members, Dollar Shave Club will do over $200 million in revenue this year. Although most people know the company for its marketing, its rapid growth in the few years since launch is largely due to its team of 45 engineers.
Dollar Shave Club engineering by the numbers:
Core Stats
Super Bowl Ads served with no downtime: 1
Monthly Traffic Bandwidth: 9 TB
Orders processed via Arm: 38 million
Total Bugs Found: 4,566
Automated Tests Run: 312,000
Emails sent via Voice: 195 million
Analytics data points processed and stored in Hippocampus: 534 million
Size of dataset in Hippocampus: 1.5 TB
Currently Deployed Apps / Services: 22
Number of servers: 325
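As a back-of-the-envelope check on the bandwidth figure, 9 TB per month works out to under 30 Mbit/s of average traffic. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a 30-day month; real traffic is bursty, so peaks such as the minutes after a Super Bowl ad run far above this average.

```python
# Back-of-the-envelope: convert a monthly transfer volume into an average bit rate.
# Assumptions: decimal units (1 TB = 10**12 bytes) and a 30-day month.

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds


def average_mbps(terabytes_per_month: float) -> float:
    """Average throughput in megabits per second for a given monthly volume."""
    bytes_per_month = terabytes_per_month * 10**12
    bits_per_second = bytes_per_month * 8 / SECONDS_PER_MONTH
    return bits_per_second / 10**6


if __name__ == "__main__":
    print(f"{average_mbps(9):.1f} Mbit/s")  # prints "27.8 Mbit/s"
```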
Technology Stack
Ember as the front-end framework
Primarily Ruby on Rails on the backend
Node.js for high-throughput background processing needs (e.g., in Voice)
Golang for infrastructure software
Python for infrastructure & data science
Elixir for 1 internal app
Ruby for Test Automation
Swift and Objective-C for the native iOS app
Fully Hosted on AWS
Ubuntu & CoreOS
Ansible & Terraform for Configuration Management
Transitioning to Docker-based deployments
Jenkins for deployment coordination
Nginx & Varnish
Fastly for application delivery
Sumo Logic for log aggregation
CloudPassage for security monitoring
Vault by HashiCorp for secrets storage & provisioning
Data Stores
Primarily MySQL hosted on RDS
Memcached hosted on Elasticache for caching
Self-hosted Redis servers primarily for queuing
A dash of Kinesis for handling orders from spiky traffic
Amazon Redshift for a data warehouse
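The Kinesis entry hints at why a stream helps with spiky order traffic: producers can append records as fast as they arrive while consumers drain them at a steady rate. The toy simulation below models that idea with an in-memory `deque` standing in for the stream. It does not use the Kinesis API, and the arrival numbers and `capacity_per_tick` are made up for illustration.

```python
from collections import deque
from typing import Iterable, List


def simulate_backlog(arrivals: Iterable[int], capacity_per_tick: int) -> List[int]:
    """Return the number of unprocessed records left after each tick.

    Each tick, that tick's arrivals are appended to the buffer (the producer
    never blocks), then the consumer processes up to `capacity_per_tick`
    of the oldest records.
    """
    buffer = deque()
    backlog = []
    for tick, count in enumerate(arrivals):
        buffer.extend((tick, i) for i in range(count))  # producer: accept every order
        for _ in range(min(capacity_per_tick, len(buffer))):
            buffer.popleft()  # consumer: process the oldest record first (FIFO)
        backlog.append(len(buffer))
    return backlog


if __name__ == "__main__":
    # Hypothetical traffic: a steady trickle, then a spike, then back to normal.
    arrivals = [10, 10, 500, 300, 10, 10, 10, 10]
    print(simulate_backlog(arrivals, capacity_per_tick=100))
    # -> [0, 0, 400, 600, 510, 420, 330, 240]
```

No order is dropped during the spike; the backlog grows, then drains once arrivals fall back below what the consumer can handle.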
Messaging & Queuing
Resque and Sidekiq for async job processing & messaging
RabbitMQ for messaging
Kafka for stream processing
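RabbitMQ topic exchanges are one common way to route messages over a shared bus: publishers tag each message with a dotted routing key, and consumers bind with patterns in which `*` matches exactly one word and `#` matches zero or more words. The sketch below reimplements that matching rule in plain Python, plus a hypothetical in-process `TopicBus` (not a RabbitMQ client) to show one event fanning out to several subscribers. Routing keys such as `order.created` are invented examples, and each subscription is treated as its own queue.

```python
from typing import Callable, List, Tuple

Handler = Callable[[str, dict], None]


def topic_matches(pattern: str, routing_key: str) -> bool:
    """Topic-exchange matching: '*' = exactly one word, '#' = zero or more words."""
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(p):
            return j == len(k)  # pattern used up: the key must be used up too
        if p[i] == "#":
            # '#' may absorb any number of words, including none
            return any(match(i + 1, jj) for jj in range(j, len(k) + 1))
        if j == len(k):
            return False  # pattern still expects a word but the key has none left
        if p[i] == "*" or p[i] == k[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


class TopicBus:
    """Hypothetical in-process stand-in for a topic exchange."""

    def __init__(self) -> None:
        self._bindings: List[Tuple[str, Handler]] = []

    def subscribe(self, pattern: str, handler: Handler) -> None:
        self._bindings.append((pattern, handler))

    def publish(self, routing_key: str, message: dict) -> int:
        """Deliver to every matching subscriber and return the delivery count."""
        delivered = 0
        for pattern, handler in self._bindings:
            if topic_matches(pattern, routing_key):
                handler(routing_key, message)
                delivered += 1
        return delivered


if __name__ == "__main__":
    bus = TopicBus()
    bus.subscribe("order.*", lambda key, msg: print("fulfillment got", key, msg))
    bus.subscribe("#.created", lambda key, msg: print("email got", key, msg))
    bus.publish("order.created", {"order_id": 1})
```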
Analytics & Business Intelligence
Snowplow & Adobe Analytics for web/mobile analytics
AWS Elastic MapReduce
FlyData to ETL data from MySQL into Redshift
Databricks (Hosted Spark)
Looker as the BI front-end
Near-realtime data availability for reporting
Monitoring & Alerting
Rollbar, Sentry & Crashlytics for exception tracking
Datadog for custom application metrics & metrics aggregation
Sysdig for infrastructure metrics & monitoring
New Relic for application performance monitoring
Site24x7 for availability monitoring
PagerDuty for on-call alerting
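Custom application metrics typically reach Datadog through the DogStatsD agent, which accepts plain-text UDP datagrams (port 8125 by default) shaped like `metric.name:value|type|@sample_rate|#tag:value,...`. The sketch below builds such a datagram by hand and sends it to a local agent; in practice an official client library would handle this. The metric and tag names are hypothetical.

```python
import socket
from typing import Optional, Sequence

# count, gauge, timer, histogram, set, distribution
VALID_TYPES = {"c", "g", "ms", "h", "s", "d"}


def format_metric(name: str, value: float, metric_type: str,
                  sample_rate: Optional[float] = None,
                  tags: Sequence[str] = ()) -> str:
    """Build a DogStatsD datagram: name:value|type[|@rate][|#tag1,tag2]."""
    if metric_type not in VALID_TYPES:
        raise ValueError(f"unknown metric type: {metric_type}")
    parts = [f"{name}:{value}", metric_type]
    if sample_rate is not None:
        parts.append(f"@{sample_rate}")
    if tags:
        parts.append("#" + ",".join(tags))
    return "|".join(parts)


def send_metric(datagram: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; a missing agent typically goes unnoticed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


if __name__ == "__main__":
    line = format_metric("checkout.latency", 182, "ms",
                         tags=["env:prod", "platform:mobile"])
    print(line)  # checkout.latency:182|ms|#env:prod,platform:mobile
    send_metric(line)
```

Using UDP keeps metric emission off the request's critical path: the application never waits on the metrics pipeline.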
QA and Test Automation
CircleCI for running unit tests
Jenkins + TestUnit + Selenium + Sauce Labs for browser-based automated tests
Jenkins + TestUnit + Selenium + Sauce Labs for Brain automated tests
Jenkins + TestUnit for API functional tests
Jenkins + TestUnit + Appium + Sauce Labs for native Android automated tests
Jenkins + TestUnit + Appium + Sauce Labs for native iOS automated tests
Jenkins + TestUnit + Selenium + Sauce Labs + proxy server for BI test automation
SOASTA + regex scripts for stress, soak, load and performance testing
Engineering Workflow
Slack for cross-team communication
Trello for task tracking
Hubot with custom plugins as our chat bot
GitHub as our code repository
ReviewNinja integrated with the GitHub Status API for code reviews
Continuous deployment: typically multiple deployments per day
On-the-fly sandbox environments for feature development
Currently, single-button push deployments using Jenkins, moving towards continuous delivery
A Vagrant box running Docker containers gives new engineers a fully functioning development environment on their first day
Event-driven architecture
Moving from a monolithic architecture to “medium” services interacting through a common message bus
VCL-based routing at the CDN edge, deployed just like any other app
Web and mobile front-ends talk to an API layer
API layer talks to services, aggregates data and formats it for clients
Services talk to the data stores and message bus
Scheduled tasks run as one master job that breaks itself up into smaller jobs in Resque/Sidekiq
Technology components include internal tools for customer service (Brain), marketing automation platform (Voice), fulfillment system (Arm), subscription billing system (Baby Boy) and our data infrastructure (Hippocampus).
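The scheduled-task pattern described above (one master job that splits itself into smaller jobs) can be sketched as follows. A plain `queue.Queue` stands in for Resque/Sidekiq, and the `renew_subscriptions` job name, batch size and subscription IDs are hypothetical; in a real deployment the master would enqueue jobs to Redis and separate worker processes would pick them up.

```python
import queue
from typing import Dict, List


def chunked(items: List[int], size: int) -> List[List[int]]:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def master_job(subscription_ids: List[int], jobs: "queue.Queue[Dict]",
               batch_size: int = 100) -> int:
    """Scheduled entry point: enqueue one small job per batch, return the count."""
    batches = chunked(subscription_ids, batch_size)
    for batch in batches:
        jobs.put({"job": "renew_subscriptions", "ids": batch})  # hypothetical job name
    return len(batches)


def run_worker(jobs: "queue.Queue[Dict]") -> List[int]:
    """Drain the queue, 'processing' each small job; return every ID handled."""
    processed: List[int] = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            break
        processed.extend(job["ids"])  # a real worker would bill these subscriptions
        jobs.task_done()
    return processed


if __name__ == "__main__":
    jobs: "queue.Queue[Dict]" = queue.Queue()
    print(master_job(list(range(1, 251)), jobs, batch_size=100))  # 3 batches
    print(len(run_worker(jobs)))  # 250 IDs processed
```

Splitting the work this way means a failed batch can be retried on its own, and batches can be spread across as many workers as are available.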
Team
45 top-notch, entrepreneurial and highly skilled engineers working out of the Marina del Rey, CA headquarters
Engineers participate in cross-functional teams called squads along with product managers, designers, UX and stakeholders to deliver end-to-end features.
Teams are divided vertically by domain into Front-end, Backend, QA & IT.
Front-end team owns Web UI for & internal tools and our iOS & Android apps.
Backend team owns web backends for & internal tools, Internal Services (Billing and Fulfillment), Data Platform & Infrastructure.
QA team owns testing and automation infrastructure for all digital products.
IT team owns Office & Warehouse IT.
Engineers get to attend one company-sponsored conference every year.
Engineers get to buy as many books / learning resources as they need.
Standing desks for all. One treadmill desk currently available as a pilot.
Weekly engineering team lunches.
Tech Belly events every other week where engineers present talks on technology topics over lunch.
Engineers are encouraged to experiment with bleeding-edge technology and write proposals as Requests for Comments (RFCs).
Engineers are encouraged to open source tools and libraries where it makes sense
Every engineer gets a standard-issue 15” MacBook Pro, a 27” Apple display and a 24” monitor.
One 3D printer, available to print props and even more 3D printers.
Lessons Learned
Scaling gets easier when the components you’re trying to scale are small, simple services.
Documentation & knowledge sharing are important for fast-growing teams.
A well-nurtured test suite is critical to fast-evolving systems.
Redis uses an approximate LRU algorithm, so it’s not suitable if you have precise LRU requirements for caching
Web performance is critical, especially on mobile: every millisecond costs us revenue
Usability & user experience matter even for internal tools: efficient tools make teams more productive
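Redis approximates LRU by sampling a small number of keys (tunable with the `maxmemory-samples` setting) and evicting the least recently used key among them, so the evicted key is not always the globally oldest one. When exact LRU order matters, one option is an application-side cache. Below is a minimal sketch using Python's `collections.OrderedDict`, which tracks recency with O(1) operations; the class name and capacity are illustrative.

```python
from collections import OrderedDict
from typing import Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LRUCache(Generic[K, V]):
    """Exact LRU cache: always evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: "OrderedDict[K, V]" = OrderedDict()

    def get(self, key: K) -> Optional[V]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: K, value: V) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


if __name__ == "__main__":
    cache = LRUCache(2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")     # "a" is now the most recently used key
    cache.put("c", 3)  # evicts "b", the least recently used key
    print(cache.get("b"))  # None
```

The trade-off is that this cache lives in a single process, so unlike Redis it is not shared across app servers.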