With 1.49 billion monthly active users, operating at Facebook scale is far from trivial. Facebook's new live video streaming services present a fascinating use case for designing streaming service in global distribution and massive scale.
How do you scale a system from one user to more than 11 million users? Joel Williams, Amazon Web Services Solutions Architect, gives an excellent talk on just that subject: AWS re:Invent 2015 Scaling Up to Your First 10 Million Users.
If you are an advanced AWS user this talk is not for you, but it’s a great way to get started if you are new to AWS, new to the cloud, or if you haven’t kept up with with constant stream of new features Amazon keeps pumping out.
As you might expect since this is a talk by Amazon that Amazon services are always front and center as the solution to any problem. Their platform play is impressive and instructive. It's obvious by how the pieces all fit together Amazon has done a great job of mapping out what users need and then making sure they have a product in that space.
Some of the interesting takeaways:
- Start with SQL and only move to NoSQL when necessary.
- A consistent theme is take components and separate them out. This allows those components to scale and fail independently. It applies to breaking up tiers and creating microservices.
- Only invest in tasks that differentiate you as a business, don't reinvent the wheel.
- Scalability and redundancy are not two separate concepts, you can often do both at the same time.
- There's no mention of costs. That would be a good addition to the talk as that is one of the major criticisms of AWS solutions.
This is so good! Perfect for your Monday morning jam.
I'm all green (hot patch)
Called a Penguin and Chameleon
I'm all green (hot patch)
Call Torvalds and Kroah-Hartman
It’s too hot (hot patch)
Yo, say my name you know who I am
It’s too hot (hot patch)
I ain't no simple code monkey
There’s a long history of donating spare compute cycles for worthy causes. Most of those efforts were started in the Desktop Age. Now, in the Cloud Age, how can we donate spare compute capacity? How about through a private spot market?
There are cycles to spare. Public Cloud Usage trends:
Instances are underutilized with average utilization rates between 8-9%
24% of instance reservations are unused
Maybe all that CapEx sunk into Reserved Instances can be put to some use? Maybe over provisioned instances could be added to the resource pool as well? That’s a lot of power Captain. How could it be put to good use?
There is a need to crunch data. For science. Here’s a great example as described in This is how you count all the trees on Earth. The idea is simple: from satellite pictures count the number of trees. It’s an embarrassingly parallel problem, perfect for the cloud. NASA had a problem. Their cloud is embarrassingly tiny. 400 hypervisors shared amongst many projects. Analysing all the data would would take 10 months. An unthinkable amount of time in this Real-time Age. So they used the spot market on AWS.
The upshot? The test run cost a measly $80, which means that NASA can process data collected for an entire UTM zone for just $250. The cost for all 11 UTM zones in sub-Sarahan Africa and the use of all four satellites comes in at just $11,000.
“We have turned what was a $200,000 job into a $10,000 job and we went from 100 days to 10 days [to complete],” said Hoot. “That is something scientists can build easily into their budget proposals.”
That last quote, That is something scientists can build easily into their budget proposals, stuck in my craw.
Imagine how much science could get done if you didn’t have the budget proposal process slowing down the future? Especially when we know there are so many free cycles available that are already attached to well supported data processing pipelines. How could those cycles be freed up to serve a higher purpose?
Netflix shows the way with their internal spot market. Netflix has so many cloud resources at their disposal, a pool of 12,000 unused reserved instances at peak times, that they created their own internal spot market to drive better utilization. The whole beautiful setup is described Creating Your Own EC2 Spot Market, Creating Your Own EC2 Spot Market -- Part 2, and in High Quality Video Encoding at Scale.
The win: By leveraging the internal spot market Netflix measured the equivalent of a 210% increase in encoding capacity.
Netflix has a long and glorious history of sharing and open sourcing their tools. It seems likely when they perfect their spot market infrastructure it could be made generally available.
Perhaps the Netflix spot market could be extended so unused resources across the Clouds could advertise themselves for automatic integration into a spot market usable by scientists to crunch data and solve important world problems.
Perhaps donated cycles could even be charitable contributions that could help offset the cost of the resource? My wife is a tax accountant and she says this is actually true, under the right circumstances.
This kind of idea has a long history with me. When AWS first started, I like a lot of people wondered, how can I make money off this gold rush? That’s before we knew Amazon was going to make most of the tools to sell to the miners themselves. The idea of exploiting underutilized resources fascinated me for some reason. That is, after all, what VMs do for physical hardware, exploit the underutilized resources of powerful machines. And it is in some ways the idea behind our modern economy. Yet even today software architectures aren’t such that we reach anything close to full utilization of our hardware resources. What I wanted to do was create a memcached system that allowed developers to sell their unused memory capacity (and later CPU, network, storage) to other developers as cheap dynamic pools of memcached storage. Get your cache dirt cheap and developers could make some money back on underused resources. A very similar idea to the spot market notion. But without homomorphic encryption the security issues were daunting, even assuming Amazon would allow it. With the advent of the Container Age sharing a VM is now way more secure and Amazon shouldn’t have a problem with the idea if it’s for science. I hope.
Sponsored Post: Netflix, StatusPage.io, Redis Labs, Jut.io, SignalFx, InMemory.Net, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7
- Manager - Site Reliability Engineering: Lead and grow the the front door SRE team in charge of keeping Netflix up and running. You are an expert of operational best practices and can work with stakeholders to positively move the needle on availability. Find details on the position here: https://jobs.netflix.com/jobs/398
- Senior Service Reliability Engineer (SRE): Drive improvements to help reduce both time-to-detect and time-to-resolve while concurrently improving availability through service team engagement. Ability to analyze and triage production issues on a web-scale system a plus. Find details on the position here: https://jobs.netflix.com/jobs/434
- Manager - Performance Engineering: Lead the world-class performance team in charge of both optimizing the Netflix cloud stack and developing the performance observability capabilities which 3rd party vendors fail to provide. Expert on both systems and web-scale application stack performance optimization. Find details on the position here https://jobs.netflix.com/jobs/860482
- Senior Devops Engineer - StatusPage.io is looking for a senior devops engineer to help us in making the internet more transparent around downtime. Your mission: help us create a fast, scalable infrastructure that can be deployed to quickly and reliably.
- UI Engineer– AppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.
- Software Engineer - Infrastructure & Big Data – AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.
Fun and Informative Events
- Your event could be here. How cool is that?
Cool Products and Services
- Turn chaotic logs and metrics into actionable data. Scalyr is a tool your entire team will love. Get visibility into your production issues without juggling multiple tools and tabs. Loved and used by teams at Codecademy, ReturnPath, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.
- Real-time correlation across your logs, metrics and events. Jut.io just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.
- Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Sumo Logic alternative.
- SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!
- InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net
- VividCortex goes beyond monitoring and measures the system's work on your servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.
- MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/
- aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com
- ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
- www.site24x7.com : Monitor End User Experience from a global monitoring network.
If any of these items interest you there's a full description of each sponsor below. Please click to read more...
Chapter by chapter Sergey Ignatchenko is putting together a wonderful book on the Development and Deployment of Massively Multiplayer Games, though it has much broader applicability than games. Here's a recent chapter from his book.
Enter Front-End Servers
Thou art as sweet as the sum of the sum of Romeo and his horse and his black cat! Speak thy mind!
Our Classical Deployment Architecture (especially if you do use FSMs) is not bad, and it will work, but there is still quite a bit of room for improvement for most of the games out there. More specifically, we can add another row of servers in front of the Game Servers, as shown on Fig VI.8:
The main purpose of this work is to show results of benchmarking some of the leading in-memory NoSQL databases with a tool named YCSB.
We selected three popular in-memory database management systems: Redis (standalone and in-cloud named Azure Redis Cache), Tarantool and CouchBase and one cache system Memcached. Memcached is not a database management system and does not have persistence. But we decided to take it, because it is also widely used as a fast storage system. Our “firing field” was a group of four virtual machines in Microsoft Azure Cloud. Virtual machines are located close to each other, meaning they are in one datacenter. This is necessary to reduce the impact of network overhead in latency measurements. Images of these VMs can be downloaded by links: one, two, three and four (login: nosql, password: qwerty). A pair of VMs named nosql-1 and nosql-2 is useful for benchmarking Tarantool and CouchBase and another pair of VMs named nosql-3 and nosql-4 is good for Redis, Azure Redis Cache and Memcached. Databases and tests are installed and configured on these images.
Our virtual machines were the basic A3 instances with 4 cores, 7 GB RAM and 120 GB disk size.
Databases and their configurations
In a nutshell, Peecho is all about turning your digital content into professionally printed products. Although it might look like a simple task, a lot of stuff happens behind the scenes to make that possible. In this article, we’re going to tell you about our processing architecture as well as at a recent performance improvement with the integration of AWS Lambda functions.
In earlier versions, when the Peecho platform first launched, all processing was executed by EC2 instances. For a single order it was done sequentially; page by page as illustrated below.