Stuff The Internet Says On Scalability For April 17th, 2015

Hey, it's HighScalability time:

A fine tribute on Silicon Valley & hilarious formula evaluating Peter Gregory's positive impact on humanity.

  • 118/196: nations becoming democracies since mid19th century; $70K: nice minimum wage; 70 million: monthly StackExchange visitors; 1 billion: drone planted trees; 1,000 Years: longest-exposure camera shot ever

  • Quotable Quotes:

    • @DrQz: #Performance modeling is really about spreading the guilt around.

    • @natpryce: “What do we want?” “More levels of indirection!” “When do we want it?” “Ask my IDateTimeFactoryImplBeanSingletonProxy!”

    • @BenedictEvans: In the late 90s we were euphoric about what was possible, but half what we had sucked. Now everything's amazing, but we worry about bubbles

    • Calvin Zito on Twitter: "DreamWorks Animation: One movie, 250 TB to make.10 movies in production at one time, 500 million files per movie. Wow."

    • Twitter: Some of our biggest MySQL clusters are over a thousand servers.

    • @SaraJChipps: It's 2015: open source your shit. No one wants to steal your stupid CRUD app. We just want to learn what works and what doesn't.

    • Calvin French-Owen: And as always: peace, love, ops, analytics.

    • @Wikipedia: Cut page load by 100ms and you save Wikipedia readers 617 years of wait annually. Apply as Web Performance Engineer

    • @IBMWatson: A person can generate more than 1 million gigabytes of health-related data.

    • @allspaw: "We’ve learned that automation does not eliminate errors." (yes!)  

    • @Obdurodon: Immutable data structures solve everything, in any environment where things like memory allocators and cache misses cost nothing.

    • KaiserPro: Pixar is still battling with lots of legacy cruft. They went through a phase of hiring the best and brightest directly from MIT and the like.

    • @Obdurodon: Immutable data structures solve everything, in any environment where things like memory allocators and cache misses cost nothing.

    • @abt_programming: "Duplication is far cheaper than the wrong abstraction" - @sandimetz

    • @kellabyte: When I see places running 1,200 containers for fairly small systems I want to scream "WHY?!"

    • chetanahuja: One of the engineers tried running our server stack on a raspberry for a laugh.. I was gobsmacked to hear that the whole thing just worked (it's a custom networking protocol stack running in userspace) if just a bit slower than usual.

  • Chances are if something can be done with your data, it will be done. @RMac18: Snapchat is using geofilters specific to Uber's headquarter to poach engineers.

  • Why (most) High Level Languages are Slow. Exactly this by masterbuzzsaw: If manual memory management is cancer, what is manual file management, manual database connectivity, manual texture management, etc.? C# may have “saved” the world from the “horrors” of memory management, but it introduced null reference landmines and took away our beautiful deterministic C++ destructors.

  • Why NFS instead of S3/EBS? nuclearqtip with a great answer: Stateful; Mountable AND shareable; Actual directories; On-the-wire operations (I don't have to download the entire file to start reading it, and I don't have to do anything special on the client side to support this; Shared unix permission model; Tolerant of network failures Locking!; Better caching ; Big files without the hassle.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


Paper: Large-scale cluster management at Google with Borg

Joe Beda (@jbeda): Borg paper is finally out. Lots of reasoning for why we made various decisions in #kubernetes. Very exciting.

The hints and allusions are over. We now have everything about Google's long rumored Borg project in one iconic Google style paper: Large-scale cluster management at Google with Borg.

When Google blew our minds by audaciously treating the Datacenter as a Computer it did not go unnoticed that by analogy there must be an operating system for that datacenter/computer.

Now we have the story behind a critical part of that OS:

Google's Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines.

It achieves high utilization by combining admission control, efficient task-packing, over-commitment, and machine sharing with process-level performance isolation. It supports high-availability applications with runtime features that minimize fault-recovery time, and scheduling policies that reduce the probability of correlated failures. Borg simplifies life for its users by offering a declarative job specification language, name service integration, real-time job monitoring, and tools to analyze and simulate system behavior.

We present a summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it.

Virtually all of Google’s cluster workloads have switched to use Borg over the past decade. We continue to evolve it, and have applied the lessons we learned from it to Kubernetes

The next version of Borg was called Omega and Omega is being rolled up into Kubernetes (steersman, helmsman, sailing master), which has been open sourced as part of Google's Cloud initiative.

Note how the world has changed. A decade ago when Google published their industry changing Big Table and Map Reduce papers they launched a thousand open source projects in response. Now we are not only seeing Google open source their software instead of others simply copying the ideas, the software has been released well in advance of the paper describing the software.

The future is still in balance. There's a huge fight going on for the future of what software will look like, how it is built, how it is distributed, and who makes the money. In the search business keeping software closed was a competitive advantage. In the age of AWS the only way to capture hearts and minds is by opening up your software. Interesting times.

Related Articles


Full Stack Tuning for a 100x Load Increase and 40x Better Response Times

A world that wants full stack developers also needs full stack tuners. That tuning process, or at least the outline of a full stack tuning process is something Ronald Bradford describes in not quite enough detail in Improving performance – A full stack problem.

The general philosophy is:

  • Understanding were to invest your energy first, know what the return on investment can be.
  • Measure and verify every change.

He lists several tips for general website improvements:

Click to read more ...


Sponsored Post: OpenDNS, MongoDB, Internap, Aerospike, Nervana, SignalFx, InMemory.Net, Couchbase, VividCortex, Transversal, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • The Cloud Platform team at OpenDNS is building a PaaS for our engineering teams to build and deliver their applications. This is a well rounded team covering software, systems, and network engineering and expect your code to cut across all layers, from the network to the application. Learn More

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • Nervana Systems is hiring several engineers for cloud positions. Nervana is a startup based in Mountain View and San Diego working on building a highly scalable deep learning platform on CPUs, GPUs and custom hardware. Deep Learning is an AI/ML technique breaking all the records by a wide-margin in state of the art benchmarks across domains such as image & video analysis, speech recognition and natural language processing. Please apply here and mention “” in your message.

  • Linux Web Server Systems EngineerTransversal. We are seeking an experienced and motivated Linux System Engineer to join our Engineering team. This new role is to design, test, install, and provide ongoing daily support of our information technology systems infrastructure. As an experienced Engineer you will have comprehensive capabilities for understanding hardware/software configurations that comprise system, security, and library management, backup/recovery, operating computer systems in different operating environments, sizing, performance tuning, hardware/software troubleshooting and resource allocation. Apply here.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • MongoDB World brings together over 2,000 developers, sysadmins, and DBAs in New York City on June 1-2 to get inspired, share ideas and get the latest insights on using MongoDB. Organizations like Salesforce, Bosch, the Knot, Chico’s, and more are taking advantage of MongoDB for a variety of ground-breaking use cases. Find out more at but hurry! Super Early Bird pricing ends on April 3.

Cool Products and Services

  • SQL for Big Data: Price-performance Advantages of Bare Metal. When building your big data infrastructure, price-performance is a critical factor to evaluate. Data-intensive workloads with the capacity to rapidly scale to hundreds of servers can escalate costs beyond your expectations. The inevitable growth of the Internet of Things (IoT) and fast big data will only lead to larger datasets, and a high-performance infrastructure and database platform will be essential to extracting business value while keeping costs under control. Read more.

  • Looking for a scalable NoSQL database alternative? Aerospike is validating the future of ACID compliant NoSQL with our open source Key-Value Store database for real-time transactions. Download our free Community Edition or check out the Trade-In program to get started. Learn more.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • Benchmark: MongoDB 3.0 (w/ WiredTiger) vs. Couchbase 3.0.2. Even after the competition's latest update, are they more tired than wired? Get the Report.

  • VividCortex goes beyond monitoring and measures the system's work on your MySQL and PostgreSQL servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here:

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required.

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...


Three Fast Data Application Patterns

This is guest post by John Piekos, VP Engineering at VoltDB. I understand this is a little PRish, but I think the ideas are solid.

The focus of many developers and architects in the past few years has been on Big Data, specifically mining historical intelligence from the Data Lake (usually a Hadoop stack containing terabytes to petabytes of data).

Now, product architects are asking how they can use this business intelligence for competitive advantage. As a result, application developers have come to see the value of using and acting in real-time on streams of fast data; using OLAP reporting wisdom, they can realize the benefits of both fast data and Big Data. As a result, a new set of application patterns have emerged. The applications are designed to capture value from fast-moving streaming data, before it reaches Hadoop.

At VoltDB we call this new breed of applications “fast data” applications. The goal of these fast data applications is to do more than just push data into Hadoop asap, but also to capture real-time value from the data the moment the data arrives.  

Because traditional databases historically haven’t been fast enough, developers have been forced to go to great effort to build fast data applications - they build complex multi-tier systems often involving a handful of tools typically utilizing a dozen or more servers.  However, a new class of database technology, especially NewSQL offerings, has changed this equation.

If you have a relational database that is fast enough, highly available, and able to scale horizontally, the ability to build fast data applications becomes less esoteric and much more manageable. Three new real-time application patterns have emerged as the necessary dataflows to implement real-time applications. These patterns, enabled by new, fast database technology, are:

Click to read more ...


Stuff The Internet Says On Scalability For April 10th, 2015

Hey, it's HighScalability time:

Beautiful, isn't it? It's the cerebral cortex of a rat that is organized like a mini-Internet.
  • $47 million: value of Cannabis per square km; $3.7 trillion: worldwide IT spending in 2014;  $41B: spend on spectrum; 48,000 square km: How Much Land Would it Take to Power the US via Solar; 2,000: Hadoop clusters in the world; 650 pounds: projected size of ET
  • Quotable Quotes:
    • John Hugg: The number one rule of 21st century data management: If a problem can be solved with an instance of MySQL, it’s going to be.
    • @sarahnovotny: "there is no compression algorithm for experience" - great quote from Andy Jassy at #AWSSummit
    • Steve Martin: I did stand-up comedy for eighteen years. Ten of those years were spent learning, four years were spent refining, and four were spent in wild success.
    • Yossi Vardi: Revenues kill the dream.
    • @AWSSummits: AdRoll's retargeting and real-time betting operates at 6 billion impressions/day at 100ms latency on #AWS #AWSSummit 
    • @AWS_Partners: Nike is operating 70+ services as production loads in #aws today #AWSSummit 
    • @bernardgolden: S3 usage up 102% YOY, ec2 93%: #AWSSummit
    • @bernardgolden: AWS growing over 40% yoy. Next earnings announcement s/b v interesting. #awssummit 
    • @AlexBalk: Here is my Apple Watch review: Your life is largely meaningless. No gadget can obscure its emptiness. You are dying every day.
    • Jonas: Google: all apps become search. Facebook: all apps become feeds. 
    • @jon_moore: most scalable/fast/reliable systems follow these principles: elastic; responsive; resilient; message-driven. #phillyete
    • mrmondo: NVMe [Non-Volatile Memory Express] is one of the most important changes to storage over the past decade.
    • Peter Thiel: Often the smarter people are more prone to trendy, fashionable thinking because they can pick up on things, they can pick up on cues more easily, and so they’re even more trapped by it than people of average ability
    • @nickstenning: The women and men who wrote the nearly bug-free code that controlled a $4Bn space shuttle and the lives of astronauts worked 8am to 5pm.

  • Have you been let down by miracle materials like carbon nanotubes, buckyballs, and graphene? MOFs  (metal–organic frameworks) are here and they are real. This Nature podcast and article tells you all about them (about 13 minutes in). MOFs are scaffolds made of metal containing nodes linked by carbon-based struts. They are pieces that you can plug together and build up into big networks which have spaces in-between. It's those spaces that make MOFs useful. You can trap things in those holes and do things to the molecules when they are trapped. You can store gasses like methane and hydrogen. You can separate mixture of things by varying the pore sizes. Carbon capture is one big use. They also can be used as chemical sensors, maybe in some future version of your watch. Also perhaps write-once-read-many times memory.

  • Is Amazon recreating the Sun ecosystem in the cloud? We now have the Amazon Elastic File System so everything is remote mounted. WorkSpaces feels like diskless workstations. Storage is over on some NAS. The database is somewhere on the network. And so on. Let's hope NFS lock contention failures and network UI jitter don't also make a comeback. OK, I don't remember having anything like Amazon Machine Learning

  • Etsy is giving Facebook's HipHop Virtual Machine (HHVM) for PHP a try. Why? Their API and web code was diverging under parallel development pressures. And they were developing many small API endpoints that used many small requests instead of larger requests that do more work per request. And instead of sharing state in an inherently shared nothing architecture they went with the strategy of just making things faster. This is where HHMV comes in.

  • OK, that's impressive. Migrating from Heroku to AWS (using Docker). It took two engineers about one month. Performance increased 2x and average API response time dropped from around 220ms to under 100ms, and our background task execution times dropped in half as well. Half the number of servers were needed.

  • I was excited to see AWS is opening up Lambda. It's close to some ideas I've been talking about for a while (Building Super Scalable Systems, What Google App Engine Price Changes Say About The Future Of Web Architecture). When it first came out I rehabed my atrophied node.js skills and gave it a shot. Played around a bit, got some code working, but the problem was Lambda only exposed a few integration points and none of those were anything I cared about. Now, they've made Lambda much more general and in the process much more useful. Worth another look. I also suspect their NFS product was necessary to generalize Lambda. Code could be instantly available on every machine via a mount point. Just like back in the day.

  • How Early Adopters Are Using Unikernels - With and Without Containers: The creator of MirageOS, Anil Madhavapeddy’s group is working on a new tool stack called Jitsu (Just-in-Time Summoning of Unikernels), which can start a unikernel in ~20ms in response to a network request. < Also, Towards Heroku for Unikernels: Part 2 - Self Scaling Systems.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


The Black Magic of Systematically Reducing Linux OS Jitter

How do you systematically identify the sources of jitter and remove them one by one in a low-latency trading system? That was the question asked on the mechanical-sympathy email list. 

Gil Tene, Vice President of Technology and CTO, Co-Founder, Azul Systems gave the sort of answer that comes from the accumulated wisdom born from lots of real-life experience. It's an answer that needed sharing. And here it is:

Finding the cause of hiccups/jitters in a a Linux system is black magic. You often look at the spikes and imagine "what could be causing this". 

Based on empirical evidence (across many tens of sites thus far) and note-comparing with others, I use a list of "usual suspects" that I blame whenever they are not set to my liking and system-level hiccups are detected. Getting these settings right from the start often saves a bunch of playing around (and no, there is no "priority" to this - you should set them all right before looking for more advice...).

My current starting point for Linux systems that are interested in avoiding many-msec hiccup levels is:

Click to read more ...


What do we know about how Meerkat Works?

“The future is live. The future is real-time. The future is now.” I wrote that in 2010 about live video innovator (which pivoted into Five years later it appears the future is now once again.

 Meerkat has burst on the scene with a viral vengeance, so I became curious. Meerkat is throwing around a lot of live video. It must be chewing up cash at an early funding round crushing rate. How does it work?

Unfortunately, after digging deep, I found few specific details on their backend architecture. What do we know?

  • The cash burning surmise seems to be correct. Meerkat secured another $12 million in funding. The streams will continue to flow. Bandwidth is cheaper than it used to be, but it is still expensive. Aether Wu did made an estimate over on Quora: So let's consider a scale of 1m users online simultaneously. Every 20 minutes, it costs 100k gigabyte, which means $4k per hour/$96k per day/ $2.9m per month. So if we scale the business to 10 times bigger, it is about $1m per day?

  • Meerkat was three years in development by an Israeli based team of up to 11 developers.

  • Meerkat can handle thousands of live streams while maintaining good video quality. Perhaps implementation details were discussed in a Meerkat session, but, well, you know. Ben Rubin, the thoughtful and nearly ubiquitous founder of Meerkat, wrote on ProductHunt that “I'm in love with HLS. For Meerkat use case HLS is better despite the 10-15 sec delay as it giving advantages in stable, crystal clear quality. We can use it to shift stream to audio only when connection is low, or do all sort of tricks.” HLS is HTTP Live Streaming.

That’s about it. While the backend architecture remains a mystery, what I did find is still very interesting. It’s the story of how a team creatively hunts and sifts through a space until they create/find that perfect blending of features that makes people fall in love. Twitter did that. SnapChat did that. Now Meerkat has done it. How did they do it?


Click to read more ...


Stuff The Internet Says On Scalability For April 3rd, 2015

Hey, it's HighScalability time:

Luscious SpaceX photos have been launched under Creative Commons.
  • 1,000: age of superbug treatment; 18 million: number of laws in the US
  • Quotable Quotes:
    • @greenberg: Only in the Bay Area would you find a greeting card for closing a funding round.
    • @RichardWarburto: "Do Not Learn Frameworks. Learn the Architecture"
    • Alex Dzyoba: Know your data and develop a simple algorithm for it.
    • @BenedictEvans: Akamai: 17% of US mobile connections are >4 Mbps. Most of the rest of the developed world is over 50%
    • Linus: Linux is just a hobby, won’t be big and professional like GNU
    • jhugg: This just lines up with what we've seen in the KV space over the last 5 years. Mutating data and key-lookup are all well and good, but without a powerful query language and real index support, it's much less interesting.
    • Facebook: Whatever the scale of your engineering organization, developer efficiency is the key thing that your infrastructure teams should be striving for. This is why at Facebook we have some of our top engineers working on developer infrastructure.
    • mysticreddit: Micro-optimization is a complete waste of time when you haven't spent time focusing on the meta & macro optimization
    • @adriancolyer: If you think cross-partition transactions can't scale, it's well worth taking a look at the RAMP model: 
    • @jasongorman: Microservices are a great solution to a problem you probably don't have
    • @dbrady: If 1 service dies and your whole system breaks, you don't have SOA. You have a monolith whose brain has been chopped up and stuck in jars.

  • Fascinating realization. We live in a world in which every tech interaction is subject to a man-in-the-middle attack. Future Crimes: All of this is possible because the screens on our phones show us not reality but a technological approximation of it. Because of this, not only can the caller ID and operating system on a mobile device be hacked, but so too can its other features, including its GPS modules. That’s right, even your location can be spoofed.

  • That's every interaction. Pin-pointing China's attack against GitHub: The way the attack worked is that some man-in-the-middle device intercepted web requests coming into China from elsewhere in the world, and then replaced the content with JavaScript code that would attack GitHub. 

  • Messaging and mobile platforms: If you take all of this together, it looks like Facebook is trying not to compete with other messaging apps but to relocate itself within the landscape of both messaging and the broader smartphone interaction model. 

  • Martin Thompson: Love the point that the compiler can only solve problems in the 1-10% problem space. The 90% problem space is our data access which is all about data structures and algorithms. The summary is he shows how instruction processing can be dwarfed by cache misses. This resonates for me with what I've seen in the field with customers in the high-performance space. Obvious caveat is applications where time is dominated by IO.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


6 Ways to Defeat the Coming Robot Army Swarms

Phalanx Weapon System by EumenesOfCardia

For every new weapon there’s an often unexpected move made to counter it. You develop a rock, I develop a shield. You develop a castle, I develop a cannon. You develop a knight in shining armor, I develop a long bow. You develop nukes and I go MAD. You develop a hacker army, I corrupt technology.

You develop a swarming robot army, what do I do?

To answer that question Paul Scharre over on the War on the Rocks blog has written a mesmerizing 6 part series of articles on robotic swarm warfare (1, 2, 3, 4, 5, Counter-Swarm: A Guide to Defeating Robotic Swarms).

But first, let’s set up the fear part of the article.

War Has Gone Open-Source

Click to read more ...