Stuff The Internet Says On Scalability For May 8th, 2015

Hey, it's HighScalability time:

Not spooky at all. A 1,000 robot self-organizing flash mob.
  • 400 ppm: global CO2 concentration; 13.1 billion: distance in light-years of farthest galaxy
  • Quotable Quotes:
    • Pied Piper: It’s built on a universal compression engine that stacks on any file, data, video or image no matter what size.
    • Bokardo: 1 hour of research saves 10 hours of development time
    • @12Knocksinna: Microsoft uses Cassandra open source tech to help manage the 500+ million events generated by Office 365 hourly (along with SQL and Azure)
    • @antirez: Redis had a lot of client libs ASAP. By reusing the Redis protocol, Disque is getting clients even faster, and 2700 Github stars in 9 days!
    • @blueben: AWS Glacier seems like a great DR option until you realize it costs $180,000 to retrieve your 100TB archive in an emergency.
    • Peter Diamandis: The best way to become a billionaire is to solve a billion-person problem.
    • Cordkillers: YouTube visits up 40% from last year
    • @acroll: "It's about economics not innovation, otherwise we'd all be flying Concorde instead of Jumbo Jets." @JulieMarieMeyer #StrataHadoop
    • @DLoesch: Start time delayed because cable systems are overloaded due to PPV buys. Insane. Don't snooze, don't lose! #MayPac
    • grauenwolf: This is where unit test fanboys piss me off. They claim that they can't use integration tests because they are too slow. I claim that they need integration tests to find their slow queries.
    • nuclearqtip: The open source world needs a standardized trust model for binary artifacts. 
    • Greg Ferro: SDN and SNA are about as similar Model T Ford & any modern car. For the record, no drives a Model T Ford to work everyday. Stop comparing SDN to SNA. Its pointless.
    • Urs Hölzle: Now the decade of work we put into NoSQL is available to everyone using GCP.  One way it shows that we've been working on this longer than anyone else: 99% read latency is 6ms vs ~300ms for other systems.
    • Swardley: Cloud is not about saving money - never was. It's about doing more stuff with exactly the same amount of money. That can cause a real headache in competition. 
    • Johns Hopkins: scientists have discovered that neurons are risk takers: They use minor "DNA surgeries" to toggle their activity levels all day, every day. 

  • Tesla's Powerwall has already sold out. So will Tesla's next gigafactory be a terafactory or a petafactory?

  • Something to keep in mind when hiring: 21% of [NFL] Hall of Fame players were selected in the 4th round or later.

  • Move along, nothing to see here. Brett Slatkin: I wonder how long it will be before people realize that all of this server orchestration business is a waste of time? Ultimately, what you really want is to never think about systems like Borg that schedule processes to run on machines. That's the wrong level of abstraction. You want something like App Engine, vintage 2008 platform as a service, where you run a single command to deploy your system to production with zero configuration.

  • Can any product withstand Aphyr's Jepsen partition torture test? Aerospike, Elasticsearch, MongoDB, RabbitMQ, Riak, Cassandra, Kafka, NuoDB, Postgres, Redis, all had problems when stress tested under network partitions. Not surprising really, as Aphyr says, "Distributed systems design is really hard." That we find problems in popular well regarded products indicates that "We need formal theory, written proofs, computer verification, and experimental demonstration that our systems make the tradeoffs we think they make. As systems engineers, we continually struggle to erase the assumption of safety before that assumption causes data loss or downtime. We need to clearly document system behaviors so that users can make the right choices. We must understand our systems in order to explain them–and distributed systems are hard to understand." gmagnusson has a good sense of things: "I admire the work that Aphyr does - though at the end of the day, I need to build systems that work for the problem I'm trying to solve (and I have to choose from real things that are available). These technologies in general are trying to address really hard problems and design and architecture is the art of balancing tradeoffs. Nothing is going to be perfect. Yet."

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


Varnish Goes Upstack with Varnish Modules and Varnish Configuration Language

This is a guest post by Denis Brækhus and Espen Braastad, developers on the Varnish API Engine from Varnish Software. Varnish has long been used in discriminating backends, so it's interesting to see what they are up to.

Varnish Software has just released Varnish API Engine, a high performance HTTP API Gateway which handles authentication, authorization and throttling all built on top of Varnish Cache. The Varnish API Engine can easily extend your current set of APIs with a uniform access control layer that has built in caching abilities for high volume read operations, and it provides real-time metrics.

Varnish API Engine is built using well known components like memcached, SQLite and most importantly Varnish Cache. The management API is written in Python. A core part of the product is written as an application on top of Varnish using VCL (Varnish Configuration Language) and VMODs (Varnish Modules) for extended functionality.
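
To make the throttling part of an API gateway concrete, here is a minimal token bucket sketch in Python. This is a generic technique, not how Varnish API Engine implements throttling; the `TokenBucket` class and its `rate`, `capacity`, and `clock` parameters are names made up for this illustration.

```python
import time


class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`.

    Hypothetical illustration only; not taken from Varnish API Engine.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.clock = clock               # injectable so tests can control time
        self.tokens = float(capacity)    # start with a full bucket
        self.updated = clock()

    def allow(self, cost=1.0):
        """Return True if a request costing `cost` tokens may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A gateway would typically keep one bucket per API key and answer with HTTP 429 whenever `allow()` returns False.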

We would like to use this as an opportunity to show how you can create your own flexible yet still high performance applications in VCL with the help of VMODs.

VMODs (Varnish Modules)

Click to read more ...


Elements of Scale: Composing and Scaling Data Platforms

This is a guest repost of Ben Stopford's epic post on Elements of Scale: Composing and Scaling Data Platforms. A masterful tour through the evolutionary forces that shape how systems adapt to key challenges.

As software engineers we are inevitably affected by the tools we surround ourselves with. Languages, frameworks, even processes all act to shape the software we build.

Likewise databases, which have trodden a very specific path, inevitably affect the way we treat mutability and share state in our applications.

Over the last decade we’ve explored what the world might look like had we taken a different path. Small open source projects try out different ideas. These grow. They are composed with others. The platforms that result utilise suites of tools, with each component often leveraging some fundamental hardware or systemic efficiency. The result, platforms that solve problems too unwieldy or too specific to work within any single tool.

So today’s data platforms range greatly in complexity. From simple caching layers or polyglotic persistence right through to wholly integrated data pipelines. There are many paths. They go to many different places. In some of these places at least, nice things are found.

So the aim for this talk is to explain how and why some of these popular approaches work. We’ll do this by first considering the building blocks from which they are composed. These are the intuitions we’ll need to pull together the bigger stuff later on.

Click to read more ...


Stuff The Internet Says On Scalability For May 1st, 2015

Hey, it's HighScalability time:

Got containers? Gorgeous shot of the CSCL Globe (by Walter Scriptunas II), world's largest container ship: 1,313ft long; 19,000 standard containers.
  • $3000: Tesla's new 7kWh daily cycle battery.
  • Quotable Quotes:
    • @mamund: "Turns out there is nothing about HTTP that I like" --  Douglas Crockford 
    • @PeterChch: Your little unimportant site might be hacked not for your data but for your aws resources. E.g. bitcoin mining.
    • @Joseph_DeSimone: I find it stunning that Google's annual R&D budget totaled $9.8 billion and the Budget for the National Science Foundation was $7.3 billion
    • @jedberg: The new EC2 container service adds the missing granularity to #ec2
    • Randy Shoup: “Every service at Google is either deprecated or not ready yet.”  -- Google engineering proverb
    • @mtnygard: Today the ratio of admins to servers in a well-behaved scalable web companies is about 1 to 10,000. @botchagalupe #craftconf
    • @joshk: Data: There Are Over 9x More Private IPOs Than Actual Tech IPOs 
    • @nwjsmith: “Systems are not algorithms. Systems are much more complex.“ #CraftConf @skamille
    • kk: “Because the center of the universe is wherever there is the least resistance to new ideas.”
    • John Allspaw: Stop thinking that you’re trying to solve a troubleshooting problem; you’re not. Instead of telling me about how your software will solve problems, show me that you’re trying to build a product that is going to join my team as an awesome team member, because I’m going to think about using/buying your service in the same way that I think about hiring.
    • @mpaluchowski: "Netflix is a #logging system that happens to play movies." #CraftConf
    • John Wilke:  Resiliency is more important than performance.
    • @peakscale: The server/cattle metaphor rubs me the wrong way... all the farmers I knew and worked for named and cared about their herd.
    • @aphyr: "We've managed to run 40 services in prod for three years without needing to introduce a consensus system" @skamille, #CraftConf
    • @ryantomlinson: “Spotify have been using DNS for service discovery for a long time” #CraftConf
    • @csanchez: Google "we start over 2 billion containers per week" containers, containers, containers! #qconlondon 
    • @tyler_treat: If you're using RabbitMQ, consider replacing it with Kafka. Higher throughput, better replication, replayability. Same goes for other MQs.
    • @tastapod: @botchagalupe telling #CraftConf how it is! “Yelp is spinning up 8 containers a second. This is the real sh*t, man!”
    • @mpaluchowski: "A static #alert threshold won't be any good next week. It must be calculated." #CraftConf
    • @mtnygard: #craftconf @randyshoup “Microservices are an answer to a scaling problem, not a business problem.”  So right.
    • @adrianco: @mtnygard @randyshoup speed of development is the business problem that leads to Microservices.
    • @b6n: the aws financials should be a wake-up call to anyone still thinking cloud isn't a game of raw scale
    • @mtnygard: The “edge” used to be top-of-rack. Then the hypervisor. Now it’s the container. That’s 100x the number of IPs. — @botchagalupe #craftconf
    • @idajantis: 'An escalator can never break; it can only become stairs' - nice one by @viktorklang at #CraftConf on Distributed Systems failing gracefully
    • @jessitron: "You should store your data in a real database and replicate it to Elasticsearch." @aphyr #CraftConf

  • A telling difference between Google and Apple: Google Now becomes a more robust platform with 70 new partner apps. Apple takes an app-centric view of the world and Google, not surprisingly, takes a data-centric view. With Google, developers feed Google data for Google to display. With Apple, developers feed Apple apps for users to consume. On Apple, developers push their own brand and control functionality through bundled extensions, but Google will have the perspective to really let their deep learning prowess sing. So there's a real choice.

  • How appropriate that game theory is applied to cyberwarfare. Mutually Assured Destruction isn't just for nukes. Pentagon Announces New Strategy for Cyberwarfare: “Deterrence is partially a function of perception,” the new strategy says. “It works by convincing a potential adversary that it will suffer unacceptable costs if it conducts an attack on the United States, and by decreasing the likelihood that a potential adversary’s attack will succeed.”

  • Reducing big data using ideas from quantum theory makes it easier to interpret. So maybe QM is nature's way of making sense of the BigData that is the Universe?

  • Synergy is not always BS. Cheaper bandwidth or bust: How Google saved YouTube: YouTube was burning through $2 million a month in bandwidth costs before the acquisition. What few knew at the time was that Google was a pioneer in data center technology, which allowed it to dramatically lower the costs of running YouTube.

  • In a winner-take-all market, is the cost of customer acquisition pyrrhic? Uber Burning $750 Million in a Year.

  • The cloud behind the cloud. Apple details how it rebuilt Siri on Mesos: Apple’s custom Mesos scheduler is called J.A.R.V.I.S.; Apple uses J.A.R.V.I.S. as its internal platform-as-a-service; Apple’s Mesos cluster spans thousands of nodes and runs about a hundred services; Siri’s Mesos backend represents its third generation, and a move away from “traditional” infrastructure.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


Paper: DNACloud: A Tool for Storing Big Data on DNA

"From the dawn of civilization until 2003, humankind generated five exabytes (1 exabytes = 1 billion gigabytes) of data. Now we produce five exabytes every two days and the pace is accelerating."

-- Eric Schmidt, Executive Chairman, Google, August 4, 2010. 


Where are we going to store the deluge of data everyone is warning us about? How about in a DNACloud that can store 1 petabyte of information per gram of DNA?
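
To get a feel for those numbers, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted above (five exabytes every two days, 1 exabyte = 1 billion gigabytes, 1 petabyte per gram) and assumes decimal units; the function and variable names are made up for this illustration.

```python
# Decimal units, matching the quote's "1 exabyte = 1 billion gigabytes".
BYTES_PER_GB = 10**9
BYTES_PER_PB = 10**15
BYTES_PER_EB = 10**9 * BYTES_PER_GB


def grams_of_dna(data_bytes, petabytes_per_gram=1):
    """Grams of DNA needed to hold data_bytes at the quoted density."""
    return data_bytes / (petabytes_per_gram * BYTES_PER_PB)


two_day_output = 5 * BYTES_PER_EB              # "five exabytes every two days"
grams_per_two_days = grams_of_dna(two_day_output)
grams_per_year = grams_per_two_days * 365 / 2  # assumes the pace stays flat

print(f"{grams_per_two_days:,.0f} g of DNA every two days")
print(f"{grams_per_year / 1000:,.1f} kg of DNA per year")
```

At the quoted density, two days of humanity's data comes to about 5 kg of DNA, and a year's worth to a bit under a tonne.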

Writing is a little slow. You have to convert your data file to a DNA description that is sent to a biotech company that will send you back a vial of synthetic DNA. Where do you store it? Your refrigerator.

Reading is a little slow too. The data can apparently be read with great accuracy, but to read it you have to sequence the DNA first, and that might take a while.

The how of it is explained in DNACloud: A Tool for Storing Big Data on DNA (poster). Abstract:

The term Big Data is usually used to describe huge amount of data that is generated by humans from digital media such as cameras, internet, phones, sensors etc. By building advanced analytics on the top of big data, one can predict many things about the user such as behavior, interest etc. However before one can use the data, one has to address many issues for big data storage. Two main issues are the need of large storage devices and the cost associated with it. Synthetic DNA storage seems to be an appropriate solution to address these issues of the big data. Recently in 2013, Goldman and his collegues from European Bioinformatics Institute demonstrated the use of the DNA as storage medium with capacity of storing 1 peta byte of information on one gram of DNA and retrived the data successfully with low error rate [1]. This significant step shows a promise for synthetic DNA storage as a useful technology for the future data storage. Motivated by this, we have developed a software called DNACloud which makes it easy to store the data on the DNA. In this work, we present detailed description of the software.


Sponsored Post: OpenDNS, MongoDB, Internap, Aerospike, SignalFx, InMemory.Net, Couchbase, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • The Cloud Platform team at OpenDNS is building a PaaS for our engineering teams to build and deliver their applications. This is a well-rounded team covering software, systems, and network engineering; expect your code to cut across all layers, from the network to the application. Learn More

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • How to Get a Game-Changing Performance Advantage with Intel SSDs and Aerospike. Presenter: Frank Ober, Data Center Solution Architect at Intel Corporation. Wednesday, May 13, 2015 @ 10:00AM PST, 1:00PM PST. Learn how to maximize the price/performance of your Intel Solid-State Drives (SSDs) with Aerospike. Frank Ober of Intel’s Solutions Group will review how he achieved 1+ million transactions per second on a single dual socket Xeon Server with SSDs using the open source tools of Aerospike for benchmarking. Register Now.

  • MongoDB World brings together over 2,000 developers, sysadmins, and DBAs in New York City on June 1-2 to get inspired, share ideas and get the latest insights on using MongoDB. Organizations like Salesforce, Bosch, the Knot, Chico’s, and more are taking advantage of MongoDB for a variety of ground-breaking use cases. Find out more, but hurry! Super Early Bird pricing ends on April 3.

Cool Products and Services

  • SQL for Big Data: Price-performance Advantages of Bare Metal. When building your big data infrastructure, price-performance is a critical factor to evaluate. Data-intensive workloads with the capacity to rapidly scale to hundreds of servers can escalate costs beyond your expectations. The inevitable growth of the Internet of Things (IoT) and fast big data will only lead to larger datasets, and a high-performance infrastructure and database platform will be essential to extracting business value while keeping costs under control. Read more.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • Benchmark: MongoDB 3.0 (w/ WiredTiger) vs. Couchbase 3.0.2. Even after the competition's latest update, are they more tired than wired? Get the Report.

  • VividCortex goes beyond monitoring and measures the system's work on your MySQL and PostgreSQL servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here:

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required.

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • Site24x7 : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...


How can we Build Better Complex Systems? Containers, Microservices, and Continuous Delivery.

We must be able to create better complex software systems. That’s the message from Mary Poppendieck in a wonderful, far-ranging talk she gave at the Craft Conference: New New Software Development Game: Containers, Micro Services.

The driving insight is complexity grows nonlinearly with size. The type of system doesn’t really matter, but we know software size will continue to grow so software complexity will continue to grow even faster.
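
One simple way to see the nonlinear growth is to count potential pairwise interactions between components. This model is an assumption made for illustration, since the talk summary only says that growth is nonlinear; `pairwise_interactions` is a hypothetical helper name.

```python
def pairwise_interactions(n):
    """Number of distinct pairs among n components: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Ten times the components means roughly a hundred times the possible pairs.
for n in (10, 100, 1000):
    print(f"{n:>5} components -> {pairwise_interactions(n):>7} possible pairs")
```

Under this model a 10x bigger system has about 100x as many pairs of parts that could affect each other, which is the intuition behind isolating components so that most of those pairs never interact.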

What can we do about it? The running themes are lowering friction and limiting risk:

  • Lower friction. This allows change to happen faster. Methods: dump the centralizing database; adopt microservices; use containers; better organize teams.

  • Limit risk. Risk is inherent in complex systems. Methods: PACT testing; continuous delivery.

Some key points:

  • When does software really grow? When smart people can do their own thing without worrying about their impact on others. This argues for building federated systems that ensure isolation, which argues for using microservices and containers.

  • Microservices usually grow successfully from monoliths. In creating a monolith developers learn how to properly partition a system.

  • Continuous delivery both lowers friction and lowers risk. In a complex system if you want stability, if you want security, if you want reliability, if you want safety then you must have lots of little deployments. 

  • Every member of a team is aware of everything. That's what makes a winning team. Good situational awareness.

The highlight of the talk for me was the section on the amazing design of the Swedish Gripen Fighter Jet. Talks on microservices tend to be highly abstract. The fun of software is in the building. Talk about parts can be so nebulous. With the Gripen the federated design of the jet as a System of Systems becomes glaringly concrete and real. If you can replace your guns, radar system, and virtually any other component without impacting the rest of the system, that’s something! Mary really brings this part of the talk home. Don’t miss it.

It’s a very rich and nuanced talk with a lot of history and context, so I can’t capture all the details. Watching the video is well worth the effort. Having said that, here’s my gloss on the talk...

Hardware Scales by Abstraction and Miniaturization

Click to read more ...


Stuff The Internet Says On Scalability For April 17th, 2015

Hey, it's HighScalability time:

A fine tribute on Silicon Valley & hilarious formula evaluating Peter Gregory's positive impact on humanity.

  • 118/196: nations becoming democracies since mid-19th century; $70K: nice minimum wage; 70 million: monthly StackExchange visitors; 1 billion: drone planted trees; 1,000 Years: longest-exposure camera shot ever

  • Quotable Quotes:

    • @DrQz: #Performance modeling is really about spreading the guilt around.

    • @natpryce: “What do we want?” “More levels of indirection!” “When do we want it?” “Ask my IDateTimeFactoryImplBeanSingletonProxy!”

    • @BenedictEvans: In the late 90s we were euphoric about what was possible, but half what we had sucked. Now everything's amazing, but we worry about bubbles

    • Calvin Zito on Twitter: "DreamWorks Animation: One movie, 250 TB to make.10 movies in production at one time, 500 million files per movie. Wow."

    • Twitter: Some of our biggest MySQL clusters are over a thousand servers.

    • @SaraJChipps: It's 2015: open source your shit. No one wants to steal your stupid CRUD app. We just want to learn what works and what doesn't.

    • Calvin French-Owen: And as always: peace, love, ops, analytics.

    • @Wikipedia: Cut page load by 100ms and you save Wikipedia readers 617 years of wait annually. Apply as Web Performance Engineer

    • @IBMWatson: A person can generate more than 1 million gigabytes of health-related data.

    • @allspaw: "We’ve learned that automation does not eliminate errors." (yes!)  

    • @Obdurodon: Immutable data structures solve everything, in any environment where things like memory allocators and cache misses cost nothing.

    • KaiserPro: Pixar is still battling with lots of legacy cruft. They went through a phase of hiring the best and brightest directly from MIT and the like.

    • @abt_programming: "Duplication is far cheaper than the wrong abstraction" - @sandimetz

    • @kellabyte: When I see places running 1,200 containers for fairly small systems I want to scream "WHY?!"

    • chetanahuja: One of the engineers tried running our server stack on a raspberry for a laugh.. I was gobsmacked to hear that the whole thing just worked (it's a custom networking protocol stack running in userspace) if just a bit slower than usual.

  • Chances are if something can be done with your data, it will be done. @RMac18: Snapchat is using geofilters specific to Uber's headquarter to poach engineers.

  • Why (most) High Level Languages are Slow. Exactly this by masterbuzzsaw: If manual memory management is cancer, what is manual file management, manual database connectivity, manual texture management, etc.? C# may have “saved” the world from the “horrors” of memory management, but it introduced null reference landmines and took away our beautiful deterministic C++ destructors.

  • Why NFS instead of S3/EBS? nuclearqtip with a great answer: Stateful; Mountable AND shareable; Actual directories; On-the-wire operations (I don't have to download the entire file to start reading it, and I don't have to do anything special on the client side to support this); Shared unix permission model; Tolerant of network failures; Locking!; Better caching; Big files without the hassle.
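
The Wikipedia quote above is worth a quick sanity check. This sketch works backwards from the two numbers in the tweet (100ms saved per page load, 617 years saved annually) to the page-load volume they imply; it assumes 365-day years, and the variable names are made up for this illustration.

```python
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000   # assumes 365-day years

saved_ms_per_load = 100                   # "cut page load by 100ms"
saved_years_per_year = 617                # "617 years of wait annually"

# Total time saved per year, in milliseconds, divided by the saving per load.
implied_page_loads_per_year = saved_years_per_year * MS_PER_YEAR // saved_ms_per_load
implied_page_loads_per_month = implied_page_loads_per_year // 12

print(f"{implied_page_loads_per_year:,} page loads per year")
print(f"{implied_page_loads_per_month:,} page loads per month")
```

That works out to roughly 195 billion page loads a year, or about 16 billion a month, which is why shaving 100ms is worth hiring for.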

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


Paper: Large-scale cluster management at Google with Borg

Joe Beda (@jbeda): Borg paper is finally out. Lots of reasoning for why we made various decisions in #kubernetes. Very exciting.

The hints and allusions are over. We now have everything about Google's long rumored Borg project in one iconic Google style paper: Large-scale cluster management at Google with Borg.

When Google blew our minds by audaciously treating the Datacenter as a Computer it did not go unnoticed that by analogy there must be an operating system for that datacenter/computer.

Now we have the story behind a critical part of that OS:

Google's Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines.

It achieves high utilization by combining admission control, efficient task-packing, over-commitment, and machine sharing with process-level performance isolation. It supports high-availability applications with runtime features that minimize fault-recovery time, and scheduling policies that reduce the probability of correlated failures. Borg simplifies life for its users by offering a declarative job specification language, name service integration, real-time job monitoring, and tools to analyze and simulate system behavior.

We present a summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it.

Virtually all of Google’s cluster workloads have switched to use Borg over the past decade. We continue to evolve it, and have applied the lessons we learned from it to Kubernetes.
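
To give a flavor of what "task-packing" means, here is a minimal first-fit-decreasing bin packing sketch in Python. It is a textbook heuristic swapped in for illustration and is not Borg's scheduler, which the abstract does not describe in detail; the function name, the single-resource model, and the sample numbers are all assumptions.

```python
def first_fit_decreasing(task_sizes, machine_capacity):
    """Pack tasks onto as few equal-sized machines as this heuristic manages.

    Illustration only: one resource dimension, no priorities, no constraints.
    Returns a list of machines, each a list of the task sizes placed on it.
    """
    machines = []  # tasks placed on each machine
    free = []      # remaining capacity of each machine
    for size in sorted(task_sizes, reverse=True):  # biggest tasks first
        if size > machine_capacity:
            raise ValueError(f"task of size {size} does not fit on any machine")
        for i, room in enumerate(free):
            if size <= room:                       # first machine with room wins
                machines[i].append(size)
                free[i] -= size
                break
        else:                                      # no machine had room
            machines.append([size])
            free.append(machine_capacity - size)
    return machines


placement = first_fit_decreasing([4, 8, 1, 4, 2, 1], machine_capacity=10)
print(placement)
```

Here six tasks totalling 20 units fit exactly onto two machines of capacity 10. A production scheduler has to juggle many resource dimensions, priorities, and failure domains on top of this basic idea.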

The next version of Borg was called Omega and Omega is being rolled up into Kubernetes (steersman, helmsman, sailing master), which has been open sourced as part of Google's Cloud initiative.

Note how the world has changed. A decade ago when Google published their industry-changing Bigtable and MapReduce papers they launched a thousand open source projects in response. Now not only is Google open sourcing its software rather than leaving others to copy the ideas, the software was released well in advance of the paper describing it.

The future is still in the balance. There's a huge fight going on for the future of what software will look like, how it is built, how it is distributed, and who makes the money. In the search business keeping software closed was a competitive advantage. In the age of AWS the only way to capture hearts and minds is by opening up your software. Interesting times.


Full Stack Tuning for a 100x Load Increase and 40x Better Response Times

A world that wants full stack developers also needs full stack tuners. That tuning process, or at least the outline of one, is something Ronald Bradford describes in not quite enough detail in Improving performance – A full stack problem.

The general philosophy is:

  • Understand where to invest your energy first; know what the return on investment can be.
  • Measure and verify every change.

He lists several tips for general website improvements:

Click to read more ...