Stuff The Internet Says On Scalability For January 1st, 2016

Hey, Happy New Year, it's HighScalability time:

River system? Vascular system? Nope. It's a map showing how all roads really lead to Rome.


If you like Stuff The Internet Says On Scalability then please consider supporting me on Patreon.
  • 71: mentions of innovation by the Chinese Communist Party; 60.5%: of all burglaries involve forcible entry; 280,000-square-foot: Amazon's fulfillment center in India capable of shipping 2 million items; 11 billion: habitable Earth-like planets in the goldilocks zone in just our galaxy; 800: people working on the iPhone's camera (how about the app store?); 3.3 million: who knew there were so many Hello Kitty fans?; 26 petabytes: size of League of Legends' data warehouse;

  • Quotable Quotes:
    • George Torwell: Tor is Peace / Prism is Slavery / Internet is Strength
    • @SciencePorn: Mr Claus will eat 150 BILLION calories and visit 5,556 houses per second this Christmas Eve.
    • @SciencePorn: Blue Whale's heart is so big, a small child can swim through the veins.
    • @BenedictEvans: There are close to 4bn people on earth with a phone (depending on your assumptions). Will go to at least 5bn. So these issues will grow.
    • @JoeSondow: "In real life you won't always have a calculator with you." — math teachers in the 80s
    • James Hamilton: This is all possible due to the latencies we see with EC2 Enhanced networking. Within an availability zone, round-trip times are now tens of microseconds, which make it feasible to propose and commit transactions to multiple resilient nodes in less than a millisecond.
    • Benedict Evans: The mobile ecosystem, now, is heading towards perhaps 10x the scale of the PC industry, and mobile is not just a new thing or a big thing, but that new generation, whose scale makes it the new centre of gravity of the tech industry. Almost everything else will orbit around it. 
    • Ruth Williams: Bacteria growing in an unchanging environment continue to adapt indefinitely.
    • @Raju: Not one venture-backed news aggregator has yet shown a Sustainable Business Model
    • @joeerl: + choose accurate names + favor beauty over performance + design minimal essential API's + document the unobvious
    • @shibuyashadows: There is no such thing as a full-node anymore. Now there are two types: Mining Nodes Economic Nodes. Both sets are now semi-centralized on the network, are heavily inter-dependent and represent the majority of the active Bitcoin users.
    • @TheEconomist: In 1972 a man with a degree aged 25-34 earned 22% more than a man without. Today, it's 70%
    • Dr. David Miller~ We are in the age of Howard Hughes. People make their fortune elsewhere and spend it on space. 
    • Credit for CRISPR: Part of that oversimplification is rooted in the fact that most modern life-science researchers aren’t working to uncover broad biological truths. These days the major discoveries lie waiting in the details
    • @BenedictEvans: Idle observation: Facebook will almost certainly book more revenue in 2015 than the entire internet ad industry made up until to 2000
    • Eric Clemmons: Ultimately, the problem is that by choosing React (and inherently JSX), you’ve unwittingly opted into a confusing nest of build tools, boilerplate, linters, & time-sinks to deal with before you ever get to create anything.
    • Kyle Russell: Why do I need such a powerful PC for VR? Immersive VR experiences are 7x more demanding than PC gaming.
    • @josevalim: The system that manages rate limits for Pinterest written in Elixir with a 90% response time of 800 microseconds.
    • catnaroek: The normal distribution is important because it arises naturally when the preconditions of the central limit theorem hold. But you still have to use your brain - you can't unquestioningly assume that any random variable (or sample or whatever) you will stumble upon will be approximately normally distributed.
    • Dominic Chambers: Now, if you consider the server-side immutable state atom to be a materialized view of the historic events received by a server, you can see that we've already got something very close to a Samza style database, but without the event persistence.
    • Joscha Bach: In my view, the 20th century’s most important addition to understanding the world is not positivist science, computer technology, spaceflight, or the foundational theories of physics. It is the notion of computation. Computation, at its core, and as informally described as possible, is very simple: every observation yields a set of discernible differences.

  • The New Yorker is picking up on the Winner Takes All theme that's been developing, I guess that makes it an official meme. What's missing from their analysis is that users are attracted to the eventual winners because they provide a superior customer experience. Magical algorithms are in support of experience. As long as a product doesn't fail at providing that experience there's little reason to switch after being small networked into a choice. You might think many many products could find purchase along the long tail, but in low friction markets that doesn't seem to be the case. Other choices become invisible and what's invisible starves to death.

  • I wonder how long it took to get to the 1 billionth horse ride? Uber Hits One Billionth Ride in 5.5 years.

  • Let's say you are a frog that has been in a warming pot for the last 15 years, what would you have missed? Robert Scoble has put together quite a list. 15 years ago there was no: Facebook, YouTube, Twitter, Google+, Quora, Uber, Lyft, iPhone, iPads, iPod, Android, HDTV, self driving cars, Waze, Google Maps, Spotify, Soundcloud, WordPress, Wechat, Flipkart, AirBnb, Flipboard, LinkedIn, AngelList, Techcrunch, Google Glass, Y Combinator, Techstars, Geekdom, AWS, OpenStack, Azure, Kindle, Tesla, and a lot more.

  • He who controls the algorithm reaps the rewards. Kansas is now the 5th state where lottery prizes may have been fixed.

  • What Is The Power Grid? A stunning 60% of generated energy is lost before it can be consumed, which is why I like my power grids like my databases: distributed and shared nothing.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


How to choose an in-memory NoSQL solution: Performance measuring

The main purpose of this work is to show results of benchmarking some of the leading in-memory NoSQL databases with a tool named YCSB.

We selected three popular in-memory database management systems, Redis (both standalone and as the cloud service Azure Redis Cache), Tarantool and CouchBase, plus one cache system, Memcached. Memcached is not a database management system and has no persistence, but we included it because it is widely used as a fast storage system. Our “firing field” was a group of four virtual machines in Microsoft Azure Cloud. The virtual machines are located close to each other, in a single datacenter, to reduce the impact of network overhead on latency measurements. Images of these VMs can be downloaded by links: one, two, three and four (login: nosql, password: qwerty). The pair of VMs named nosql-1 and nosql-2 is set up for benchmarking Tarantool and CouchBase, and the pair named nosql-3 and nosql-4 for Redis, Azure Redis Cache and Memcached. Databases and tests are installed and configured on these images.

Our virtual machines were the basic A3 instances with 4 cores, 7 GB RAM and 120 GB disk size.
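To make the measurements concrete, here is a rough sketch of what a benchmarking tool of this kind records: overall throughput plus average and tail latencies over a mix of reads and writes. It is not YCSB itself. The function name `run_benchmark`, its parameters, and the plain Python dict standing in for a database are all assumptions made for illustration.

```python
import random
import time


def run_benchmark(store, operations=10_000, read_proportion=0.5,
                  key_space=1_000, seed=42):
    """Run a read/write mix against `store` and summarise the timings.

    `store` is any dict-like object; here it is an in-process stand-in,
    not one of the databases from the article.
    """
    rng = random.Random(seed)
    latencies_us = []
    started = time.perf_counter()
    for _ in range(operations):
        key = f"user{rng.randrange(key_space)}"
        op_start = time.perf_counter()
        if rng.random() < read_proportion:
            store.get(key)            # read
        else:
            store[key] = "x" * 100    # write a 100-byte value
        latencies_us.append((time.perf_counter() - op_start) * 1e6)
    elapsed = time.perf_counter() - started

    latencies_us.sort()

    def percentile(p):
        # Nearest-rank style lookup on the sorted latencies.
        index = min(len(latencies_us) - 1, int(p / 100 * len(latencies_us)))
        return latencies_us[index]

    return {
        "throughput_ops_per_sec": operations / elapsed,
        "avg_latency_us": sum(latencies_us) / len(latencies_us),
        "p95_latency_us": percentile(95),
        "p99_latency_us": percentile(99),
    }


if __name__ == "__main__":
    for name, value in run_benchmark({}).items():
        print(f"{name}: {value:.1f}")
```

Against a real server the per-operation time would also include the network round trip, which is why keeping the client and database VMs in one datacenter matters for the latency numbers.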

Databases and their configurations

Click to read more ...


Using AWS Lambda functions to create print ready files

This is a guest post by Thiago Wolff from Peecho.

In a nutshell, Peecho is all about turning your digital content into professionally printed products. Although it might look like a simple task, a lot of stuff happens behind the scenes to make that possible. In this article, we’re going to tell you about our processing architecture as well as a recent performance improvement that came with the integration of AWS Lambda functions.

Print-ready files

In order to make digital content ready for printing facilities, there are some procedures that must occur after the order is received and before the final printing. In the printing industry this process is called pre-press, and the Peecho platform fully automates its initial stages before routing orders to printers.

Once the file has been created by the customer and uploaded to Peecho, it undergoes our processing stage. During processing, the file is checked to make sure it contains all the elements necessary for a successful print run: do the images have the proper format and resolution, are all the fonts included, are the RGB/CMYK colors set up appropriately, are all layout elements such as margins, crop marks and bleeds set up correctly, etc.

All these checks are automated by our backend systems. The entire process is quite complex and involves heavy computation that is expensive and time consuming. Let’s take a more detailed look at our processing architecture.

Processing Architecture

The processing stage starts right after an order is placed and payment has been confirmed. It’s initiated by the order intake server, which adds a message to an SQS processing queue with all information about the order and file to be processed. Whenever there is a message available in the queue, a new processing machine (a large EC2 instance) starts working to transform the original data into a print ready file.

At the core of the processing code we use open source libraries like iText as well as third party software for PDF and image encoding/conversion like PStill and ImageMagick. As the result of processing we generate PDF/X-3 files.

In earlier versions, when the Peecho platform first launched, all processing was executed by EC2 instances. For a single order it was done sequentially, page by page.

Since we can deal with any kind of file, and usually really tough ones, the described transformation process could take hours to execute. On average, it would take 15 seconds per page. Since it needed to be done sequentially, the processing time increased linearly with the number of pages. For example, a 400-page document would take around 1 hour and 40 minutes to process, which is a considerable amount of time for a single file.
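The arithmetic behind that figure is easy to check. The sketch below uses only the 15 seconds per page quoted above; the function names are made up for illustration.

```python
SECONDS_PER_PAGE = 15  # average per-page processing time quoted in the article


def sequential_seconds(pages, seconds_per_page=SECONDS_PER_PAGE):
    """Total time when pages are processed one after another."""
    return pages * seconds_per_page


def format_duration(seconds):
    """Render a number of seconds as hours, minutes and seconds."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    for pages in (1, 40, 400):
        print(f"{pages:>3} pages: {format_duration(sequential_seconds(pages))}")
```

For 400 pages this gives 6,000 seconds, which is the 1 hour and 40 minutes mentioned above.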

Recently, our development team has integrated the new AWS Lambda functions into the processing architecture and that has changed the story enormously.

AWS Lambda

Imagine if you could simply define a piece of code that runs in a dedicated machine in the cloud, without worrying about provisioning, managing and scaling the servers that you use to run the code? That’s exactly what AWS Lambda is: a compute service where you can define functions that respond to events, such as changes to data in Amazon S3.

In the new processing architecture, we took the existing processing code and converted it into an AWS Lambda function that performs all file transformations on a single page in a document. The new function is written in Node.js and is triggered after S3 file uploads.

After the processing starts, the original document is split into separate pages and uploaded to S3; as the upload of each page completes, a new Lambda instance is launched and starts cracking the page data.
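As a rough sketch of what an S3-triggered function looks like, here is a minimal handler. Peecho's function is written in Node.js; this version uses Python purely for illustration, and `process_page` is a hypothetical placeholder, not their code. The event layout (`Records`, then `s3.bucket.name` and `s3.object.key`) follows the S3 notification format, in which object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def process_page(bucket, key):
    """Hypothetical stand-in for the per-page transformation work."""
    return f"processed s3://{bucket}/{key}"


def handler(event, context):
    """Entry point invoked once per S3 upload notification."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 events are URL-encoded (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append(process_page(bucket, key))
    return results


if __name__ == "__main__":
    # Bucket and key names below are invented sample values.
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "example-pages"},
                    "object": {"key": "orders/42/page+001.pdf"}}}
        ]
    }
    print(handler(sample_event, None))
```

Because each page upload produces its own notification, each page gets its own invocation without any scheduling code on the caller's side.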

By doing that, we are now able to run a separate processing instance for each page in parallel. It means that for a 400-page document we now launch 400 Lambda instances simultaneously and process the entire document in about the same time it would take to process a single page. Therefore, the processing time no longer increases with the number of pages. As a result, we can process almost any document in the same time we used to process a single page!
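The effect of the fan-out can be simulated locally. In this sketch a thread pool stands in for the Lambda instances and a short sleep stands in for the per-page work; all names and timings are assumptions chosen to keep the demo quick, not measurements from Peecho.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process_page(page_number, seconds=0.02):
    """Simulated per-page work; the sleep stands in for the real transformation."""
    time.sleep(seconds)
    return page_number


def process_sequentially(pages):
    """Old approach: one page after another."""
    return [process_page(page) for page in pages]


def process_in_parallel(pages):
    """New approach: one worker per page, all started together."""
    with ThreadPoolExecutor(max_workers=len(pages)) as pool:
        return list(pool.map(process_page, pages))


def timed(fn, pages):
    """Return the function's result together with its wall-clock time."""
    start = time.perf_counter()
    result = fn(pages)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    pages = list(range(1, 11))
    _, sequential_time = timed(process_sequentially, pages)
    _, parallel_time = timed(process_in_parallel, pages)
    print(f"sequential: {sequential_time:.2f}s, parallel: {parallel_time:.2f}s")
```

The sequential run grows with the number of pages, while the parallel run stays close to the time of a single page plus some startup overhead.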

Although AWS Lambda is a great and powerful service, it has some limitations regarding execution time, disk space and memory. For instance, we are not able to use Lambda to process files larger than 500MB. Since we still have to process these big guys, the Peecho platform falls back to the previous mechanism whenever we need to handle corner cases like that.
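The fallback amounts to a routing decision made before processing starts. A minimal sketch, assuming file size is the only criterion and reading the 500MB figure as 500 × 1024 × 1024 bytes; the function name and backend labels are hypothetical.

```python
# Size limit quoted in the article; treating "MB" as 1024 * 1024 bytes is an assumption.
LAMBDA_MAX_BYTES = 500 * 1024 * 1024


def choose_backend(file_size_bytes):
    """Pick the per-page Lambda path when the file fits, else the EC2 path."""
    if file_size_bytes <= LAMBDA_MAX_BYTES:
        return "lambda"
    return "ec2"


if __name__ == "__main__":
    for size_mb in (10, 500, 750):
        print(f"{size_mb}MB -> {choose_backend(size_mb * 1024 * 1024)}")
```

A real router would presumably also account for the execution time and memory limits mentioned above, which this sketch ignores.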

More on Lambda

Other than document processing, Peecho also uses AWS Lambda functions in some other cool features like the generation of thumbnails for publication covers as well as content previews. For that, Lambda functions are triggered right after a publication is uploaded, so image thumbnails are instantly available in our dashboard, website and checkout pages.

Our development team is obsessed with making things simpler and faster. We are continuously seeking new possibilities for improving performance across Peecho applications. When it comes to that, AWS Lambda is a great fit and it’s definitely going to be explored more and more in future releases.

Stuff The Internet Says On Scalability For December 18th, 2015

Hey, it's HighScalability time:

In honor of a certain event what could be better than a double-bladed lightsaber slicing through clouds? (ESA/Hubble & NASA)


  • 66,000 yottayears: lifetime of an electron; 3 Gbps: potential throughput for backhaul microwave networks; 1.2 trillion: yearly Google searches; $100 trillion: global investible capital; 2.5cm: range of chip powered by radio waves; 

  • Quotable Quotes:
    • @KarenMN: He's making a database / He's sorting it twice / SELECT * from contacts WHERE behavior = 'nice' / SQL Clause is coming to town
    • abrkn: Every program attempts to expand until it has an app store. Those programs which cannot so expand are replaced by ones which can.
    • Amin Vahdat: Some recent external measurements indicate that our [Google] backbone carries the equivalent of 10 percent of all the traffic on the global Internet. The rate at which that volume is growing is faster than for the Internet as a whole.
    • Prismatic: we also learned content distribution is a tough business and we’ve failed to grow at a rate that justifies continuing to support our Prismatic News products.
    • On General Pershing: Pershing was the way he was because he knew that winning wars was in the details. Troops who paid attention to the small things would master the big things. 
    • jbob2000: Wow! A single developer working on small websites doesn't need MVC? What a revelation! I bet he doesn't have any pesky problems, such as; working in large teams, long term support, developer turn over, documentation, changing requirements, deadlines, scaling, etc. etc. Oh, but the rendered HTML looks nice!
    • Poldrack: That was totally unexpected, but it shows that being caffeinated radically changes the connectivity of your brain
    • @ValaAfshar: Uber is less than 6 years old and now valued more than 80% of S&P 500 companies.
    • @HNTitles: Scaling Pinterest - From 0 to Startup: How We Use That. What startups use to prevent concussions
    • @Carnage4Life: Top 5 qualities of successful teams at Google 1 Failure is OK 2 Dependability 3 Clear structure 4 Meaning 5 Impact
    • Ustun Ozgur: The tides have changed there too. Now, you need just two endpoints: One for serving the initial HTML, one for the API endpoints. This is the essence of web programming in the future: Two endpoints to rule them all.
    • @ErlangerNick: US: 1 brewery per 78k people, 10 new breweries per week. UK: 1 brewery per 50k people, 15 new breweries per week when scaling populations.
    • jerf: When rewriting something, you should generally strive for a drop-in replacement that does the same thing, in some cases, even matching bug-for-bug
    • @EricMinick: "We found that where code deployments are most painful, you’ll find the poorest IT performance... and culture" - 2015 Puppet State of DevOps
    • @StartupLJackson: I'm going on the record to say the killer app for Bitcoin is not turning $1 of electricity into $.50 of BTC. 
    • @nntaleb: Paris blokes missed the point that it is not just temp rising, but its volatility rising more than the average! 2nd order effect=fragility
    • Julian Dunn: Unfortunately, I believe that the “large attack surface” is a fundamental design problem with containers being an evolutionary, not a revolutionary step from VMs and bare metal.
    • The Shade Tree Developer: sharing a database is like drug abusers sharing needles.
    • Joe Young: Keurig coffee machines are the bane of my trade. They are not built to last, some rarely make it a year in our business. They have no replaceable parts, so I can not fix them.
    • wh-uws: This is why slack is winning. They took many of the concepts of what makes irc great and put a much better user experience on top. Why is that so hard for people to understand?
    • @chamath: New VC dynamics: Returns being generated by new firms. Legacy firms increasingly dated and out of touch. 

  • The Talk Show interviewed Apple senior vice president of software engineering Craig Federighi about Swift. The upshot wasn't anything technical, it was a feeling: If you were worried that Apple is going to dangle Swift, get you pot committed, and then pull it out from under you, that seems highly unlikely. It's clear from the interview Apple is using Swift, they are excited about Swift, and it's here to stay. Plan accordingly. John Siracusa is dead on in his discussion of garbage collection. Swift is using ARC instead of garbage collection, which is a bet on determinism winning over virtual machine based language approaches, which is a good bet IMHO, even in the age of more powerful mobile processors.

  • Elon Musk’s Billion-Dollar AI Plan Is About Far More Than Saving the World. Those AIs are so clever. How do you distribute AIs as deep and wide into society as possible? You make it free and open! That's how the AIs are going to take over, riding the open source meme to victory. 

  • It's odd how in software we try to reduce coupling at all costs, yet in biology every opportunity to communicate and create feedback loops is exploited. Maybe it's we who are doing it wrong? Cells send tiny parcels to each other: cells package various molecules into tiny bubble-like parcels called extracellular vesicles to send important messages - in sickness and health.

  • Now that's disaster planning! Elon Musk worries third World War would ruin Mars mission.



How Does the Use of Docker Affect Latency?

A great question came up on the mechanical-sympathy list that many others probably have as well: 

I keep hearing about [Docker] as if it is the greatest thing since sliced bread, but I've heard anecdotal evidence that low latency apps take a hit. 

Who better to answer than Gil Tene, Vice President of Technology and CTO, Co-Founder, of Azul Systems? Like Stephen Curry draining a deep transition three, Gil can always be counted on for his insight. Here's Gil's answer:

Putting aside questions of taste and style, and focusing on the effects on latency (the original question), the analysis from a pure mechanical point of view is pretty simple: Docker uses Linux containers as a means of execution, with no OS virtualization layer for CPU and memory, and with optional (even if default is on) virtualization layers for i/o. 

CPU and Memory

From a latency point of view, Docker's (and any other Linux container's) CPU and memory latency characteristics are pretty much indistinguishable from Linux itself. But the same things that apply to latency behavior in Linux apply to Docker.

If you want clean & consistent low latency, you'll have to do the same things you need to do on non-dockerized and non-containerized Linux for the same levels of consistency. E.g. if you needed to keep the system as a whole under control (no hungry neighbors), you'll have to do that at the host level for Docker as well.

If you needed to isolate sockets or cores and choose which processes end up where, expect to do the same for your docker containers and/or the threads within them.

If you were numactl'ing or doing any sort of directed numa-driven memory allocation, the same will apply.

And some of the stuff you'll need to do may seem counter-style to how some people want to deploy docker, but if you are really interested in consistent low latency, you'll probably need to break out the toolbox and use the various cgroups, tasksets and other cool stuff to assert control over how things are laid out. But if/when you do, you won't be able to tell the difference (in terms of CPU and memory latency behaviors) between a dockeriz'ed process and one that isn't.
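As one small illustration of "asserting control over how things are laid out", the sketch below pins a process to a single core from inside the process, which is roughly what `taskset` does from the outside. It relies on `os.sched_setaffinity`, which exists only on Linux, and it works the same way whether or not the process runs in a container. The helper name and the choice of the lowest allowed core are assumptions for the demo; a real deployment would pick cores deliberately, for example via Docker's `--cpuset-cpus` option or cgroups at the host level.

```python
import os


def pin_to_single_cpu(pid=0):
    """Pin a process (0 = the current one) to the lowest-numbered CPU it may use.

    Returns the resulting affinity set. Linux only.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity control is not available on this platform")
    allowed = os.sched_getaffinity(pid)   # CPUs the process may currently run on
    target = {min(allowed)}               # arbitrary choice for illustration
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        before = os.sched_getaffinity(0)
        after = pin_to_single_cpu()
        print(f"allowed before: {sorted(before)}, after: {sorted(after)}")
        os.sched_setaffinity(0, before)   # restore the original mask
    else:
        print("CPU affinity control is not available on this platform")
```

Pinning alone does not keep other processes off that core; isolating it from hungry neighbors is still a host-level job, as described above.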


Disk I/O

I/O behavior under various configurations is where most of the latency overhead questions (and answers) usually end up. I don't know enough about disk i/o behaviors and options in docker to talk about it much. I'm pretty sure the answer to anything throughput and latency sensitive for storage will be "bypass the virtualization and volumes stuff, and provide direct device access to disks and mount points".


Networking

The networking situation is pretty clear: If you want one of those "land anywhere and NAT/bridge with some auto-generated networking stuff" deployments, you'll probably pay dearly for that behavior in terms of network latency and throughput (compared to bare metal dedicated NICs on normal linux). However, there are options for deploying docker containers (again, may be different from how some people would like to deploy things) that provide either low-overhead or essentially zero-latency-overhead network links for docker. Start with host networking and/or use dedicated IP addresses and NICs, and you'll do much better than the bridged defaults. But you can go to things like Solarflare's NICs (which tend to be common in bare metal low latency environments already), and even do kernel bypass, dedicated spinning-core network stack things that will have a latency behavior no different (on Docker) than if you did the same on bare metal Linux.


Docker (which is "userland as a unit") is not about packing lots of things into a box. Neither is guest-OS-as-a-unit virtualization. Sure, they can both be used for that (and often are), but the biggest benefit they both give is the ability to ship around a consistent, well captured configuration. And the ability to develop, test, and deploy that exact same configuration. This later turns into being able to easily manage deployment and versioning (including roll backs), and being able to do cool things like elastic sizing, etc. There are configuration tools (puppet/chef/...) that can be used to achieve similar results on bare metal as well, of course (assuming they truly control everything in your image), but the ability to pack up your working stuff as a bunch of bits that can "just be turned on" is very appealing.

I know people who use virtualization even with a single guest-per-host (e.g. an AWS r3.8xlarge instance type is probably that right now). And people who use docker the same way (single container per host). In both cases, it's about configuration control and how things get deployed, and not at all about packing things in a smaller footprint.

The low latency thing then becomes a "does it hurt?" question. And Docker hurts a lot less than hypervisor or KVM based virtualization does when it comes to low latency, and with the right choices for I/O (dedicated NICs, cores, and devices), it becomes truly invisible.

On HackerNews


Does AMP Counter an Existential Threat to Google?

When AMP (Accelerated Mobile Pages) was first announced it was right in line with Google’s long standing project to make the web faster. Nothing seemingly out of the ordinary.

Then I listened to a great interview on This Week in Google with Richard Gingras, Head of News at Google, that made it clear AMP is more than just another forward looking initiative from Google. Much more.

What is AMP? AMP is two things. AMP is a restricted subset of HTML designed to make the web fast on mobile devices. AMP is also a strategy to counter an existential threat to Google: the mobile web is in trouble and if the mobile web is in trouble then Google is in trouble.

In the interview Richard says (approximately):

The alternative [to a strong vibrant community around AMP] is devastating. We don’t want to see a decline in the viability of the mobile web. We don’t want to see poor experiences on the mobile web propel users into proprietary platforms.

This point, or something very like it, is repeated many times during the interview. With ad blocker usage on the rise there’s a palpable sense of urgency to do something. So Google stepped up and took leadership in creating AMP when no one else was doing anything that aligned with the principles of the free and open web.

The irony for Google is that advertising helped break the web. We have fouled our own nest.

Why now? Web pages are routinely between 2MB and 10MB in size for only 80K worth of content. The blimpification of web pages comes from two general sources: beautification and advertising. Lots of code and media are used to make the experience of content more compelling. Lots of code and media are used in advertising.

The result: web pages have become very very slow. And a slow web is a dead web, especially in the parts of the world without fast or cheap mobile networks, which is much of the world. For many of these people the Internet consists of their social network, not the World Wide Web, and that’s not a good outcome for lots of people, including Google. So AMP wants to make people fall in love with the web again by speeding it up using a simple, cachable, and open format.

Does AMP work? Pinterest found AMP pages load four times faster and use eight times less data than traditional mobile-optimized pages. So, yes.

Is AMP being adopted? Seems like it. Some of those on board are: WordPress, Nuzzle, LinkedIn, Twitter, Fox News, The WSJ, The NYT, Huffington Post, BuzzFeed, The Washington Post, BBC, The Economist, FT, Vox Media, LINE, Viber, Tango, comScore, Chartbeat, Google Analytics, Network18, and many more. Content publishers clearly see value in the survival of the web. Developers like AMP too. There are over 4500 developers on the AMP GitHub project.

When will AMP start? Google will reportedly send traffic to AMP pages in Google Search starting in late February, 2016.

Will Google advantage AMP in search results? Not directly says Google, but since faster sites rank better, AMP will implicitly rank higher compared to heavier weight content. We may have a two tiered web: the fast AMP based web and the slow bloated traditional web. Non AMP pages can still be made fast of course, but all of human history argues against it.

The AMP talk featured a well balanced panel representing a wide variety of interests. Leo Laporte, famous host and founder of TWiT, represents the small content publisher. He views AMP with a generally positive yet skeptical eye. AMP is open source, but it is still controlled by Google, so is the web still the open web? Jeff Jarvis is a journalism professor and a long time innovative thinker on how journalism can stay alive in the modern era. Jeff helped inspire the idea of AMP and sees AMP as a way publishers can distribute content to users on whatever form of media users are consuming. Kevin Marks is as good a representative for the free and open web as you could ask for. Matt Cutts, as a very early employee at Google, is of course pro Google, but he also represents an engineering perspective. Richard Gingras is the driving force behind AMP at Google. He’s also a compelling evangelist for AMP and the need for a true new Web 2.0.

Here’s a gloss of the discussion. I’m not attributing who said what, just the outstanding points that help reveal AMP’s vision for the future of the open web:

Origin Story

Click to read more ...


Stuff The Internet Says On Scalability For December 11th, 2015

Hey, it's HighScalability time:

Cheesy Star Trek graphics? Nope. It's hot gas streaming into Pandora’s Cluster.



  • 100 million: John Henry as played by a conventional computer loses to a quantum computer; 400,000: cores in PayPal's OpenStack deployment; 10TB: max size of Google Cloud SQL database; 9%: Kickstarter projects that don't deliver; $2.3 trillion: worth of The Forbes 400 members; billions: worth of Spanish treasure ship;

  • Quotable Quotes:
    • Pandalicious: I actually expect that down the road most large open source projects will start distributing a standardized build environment via docker containers. 
    • @glasnt: "Optimise for speed flexibility & evolution" "Whoever is iterating faster has a huge advantage" - @adrianco #yow15 
    • @erikbryn: LIDAR goes from $75K to $500, leaves Moore's Law in the Dust
    • Henry Miller: One has to believe wholeheartedly in what one is doing, realize that it is the best one can do at the moment—forego perfection now and always!—and accept the consequences which giving birth entails.
    • @jedws: "uber is way more reliable on Saturday and Sunday because there are no engineers working on the.system" #yow15
    • @samkottle: "Waffles are like kubernetes on a dish" -@rbranson
    • @brian_klaas: No server is easier to manage than no server, but are we moving all the complexity to the front-end?
    • @Carnage4Life: Death of #unbundling part 2: Facebook shutting down lab which shipped side apps like Hello, Rooms & Slingshot 
    • @carlosfairgray: Efforts to drive uncertainty out of development have only driven innovation out of development. #yow15 @DReinertsen 
    • : “Let’s legislate secure cryptographic backdoors” is the 21st century’s “let’s pass a law to make π = 3”
    • @jessitron: To call an API, or just grab it from the database? Don't tap into another team at the spine. Talk to their faces.
    • Brian Chesky: One of the keys to get to scale, is to do things that don’t scale. One other important lesson within this lesson is — 100 customers who love you > 1,000,000 users.
    • IbanezDavy: The areas of where we expect quantum computers to be faster are roughly known. There are cases where classical computers will still perform better than a quantum computer. But D Wave has been criticized of not truly having a quantum computer, so I think they are motivated in just demonstrating that they do indeed have one.
    • @tiagogriffo: "We developed the product so fast that marketing had not time to change the requirements" said a PM. From @DReinertsen talk at #yow15
    • @xaprb: push 10,000 metrics/sec at 1-sec resolution for 1000 servers for a year and see if it scales forever ;-)

  • Apple has open sourced Swift for reals, not just a code dump months too late to be of use. Swift is on GitHub, you can look at the code, see the entire version history from the very first check-in, see what's changing, contribute, file bugs, etc. So it's a real open source project. Apple is even porting key frameworks like their Foundation libraries over to Swift. If you are looking for the one language to rule them all, that can run fast enough on the server, be used for web apps, and run on mobile, Swift is making the case for being that language, which is no doubt what Apple also wants it for. Incentives align. Expect developers to quickly fill out the toolchain. How does Swift compare? Go vs Node vs Rust vs Swift. Swift is fast, but lacks language primitives for parallelism.

  • Ruby can be much faster. 25,000+ Req/s for Rack JSON API with MRuby: MRuby is a minimal version of Ruby, that can be embedded in any system that supports C...There is a new HTTP web server called H2O, which is really, really fast...When H2O is compiled, it embeds a MRuby interpreter that can be used to run Ruby code. The result: an astonishing 28,000+ requests per second.

  • Fox guarding the chickens. U.S. states pass laws backing Uber’s view of drivers as contractors.

  • In the same way there's always a tradeoff between ASIC and white box solutions, there's also an ebb and flow between domain specific languages and general purpose languages. Google replaced Sawzall, a DSL for performing powerful, scalable analysis, with a software ecosystem built around Go. Replacing Sawzall — a case study in domain-specific language migration. The result: we’ve found that with carefully designed libraries we can get most of the benefits of Sawzall in Go while gaining the advantages of a powerful general-purpose language. The overall response of analysts to these changes has been extremely positive. Today, logs analysis is one of the most intensive users of Go at Google, and Go is the most-used language for reading logs through the logs proxy.

  • There's a new data mining Barbie. The new talking Hello Barbie doll has the mind of Siri: "Equipped with Siri-like voice-recognition software and a wi-fi connection, Hello Barbie can respond to questions from kids about everything from her favorite color to career goals." Unfortunately I can't take credit for the data mining comment; I heard it on TWiT.

  • If you have 70 data caching stations around the world connected with fast links and you are already expert at caching your own content, starting your own CDN makes a lot of sense. So that's what Google did. Cloud CDN. Interestingly, Google may be trying to turn these caching stations into datacenters, so says Google's Secret Plan to Catch Up to Amazon and Microsoft in Cloud. If you could use Kubernetes to place work on the edge and combine that with some kind of multi-datacenter database, you would have yourself very low latency access to a lot of mobile devices.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


Free Red Book: Readings in Database Systems, 5th Edition

For the first time in ten years there has been an update to the classic Red Book, Readings in Database Systems, which offers "readers an opinionated take on both classic and cutting-edge research in the field of data management."

Editors Peter Bailis, Joseph M. Hellerstein, and Michael Stonebraker curated the papers and wrote pithy introductions. Unfortunately, links to the papers are not included, but a kindly wizard, Nindalf, gathered all the referenced papers together and put them in one place.

What's in it?

  • Preface 
  • Background introduced by Michael Stonebraker 
  • Traditional RDBMS Systems introduced by Michael Stonebraker 
  • Techniques Everyone Should Know introduced by Peter Bailis 
  • New DBMS Architectures introduced by Michael Stonebraker
  • Large-Scale Dataflow Engines introduced by Peter Bailis 
  • Weak Isolation and Distribution introduced by Peter Bailis 
  • Query Optimization introduced by Joe Hellerstein 
  • Interactive Analytics introduced by Joe Hellerstein 
  • Languages introduced by Joe Hellerstein 
  • Web Data introduced by Peter Bailis 
  • A Biased Take on a Moving Target: Complex Analytics by Michael Stonebraker 
  • A Biased Take on a Moving Target: Data Integration by Michael Stonebraker


Sponsored Post: Redis Labs, SignalFx, InMemory.Net, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Senior Devops Engineer - is looking for a senior devops engineer to help us in making the internet more transparent around downtime. Your mission: help us create a fast, scalable infrastructure that can be deployed to quickly and reliably.

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer - AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data - AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • Your event could be here. How cool is that?

Cool Products and Services

  • Real-time correlation across your logs, metrics and events. just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Sumo Logic alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here:

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required.

  • ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.

  • : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...



The Serverless Start-up - Down with Servers!

This is a guest post by Marcel Panse and Sander Nagtegaal from

In our early Peecho days, we wrote an article explaining how to build a really scalable architecture for next to nothing, using Amazon Web Services. Auto-scaling, merciless decoupling and even automated bidding on unused server capacity were the tricks we used back then to operate on a shoestring. Now, it is time to take it one step further.

We would like to introduce, also known as the serverless start-up - again, entirely built around AWS, but leveraging only the Amazon API Gateway, Lambda functions, DynamoDB, S3 and CloudFront.

The Virtues of Constraint

We like rules. At our previous start-up Peecho, product owners had to do fifty push-ups as payment for each user story that they wanted to add to an ongoing sprint. Now, at our current company myTomorrows, our developer dance-offs are legendary: during the daily stand-ups, you are only allowed to speak while dancing - leading to the most efficient meetings ever.

This way of thinking goes all the way into our product development. It may seem counter-intuitive at first, but constraints fuel creativity. For example, all our logo design is done with technical diagramming tool Omnigraffle, so there is no way we could use hideous lens flares and such. Anyway - recently, we launched yet another initiative called So, we needed a new restriction.

At, we are not allowed to use servers. Not even one.

It was a good choice. We will explain why.

Why Servers are Bad

Click to read more ...