Stuff The Internet Says On Scalability For January 8th, 2016

Hey, it's HighScalability time:

Finally, a clear diagram of Amazon's industry impact. (MARK A. GARLICK)


If you like this Stuff then please consider supporting me on Patreon.
  • 150: # of globular clusters in the Milky Way; 800 million: Facebook Messenger users; 180,000: high-res images of the past; 1 exaflops: 1 million trillion floating-point operations per second; 10%: of Google's traffic is now IPv6; 100 milliseconds: time it takes to remember; 35: percent of all US Internet traffic used by Netflix; 125 million: hours of content delivered each day by Netflix's CDN;

  • Quotable Quotes:
    • Erik DeBenedictis: We could build an exascale computer today, but we might need a nuclear reactor to power it
    • wstrange: What I really wish the cloud providers would do is reduce network egress costs. They seem insanely expensive when compared to dedicated servers.
    • rachellaw: What's fascinating is the bot-bandwagon is mirroring the early app market.
      With apps, you downloaded things to do things. With bots, you integrate them into things, so they'll do it for you. 
    • erichocean: The situation we're in today with RAM is pretty much the identical situation with the disks of yore.
    • @bernardgolden: @Netflix will spend 2X what HBO does on programming in 2016? That's an amazing stat. 
    • @saschasegan: Huawei's new LTE modem has 18 LTE bands. Qualcomm's dominance of LTE is really ending this year.
    • Unruly Places: The rise of placelessness, on top of the sense that the whole planet is now minutely known and surveilled, has given this dissatisfaction a radical edge, creating an appetite to find places that are off the map and that are somehow secret, or at least have the power to surprise us.
    • @mjpt777: Queues are everywhere. Recognise them, make them first class, model and monitor them for telemetry.
    • Guido de Croon:  the robot exploits the impending instability of its control system to perceive distances. This could be used to determine when to switch off its propellers during landing, for instance.
    • @gaberivera: In the future, all major policy questions will be settled by Twitter debates between venture capitalists
    • Craig McLuckie: It’s not obvious until you start to actually try to run massive numbers of services that you experience an incredible productivity that containers bring
    • Brian Kirsch: One of the biggest things when you look at the benefits of container-based virtualization is its ability to squeeze more and more things onto a single piece of hardware for cost savings. While that is good for budgets, it is excessively horrible when things go bad.
    • @RichardWarburto: It still surprises me that configuration is most popular user of strong consistency models atm. Is config more important than data
    • @jamesurquhart: Five years ago I predicted CFO would stop complaining about up front cost, and start asking to reduce monthly bill. Seeing that happen now.
    • @martinkl: Communities in a nutshell… • Databases research: “In fsync we trust” • Distributed systems research: “In majority vote we trust”
    • @BoingBoing: Tax havens hold $7.6 trillion; 8% of world's total wealth
    • @DrQz: Amazon's actual profits are still tiny, relying heavily on its AWS cloud business.
    • hadagribble: we need to view fast storage as something other than disk behind a block interface and slow memory, especially with all the different flavours of fast persistent storage that seem to be on the horizon. For the one's that attach to the memory bus, the PMFS-style [1] approach of treating them like a file-system for discoverability and then mmaping to allow them to be accessed as memory is pretty attractive.

  • EC2 with a 5% price reduction on certain things in certain places. Not exactly the race to the bottom one would hope for in a commodity market, which means the cloud is not a commodity. Happy New Year – EC2 Price Reduction (C4, M4, and R3 Instances).

  • Since the locus of the Internet is centering on a command line interface in the form of messaging, chatbot integrations may be giving APIs a second life, assuming they are let inside the walled garden. The next big thing in computing is called 'ChatOps,' and it's already happening inside Slack. The advantage chatops has over the old Web + API mashup dream is that messaging platforms come built-in with a business model/app store, a large and growing user base, and network effects. Facebook’s Secret Chat SDK Lets Developers Build Messenger Bots. Slack apps. WeChat API. Telegram API. Alexa API. Google's Voice Actions. How about Siri or iMessage? Nope. njovin likes it: I've worked with the new Chat SDK and our customers' use cases aren't geared toward forcing (or even encouraging) users into using Facebook Messenger. Most of them are just trying to meet demand from their customers. In our particular case, we have customers with a lot of international travelers who have access to data while abroad but not necessarily SMS. IMO it's a lot better than having a dedicated app you have to download to interact with a specific brand.

  • The world watched a lot of porn this year. If you like analytics you'll love Pornhub’s 2015 Year in Review: In 2015 alone, we streamed 75GB of data a second; bandwidth used is 1,892 petabytes; 4,392,486,580 hours of video were watched; 21.2 billion visits.

  • A very interesting way to frame the issue. On the dangers of a blockchain monoculture: The Bitcoin blockchain: the world’s worst database. Would you use a database with these features? Uses approximately the same amount of electricity as could power an average American household for a day per transaction. Supports 3 transactions / second across a global network with millions of CPUs/purpose-built ASICs. Takes over 10 minutes to “commit” a transaction. Doesn’t acknowledge accepted writes: requires you read your writes, but at any given time you may be on a blockchain fork, meaning your write might not actually make it into the “winning” fork of the blockchain (and no, just making it into the mempool doesn’t count). In other words: “blockchain technology” cannot by definition tell you if a given write is ever accepted/committed except by reading it out of the blockchain itself (and even then). Can only be used as a transaction ledger denominated in a single currency, or to store/timestamp a maximum of 80 bytes per transaction. But it’s decentralized!
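The throughput numbers quoted above compose into an unflattering back-of-envelope comparison (the arithmetic here is my own, using only the figures from the article):

```python
# Back-of-envelope check on the blockchain figures quoted above.
tx_per_second = 3            # sustained Bitcoin throughput
block_interval_s = 10 * 60   # ~10 minutes between "commits"

tx_per_block = tx_per_second * block_interval_s
print(tx_per_block)  # 1800 transactions per ten-minute commit window
```

That is the entire global write capacity per commit interval, for a network backed by millions of CPUs and purpose-built ASICs.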

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


Let's Donate Our Organs and Unused Cloud Cycles to Science

There’s a long history of donating spare compute cycles for worthy causes. Most of those efforts were started in the Desktop Age. Now, in the Cloud Age, how can we donate spare compute capacity? How about through a private spot market?

There are cycles to spare. Public Cloud Usage trends:

  • Instances are underutilized with average utilization rates between 8-9%

  • 24% of instance reservations are unused

Maybe all that CapEx sunk into Reserved Instances can be put to some use? Maybe over-provisioned instances could be added to the resource pool as well? That’s a lot of power, Captain. How could it be put to good use?

There is a need to crunch data. For science. Here’s a great example as described in This is how you count all the trees on Earth. The idea is simple: from satellite pictures count the number of trees. It’s an embarrassingly parallel problem, perfect for the cloud. NASA had a problem. Their cloud is embarrassingly tiny: 400 hypervisors shared amongst many projects. Analysing all the data would take 10 months. An unthinkable amount of time in this Real-time Age. So they used the spot market on AWS.

The upshot? The test run cost a measly $80, which means that NASA can process data collected for an entire UTM zone for just $250. The cost for all 11 UTM zones in sub-Saharan Africa and the use of all four satellites comes in at just $11,000.
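The figures roll up cleanly (numbers are from the article; the multiplication is my own sanity check):

```python
# Cost figures reported for NASA's tree-counting job on the AWS spot market.
cost_per_utm_zone = 250   # dollars to process one UTM zone
zones = 11                # UTM zones covering sub-Saharan Africa
satellites = 4            # satellite data sources processed per zone

total = cost_per_utm_zone * zones * satellites
print(total)  # 11000, matching the quoted $11,000 for the full job
```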

“We have turned what was a $200,000 job into a $10,000 job and we went from 100 days to 10 days [to complete],” said Hoot. “That is something scientists can build easily into their budget proposals.”

That last quote, That is something scientists can build easily into their budget proposals, stuck in my craw.

Imagine how much science could get done if you didn’t have the budget proposal process slowing down the future? Especially when we know there are so many free cycles available that are already attached to well supported data processing pipelines. How could those cycles be freed up to serve a higher purpose?

Netflix shows the way with their internal spot market. Netflix has so many cloud resources at their disposal, a pool of 12,000 unused reserved instances at peak times, that they created their own internal spot market to drive better utilization. The whole beautiful setup is described in Creating Your Own EC2 Spot Market, Creating Your Own EC2 Spot Market -- Part 2, and High Quality Video Encoding at Scale.

The win: By leveraging the internal spot market Netflix measured the equivalent of a 210% increase in encoding capacity.
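The core mechanic is simple to model: batch work borrows whatever reserved capacity is idle, and gets preempted when production demand rises. This is a toy sketch of that idea under my own invented names, not Netflix's actual implementation:

```python
class InternalSpotMarket:
    """Toy model of an internal spot market: batch jobs borrow idle
    reserved instances and are preempted when production needs them.
    All names here are illustrative, not Netflix's actual API."""

    def __init__(self, reserved_total):
        self.reserved_total = reserved_total
        self.production_in_use = 0
        self.borrowed = 0

    def idle(self):
        return self.reserved_total - self.production_in_use - self.borrowed

    def borrow(self, n):
        """A batch job (e.g. video encoding) asks for n spare instances."""
        granted = min(n, self.idle())
        self.borrowed += granted
        return granted

    def production_demand(self, n):
        """Production traffic rises; preempt borrowers if over capacity."""
        self.production_in_use = n
        preempted = max(0, self.production_in_use + self.borrowed - self.reserved_total)
        self.borrowed -= preempted
        return preempted  # instances the batch tier must give back

market = InternalSpotMarket(reserved_total=12000)
print(market.borrow(10000))            # 10000 spare instances granted off-peak
print(market.production_demand(5000))  # 3000 borrowers preempted as peak approaches
```

The interesting engineering, covered in the linked posts, is in doing that preemption gracefully for long-running encoding jobs.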

Netflix has a long and glorious history of sharing and open sourcing their tools. It seems likely when they perfect their spot market infrastructure it could be made generally available.

Perhaps the Netflix spot market could be extended so unused resources across the Clouds could advertise themselves for automatic integration into a spot market usable by scientists to crunch data and solve important world problems.

Perhaps donated cycles could even be charitable contributions that could help offset the cost of the resource? My wife is a tax accountant and she says this is actually true, under the right circumstances.

This kind of idea has a long history with me. When AWS first started, I, like a lot of people, wondered: how can I make money off this gold rush? That was before we knew Amazon was going to make most of the tools to sell to the miners themselves. The idea of exploiting underutilized resources has always fascinated me. That is, after all, what VMs do for physical hardware: exploit the underutilized resources of powerful machines. And it is in some ways the idea behind our modern economy. Yet even today software architectures aren’t such that we reach anything close to full utilization of our hardware resources. What I wanted to do was create a memcached system that let developers sell their unused memory capacity (and later CPU, network, and storage) to other developers as cheap dynamic pools of memcached storage. Get your cache dirt cheap, and developers could make some money back on underused resources. A very similar idea to the spot market notion. But without homomorphic encryption the security issues were daunting, even assuming Amazon would allow it. With the advent of the Container Age, sharing a VM is now far more secure, and Amazon shouldn’t have a problem with the idea if it’s for science. I hope.


Sponsored Post: Netflix, Redis Labs, SignalFx, InMemory.Net, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Manager - Site Reliability Engineering: Lead and grow the front door SRE team in charge of keeping Netflix up and running. You are an expert in operational best practices and can work with stakeholders to positively move the needle on availability. Find details on the position here:

  • Senior Service Reliability Engineer (SRE): Drive improvements to help reduce both time-to-detect and time-to-resolve while concurrently improving availability through service team engagement.  Ability to analyze and triage production issues on a web-scale system a plus. Find details on the position here:

  • Manager - Performance Engineering: Lead the world-class performance team in charge of both optimizing the Netflix cloud stack and developing the performance observability capabilities which 3rd party vendors fail to provide.  Expert on both systems and web-scale application stack performance optimization. Find details on the position here

  • Senior Devops Engineer - is looking for a senior devops engineer to help us in making the internet more transparent around downtime. Your mission: help us create a fast, scalable infrastructure that can be deployed to quickly and reliably.

  • UI Engineer: AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop its user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data: AppDynamics, a leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for the backend components that manage application architectures. Apply here.

Fun and Informative Events

  • Your event could be here. How cool is that?

Cool Products and Services

  • Turn chaotic logs and metrics into actionable data. Scalyr is a tool your entire team will love. Get visibility into your production issues without juggling multiple tools and tabs. Loved and used by teams at Codecademy, ReturnPath, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • Real-time correlation across your logs, metrics and events. just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Sumo Logic alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a .NET-native in-memory database for analysing large amounts of data. It runs natively on .NET and provides native .NET, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here:

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required.

  • ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.

  • Site24x7: Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...


Server-Side Architecture. Front-End Servers and Client-Side Random Load Balancing

Chapter by chapter Sergey Ignatchenko is putting together a wonderful book on the Development and Deployment of Massively Multiplayer Games, though it has much broader applicability than games. Here's a recent chapter from his book.

Enter Front-End Servers

[Enter Juliet]
Thou art as sweet as the sum of the sum of Romeo and his horse and his black cat! Speak thy mind!
[Exit Juliet]

— a sample program in Shakespeare Programming Language



Front-End Servers as an Offensive Line


Our Classical Deployment Architecture (especially if you do use FSMs) is not bad, and it will work, but there is still quite a bit of room for improvement for most of the games out there. More specifically, we can add another row of servers in front of the Game Servers, as shown in Fig VI.8:

Click to read more ...


Stuff The Internet Says On Scalability For January 1st, 2016

Hey, Happy New Year, it's HighScalability time:

River system? Vascular system? Nope. It's a map showing how all roads really lead to Rome.


If you like Stuff The Internet Says On Scalability then please consider supporting me on Patreon.
  • 71: mentions of innovation by the Chinese Communist Party; 60.5%: of all burglaries involve forcible entry; 280,000 square feet: Amazon's fulfillment center in India capable of shipping 2 million items; 11 billion: habitable Earth-like planets in the goldilocks zone in just our galaxy; 800: people working on the iPhone's camera (how about the app store?); 3.3 million: who knew there were so many Hello Kitty fans?; 26 petabytes: size of League of Legends' data warehouse; 

  • Quotable Quotes:
    • George Torwell: Tor is Peace / Prism is Slavery / Internet is Strength
    • @SciencePorn: Mr Claus will eat 150 BILLION calories and visit 5,556 houses per second this Christmas Eve.
    • @SciencePorn: Blue Whale's heart is so big, a small child can swim through the veins.
    • @BenedictEvans: There are close to 4bn people on earth with a phone (depending on your assumptions). Will go to at least 5bn. So these issues will grow.
    • @JoeSondow: "In real life you won't always have a calculator with you." — math teachers in the 80s
    • James Hamilton: This is all possible due to the latencies we see with EC2 Enhanced networking. Within an availability zone, round-trip times are now tens of microseconds, which make it feasible to propose and commit transactions to multiple resilient nodes in less than a millisecond.
    • Benedict Evans: The mobile ecosystem, now, is heading towards perhaps 10x the scale of the PC industry, and mobile is not just a new thing or a big thing, but that new generation, whose scale makes it the new centre of gravity of the tech industry. Almost everything else will orbit around it. 
    • Ruth Williams: Bacteria growing in an unchanging environment continue to adapt indefinitely.
    • @Raju: Not one venture-backed news aggregator has yet shown a Sustainable Business Model
    • @joeerl: + choose accurate names + favor beauty over performance + design minimal essential API's + document the unobvious
    • @shibuyashadows: There is no such thing as a full-node anymore. Now there are two types: Mining Nodes Economic Nodes. Both sets are now semi-centralized on the network, are heavily inter-dependent and represent the majority of the active Bitcoin users.
    • @TheEconomist: In 1972 a man with a degree aged 25-34 earned 22% more than a man without. Today, it's 70%
    • Dr. David Miller~ We are in the age of Howard Hughes. People make their fortune elsewhere and spend it on space. 
    • Credit for CRISPR: Part of that oversimplification is rooted in the fact that most modern life-science researchers aren’t working to uncover broad biological truths. These days the major discoveries lie waiting in the details
    • @BenedictEvans: Idle observation: Facebook will almost certainly book more revenue in 2015 than the entire internet ad industry made up until to 2000
    • Eric Clemmons: Ultimately, the problem is that by choosing React (and inherently JSX), you’ve unwittingly opted into a confusing nest of build tools, boilerplate, linters, & time-sinks to deal with before you ever get to create anything.
    • Kyle Russell: Why do I need such a powerful PC for VR? Immersive VR experiences are 7x more demanding than PC gaming.
    • @josevalim: The system that manages rate limits for Pinterest written in Elixir with a 90% response time of 800 microseconds.
    • catnaroek: The normal distribution is important because it arises naturally when the preconditions of the central limit theorem hold. But you still have to use your brain - you can't unquestioningly assume that any random variable (or sample or whatever) you will stumble upon will be approximately normally distributed.
    • Dominic Chambers: Now, if you consider the server-side immutable state atom to be a materialized view of the historic events received by a server, you can see that we've already got something very close to a Samza style database, but without the event persistence.
    • Joscha Bach: In my view, the 20th century’s most important addition to understanding the world is not positivist science, computer technology, spaceflight, or the foundational theories of physics. It is the notion of computation. Computation, at its core, and as informally described as possible, is very simple: every observation yields a set of discernible differences.

  • The New Yorker is picking up on the Winner Takes All theme that's been developing, I guess that makes it an official meme. What's missing from their analysis is that users are attracted to the eventual winners because they provide a superior customer experience. Magical algorithms are in support of experience. As long as a product doesn't fail at providing that experience, there's little reason to switch once network effects have pulled you into a choice. You might think many products could find purchase along the long tail, but in low friction markets that doesn't seem to be the case. Other choices become invisible, and what's invisible starves to death.

  • I wonder how long it took to get to the 1 billionth horse ride? Uber Hits One Billionth Ride in 5.5 years.

  • Let's say you are a frog that has been in a warming pot for the last 15 years, what would you have missed? Robert Scoble has put together quite a list. 15 years ago there was no: Facebook, YouTube, Twitter, Google+, Quora, Uber, Lyft, iPhone, iPads, iPod, Android, HDTV, self driving cars, Waze, Google Maps, Spotify. Soundcloud, WordPress, Wechat, Flipkart, AirBnb, Flipboard, LinkedIn, AngelList, Techcrunch, Google Glass, Y Combinator, Techstars, Geekdom, AWS, OpenStack, Azure, Kindle, Tesla, and a lot more.

  • He who controls the algorithm reaps the rewards. Kansas is now the 5th state where lottery prizes may have been fixed.

  • What Is The Power Grid? A stunning 60% of generated energy is lost before it can be consumed, which is why I like my power grids like my databases: distributed and shared nothing.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


How to choose an in-memory NoSQL solution: Performance measuring

The main purpose of this work is to show results of benchmarking some of the leading in-memory NoSQL databases with a tool named YCSB.

We selected three popular in-memory database management systems: Redis (standalone and in-cloud, named Azure Redis Cache), Tarantool and CouchBase, plus one caching system, Memcached. Memcached is not a database management system and does not have persistence, but we included it because it is also widely used as a fast storage system. Our “firing field” was a group of four virtual machines in the Microsoft Azure Cloud. The virtual machines are located close to each other, meaning they are in one datacenter. This is necessary to reduce the impact of network overhead in latency measurements. Images of these VMs can be downloaded via these links: one, two, three and four (login: nosql, password: qwerty). A pair of VMs named nosql-1 and nosql-2 is useful for benchmarking Tarantool and CouchBase, and another pair named nosql-3 and nosql-4 is good for Redis, Azure Redis Cache and Memcached. Databases and tests are installed and configured on these images.

Our virtual machines were the basic A3 instances with 4 cores, 7 GB RAM and 120 GB disk size.
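YCSB reports each run as comma-separated `[SECTION], Metric, Value` lines. A small parser like the following (my own sketch, not part of the benchmark setup described here) makes runs easy to compare across the databases under test:

```python
def parse_ycsb(output):
    """Parse YCSB's '[SECTION], Metric, Value' report lines into a dict,
    e.g. results['OVERALL']['Throughput(ops/sec)'] -> 9891.2."""
    results = {}
    for line in output.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 3 and parts[0].startswith("["):
            section = parts[0].strip("[]")
            try:
                value = float(parts[2])
            except ValueError:
                continue  # skip non-numeric report values
            results.setdefault(section, {})[parts[1]] = value
    return results

# Example output in YCSB's format; the numbers below are made up.
report = """\
[OVERALL], RunTime(ms), 10110
[OVERALL], Throughput(ops/sec), 9891.2
[READ], AverageLatency(us), 512.4
[UPDATE], AverageLatency(us), 630.0
"""
print(parse_ycsb(report)["OVERALL"]["Throughput(ops/sec)"])  # 9891.2
```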

Databases and their configurations

Click to read more ...


Using AWS Lambda functions to create print ready files

This is a guest post by Thiago Wolff from Peecho.

In a nutshell, Peecho is all about turning your digital content into professionally printed products. Although it might look like a simple task, a lot of stuff happens behind the scenes to make that possible. In this article, we’re going to tell you about our processing architecture as well as a recent performance improvement: the integration of AWS Lambda functions.

Print-ready files

In order to make digital content ready for printing facilities, there are some procedures that must occur after the order is received and before the final printing. In the printing industry this process is called pre-press, and the Peecho platform fully automates its initial stages before routing orders to printers.

Once the file has been created by the customer and uploaded to Peecho, it undergoes our processing stage. During processing, the file is checked to make sure it contains all the elements necessary for a successful print run: do the images have the proper format and resolution, are all the fonts included, are the RGB/CMYK colors set up appropriately, are all layout elements such as margins, crop marks and bleeds set up correctly, etc.

All these checks are automated by our backend systems. The entire process is quite complex and involves heavy computational activities to be executed that are expensive and time consuming. Let’s take a more detailed look at our processing architecture.

Processing Architecture

The processing stage starts right after an order is placed and payment has been confirmed. It’s initiated by the order intake server, which adds a message to an SQS processing queue with all the information about the order and the file to be processed. Whenever there is a message available in the queue, a new processing machine (a large EC2 instance) starts working to transform the original data into a print-ready file.

At the core of the processing code we use open source libraries like iText as well as third party software for PDF and image encoding/conversion like PStill and ImageMagick. As the result of processing we generate PDF/X-3 files.

In earlier versions, when the Peecho platform first launched, all processing was executed by EC2 instances. For a single order it was done sequentially; page by page as illustrated below.

Since we can deal with any kind of file, and usually really tough ones, the described transformation process could take hours to execute. On average, it took 15 seconds per page. Since processing was sequential, the time increased linearly with the number of pages. For example, a 400-page document would take around 1 hour and 40 minutes to process, which is a considerable amount of time for a single file.
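The sequential cost follows directly from the per-page average quoted above:

```python
# Sequential processing time for the old architecture.
seconds_per_page = 15   # average per-page time quoted in the article
pages = 400

sequential_seconds = pages * seconds_per_page
print(sequential_seconds / 60)  # 100.0 minutes, i.e. 1 hour 40 minutes
```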

Recently, our development team has integrated the new AWS Lambda functions into the processing architecture and that has changed the story enormously.

AWS Lambda

Imagine being able to define a piece of code that runs on a machine in the cloud, without worrying about provisioning, managing and scaling the servers used to run it. That’s exactly what AWS Lambda is: a compute service where you can define functions that respond to events, such as changes to data in Amazon S3.

In the new processing architecture, we took the existing processing code and converted it into an AWS Lambda function that performs all file transformations on a single page in a document. The new function is written in Node.js and is triggered after S3 file uploads.

After the processing starts, the original document is split into separate pages and uploaded to S3; when the upload completes for every page, a new Lambda instance is launched and starts cracking the page data.

By doing that, we are now able to run a separate processing instance for each page in parallel. For a 400-page document we now launch 400 Lambda instances simultaneously and process the entire document in roughly the time it would take to process a single page. The processing time therefore no longer increases with the number of pages, and as a result we can process almost any document in the time we used to spend on a single page!
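The per-page fan-out shape can be sketched as a handler over S3 event notifications. Peecho's actual function is written in Node.js; this is an illustrative Python sketch of the same pattern, with the per-page work reduced to a stand-in:

```python
def handler(event, context=None):
    """Illustrative Lambda-style handler: one invocation processes the
    page(s) referenced in an S3 event notification. The event shape
    matches S3's notification format; the processing itself is a stub."""
    processed = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # In the real system: download the page from S3, run the
        # pre-press transformations, and upload a print-ready result.
        processed.append((bucket, key))
    return processed

# One event per uploaded page; S3 triggers a fresh invocation for each.
event = {"Records": [{"s3": {"bucket": {"name": "uploads"},
                             "object": {"key": "orders/123/page-042.pdf"}}}]}
print(handler(event))  # [('uploads', 'orders/123/page-042.pdf')]
```

Because each page upload fires its own invocation, parallelism scales with the page count with no scheduling code on our side.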

Although AWS Lambda is a great and powerful service, it has some limitations regarding execution time, disk space and memory. For instance, we are not able to use Lambda to process files larger than 500MB. Since we still have to process these big guys, the Peecho platform falls back to the previous mechanism whenever we need to handle corner cases like that.

More on Lambda

Other than document processing, Peecho also uses AWS Lambda functions in some other cool features like the generation of thumbnails for publication covers as well as content previews. For that, Lambda functions are triggered right after a publication is uploaded, so image thumbnails are instantly available in our dashboard, website and checkout pages.

Our development team is obsessed with making things simpler and faster. We are continuously seeking new possibilities for improving performance across Peecho applications. When it comes to that, AWS Lambda is a great fit, and it's definitely going to be explored more and more in future releases.

Stuff The Internet Says On Scalability For December 18th, 2015

Hey, it's HighScalability time:

In honor of a certain event what could be better than a double-bladed lightsaber slicing through clouds? (ESA/Hubble & NASA)


If you like Stuff The Internet Says On Scalability then please consider supporting me on Patreon.
  • 66,000 yottayears: lifetime of an electron; 3 Gbps: potential throughput for backhaul microwave networks; 1.2 trillion: yearly Google searches; $100 trillion: global investible capital; 2.5cm: range of chip powered by radio waves; 

  • Quotable Quotes:
    • @KarenMN: He's making a database / He's sorting it twice / SELECT * from contacts WHERE behavior = 'nice' / SQL Clause is coming to town
    • abrkn: Every program attempts to expand until it has an app store. Those programs which cannot so expand are replaced by ones which can.
    • Amin Vahdat: Some recent external measurements indicate that our [Google] backbone carries the equivalent of 10 percent of all the traffic on the global Internet. The rate at which that volume is growing is faster than for the Internet as a whole.
    • Prismatic:  we also learned content distribution is a tough business and we’ve failed to grow at a rate that justifies continuing to support our Prismatic News products.
    • On General Pershing: Pershing was the way he was because he knew that winning wars was in the details. Troops who paid attention to the small things would master the big things. 
    • jbob2000: Wow! A single developer working on small websites doesn't need MVC? What a revelation! I bet he doesn't have any pesky problems, such as; working in large teams, long term support, developer turn over, documentation, changing requirements, deadlines, scaling, etc. etc. Oh, but the rendered HTML looks nice!
    • Poldrack: That was totally unexpected, but it shows that being caffeinated radically changes the connectivity of your brain
    • @ValaAfshar: Uber is less than 6 years old and now valued more than 80% of S&P 500 companies.
    • @HNTitles: Scaling Pinterest - From 0 to Startup: How We Use That. What startups use to prevent concussions
    • @Carnage4Life: Top 5 qualities of successful teams at Google 1 Failure is OK 2 Dependability 
      3 Clear structure 4 Meaning 5 Impact
    • Ustun Ozgur: The tides have changed there too. Now, you need just two endpoints: One for serving the initial HTML, one for the API endpoints. This is the essence of web programming in the future: Two endpoints to rule them all.
    • @ErlangerNick: US: 1 brewery per 78k people, 10 new breweries per week. UK: 1 brewery per 50k people, 15 new breweries per week when scaling populations.
    • jerf: When rewriting something, you should generally strive for a drop-in replacement that does the same thing, in some cases, even matching bug-for-bug
    • @EricMinick: "We found that where code deployments are most painful, you’ll find the poorest IT performance... and culture" - 2015 Puppet State of DevOps
    • @StartupLJackson: I'm going on the record to say the killer app for Bitcoin is not turning $1 of electricity into $.50 of BTC. 
    • @nntaleb: Paris blokes missed the point that it is not just temp rising, but its volatility rising more than the average! 2nd order effect=fragility
    • Julian Dunn: Unfortunately, I believe that the “large attack surface” is a fundamental design problem with containers being an evolutionary, not a revolutionary step from VMs and bare metal.
    • The Shade Tree Developer: sharing a database is like drug abusers sharing needles.
    • Joe Young: Keurig coffee machines are the bane of my trade. They are not built to last, some rarely make it a year in our business. They have no replaceable parts, so I can not fix them.
    • wh-uws: This is why slack is winning. They took many of the concepts of what makes irc great and put a much better user experience on top. Why is that so hard for people to understand?
    • @chamath: New VC dynamics: Returns being generated by new firms. Legacy firms increasingly dated and out of touch. 

  • The Talk Show interviewed Apple senior vice president of software engineering Craig Federighi about Swift. The upshot wasn't anything technical, it was a feeling: If you were worried that Apple is going to dangle Swift, get you pot committed, and then pull it out from under you, that seems highly unlikely. It's clear from the interview Apple is using Swift, they are excited about Swift, and it's here to stay. Plan accordingly. John Siracusa is dead on in his discussion of garbage collection. Swift is using ARC instead of garbage collection, which is a bet on determinism winning over virtual machine based language approaches, which is a good bet IMHO, even in the age of more powerful mobile processors.

  • Elon Musk’s Billion-Dollar AI Plan Is About Far More Than Saving the World. Those AIs are so clever. How do you distribute AIs as deep and wide into society as possible? You make it free and open! That's how the AIs are going to take over, riding the open source meme to victory. 

  • It's odd how in software we try to reduce coupling at all costs, yet in biology every opportunity to communicate and create feedback loops is exploited. Maybe it's we who are doing it wrong? Cells send tiny parcels to each other: cells package various molecules into tiny bubble-like parcels called extracellular vesicles to send important messages - in sickness and health.

  • Now that's disaster planning! Elon Musk worries third World War would ruin Mars mission.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


How Does the Use of Docker Affect Latency?

A great question came up on the mechanical-sympathy list that many others probably have as well: 

I keep hearing about [Docker] as if it is the greatest thing since sliced bread, but I've heard anecdotal evidence that low latency apps take a hit. 

Who better to answer than Gil Tene, Vice President of Technology and CTO, Co-Founder, of Azul Systems? Like Stephen Curry draining a deep transition three, Gil can always be counted on for his insight:

And here's Gil's answer:

Putting aside questions of taste and style, and focusing on the effects on latency (the original question), the analysis from a pure mechanical point of view is pretty simple: Docker uses Linux containers as a means of execution, with no OS virtualization layer for CPU and memory, and with optional (even if on by default) virtualization layers for I/O.

CPU and Memory

From a latency point of view, Docker's (and any other Linux container's) CPU and memory latency characteristics are pretty much indistinguishable from Linux itself. But the same things that apply to latency behavior in Linux apply to Docker.

If you want clean & consistent low latency, you'll have to do the same things you need to do on non-dockerized and non-containerized Linux for the same levels of consistency. E.g. if you needed to keep the system as a whole under control (no hungry neighbors), you'll have to do that at the host level for Docker as well.

If you needed to isolate sockets or cores and choose which processes end up where, expect to do the same for your docker containers and/or the threads within them.
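On Linux that kind of core pinning can even be done from inside the process itself; here's a minimal, Linux-only sketch (my illustration, not from Gil's answer) using the standard-library `os.sched_setaffinity`, which behaves identically inside a container as long as the container's cpuset grants access to the core:

```python
import os

# Pin the current process (pid 0 means "self") to CPU 0 only.
# Inside a Docker container this is the same call; the cpuset cgroup
# just has to allow CPU 0 for it to succeed.
os.sched_setaffinity(0, {0})

# Verify: the kernel now reports only CPU 0 as allowed.
allowed = os.sched_getaffinity(0)
print("allowed CPUs:", allowed)
```

The same effect can be had externally with `taskset` or a cpuset cgroup; the mechanism underneath is identical, which is why container and bare-metal behavior don't differ here.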

If you were numactl'ing or doing any sort of directed numa-driven memory allocation, the same will apply.

And some of the stuff you'll need to do may seem counter to the style in which some people want to deploy docker, but if you are really interested in consistent low latency, you'll probably need to break out the toolbox and use the various cgroups, tasksets, and other cool stuff to assert control over how things are laid out. But if/when you do, you won't be able to tell the difference (in terms of CPU and memory latency behaviors) between a dockerized process and one that isn't.


Disk I/O

I/O behavior under various configurations is where most of the latency overhead questions (and answers) usually end up. I don't know enough about disk i/o behaviors and options in docker to talk about it much. I'm pretty sure the answer to anything throughput and latency sensitive for storage will be "bypass the virtualization and volumes stuff, and provide direct device access to disks and mount points".


Networking

The networking situation is pretty clear: If you want one of those "land anywhere and NAT/bridge with some auto-generated networking stuff" deployments, you'll probably pay dearly for that behavior in terms of network latency and throughput (compared to bare metal dedicated NICs on normal Linux). However, there are options for deploying docker containers (again, may be different from how some people would like to deploy things) that provide either low-overhead or essentially zero-latency-overhead network links for docker. Start with host networking and/or use dedicated IP addresses and NICs, and you'll do much better than the bridged defaults. But you can go to things like Solarflare's NICs (which tend to be common in bare metal low latency environments already), and even do kernel bypass, dedicated spinning-core network stack things that will have a latency behavior no different (on Docker) than if you did the same on bare metal Linux.
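If you want to quantify that overhead yourself, a small round-trip probe is enough. This standalone sketch (my illustration, not from Gil's answer) measures UDP echo latency using only the Python standard library; run it on bare metal, in a host-networked container, and in a bridged container, and compare the medians:

```python
import socket
import time

def udp_roundtrip_us(n: int = 1000) -> float:
    """Median round-trip time in microseconds for a UDP echo on loopback."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(("127.0.0.1", 0))          # OS picks a free port
    cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    addr = srv.getsockname()
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        cli.sendto(b"ping", addr)
        data, peer = srv.recvfrom(64)   # "server" side receives...
        srv.sendto(data, peer)          # ...and echoes back
        cli.recvfrom(64)
        samples.append(time.perf_counter() - t0)
    srv.close()
    cli.close()
    samples.sort()
    return samples[n // 2] * 1e6

print(f"median loopback RTT: {udp_roundtrip_us():.1f} us")
```

Point the client at a peer across the bridge (instead of loopback) and the NAT/bridge tax Gil describes shows up directly in the numbers.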


Docker (which is "userland as a unit") is not about packing lots of things into a box. Neither is guest-OS-as-a-unit virtualization. Sure, they can both be used for that (and often are), but the biggest benefit they both give is the ability to ship around a consistent, well-captured configuration, and the ability to develop, test, and deploy that exact same configuration. This in turn means being able to easily manage deployment and versioning (including rollbacks), and being able to do cool things like elastic sizing, etc. There are configuration tools (puppet/chef/...) that can be used to achieve similar results on bare metal as well, of course (assuming they truly control everything in your image), but the ability to pack up your working stuff as a bunch of bits that can "just be turned on" is very appealing.

I know people who use virtualization even with a single guest-per-host (e.g. an AWS r3.8xlarge instance type is probably that right now). And people who use docker the same way (single container per host). In both cases, it's about configuration control and how things get deployed, and not at all about packing things in a smaller footprint.

The low latency thing then becomes a "does it hurt?" question. And Docker hurts a lot less than hypervisor or KVM based virtualization does when it comes to low latency, and with the right choices for I/O (dedicated NICs, cores, and devices), it becomes truly invisible.

On HackerNews


Does AMP Counter an Existential Threat to Google?

When AMP (Accelerated Mobile Pages) was first announced it was right in line with Google’s long-standing project to make the web faster. Nothing seemingly out of the ordinary.

Then I listened to a great interview on This Week in Google with Richard Gingras, Head of News at Google, that made it clear AMP is more than just another forward looking initiative from Google. Much more.

What is AMP? AMP is two things. AMP is a restricted subset of HTML designed to make the web fast on mobile devices. AMP is also a strategy to counter an existential threat to Google: the mobile web is in trouble and if the mobile web is in trouble then Google is in trouble.

In the interview Richard says (approximately):

The alternative [to a strong vibrant community around AMP] is devastating. We don’t want to see a decline in the viability of the mobile web. We don’t want to see poor experiences on the mobile web propel users into proprietary platforms.

This point, or something very like it, is repeated many times during the interview. With ad blocker usage on the rise there’s a palpable sense of urgency to do something. So Google stepped up and took leadership in creating AMP when no one else was doing anything that aligned with the principles of the free and open web.

The irony for Google is that advertising helped break the web. We have fouled our own nest.

Why now? Web pages are routinely between 2 MB and 10 MB in size for only 80 KB worth of content. The blimpification of web pages comes from two general sources: beautification and advertising. Lots of code and media are used to make the experience of content more compelling. Lots of code and media are used in advertising.

The result: web pages have become very, very slow. And a slow web is a dead web, especially in the parts of the world without fast or cheap mobile networks, which is much of the world. For many of these people the Internet consists of their social network, not the World Wide Web, and that’s not a good outcome for lots of people, including Google. So AMP wants to make people fall in love with the web again by speeding it up using a simple, cacheable, and open format.

Does AMP work? Pinterest found AMP pages load four times faster and use eight times less data than traditional mobile-optimized pages. So, yes.

Is AMP being adopted? Seems like it. Some of those on board are: WordPress, Nuzzle, LinkedIn, Twitter, Fox News, The WSJ, The NYT, Huffington Post, BuzzFeed, The Washington Post, BBC, The Economist, FT, Vox Media, LINE, Viber, Tango, comScore, Chartbeat, Google Analytics, Network18, and many more. Content publishers clearly see value in the survival of the web. Developers like AMP too. There are over 4,500 developers on the AMP GitHub project.

When will AMP start? Google will reportedly send traffic to AMP pages in Google Search starting in late February, 2016.

Will Google advantage AMP in search results? Not directly says Google, but since faster sites rank better, AMP will implicitly rank higher compared to heavier weight content. We may have a two tiered web: the fast AMP based web and the slow bloated traditional web. Non AMP pages can still be made fast of course, but all of human history argues against it.

The AMP talk featured a well balanced panel representing a wide variety of interests. Leo Laporte, famous host and founder of TWiT, represents the small content publisher. He views AMP with a generally positive yet skeptical eye. AMP is open source, but it is still controlled by Google, so is the web still the open web? Jeff Jarvis is a journalism professor and a long time innovative thinker on how journalism can stay alive in the modern era. Jeff helped inspire the idea of AMP and sees AMP as a way publishers can distribute content to users on whatever form of media users are consuming. Kevin Marks is as good a representative for the free and open web as you could ask for. Matt Cutts, as a very early employee at Google, is of course pro-Google, but he also represents an engineering perspective. Richard Gingras is the driving force behind AMP at Google. He’s also a compelling evangelist for AMP and the need for a true new Web 2.0.

Here’s a gloss of the discussion. I’m not attributing who said what, just the outstanding points that help reveal AMP’s vision for the future of the open web:

Origin Story

Click to read more ...