Stuff The Internet Says On Scalability For April 10th, 2015

High Scalability

10 Apr 2015 — 8 min read

Hey, it's HighScalability time:

Beautiful, isn't it? It's the cerebral cortex of a rat that is organized like a mini-Internet.

$47 million: value of Cannabis per square km; $3.7 trillion: worldwide IT spending in 2014; $41B: spend on spectrum; 48,000 square km: How Much Land Would it Take to Power the US via Solar; 2,000: Hadoop clusters in the world; 650 pounds: projected size of ET

Quotable Quotes:
- John Hugg: The number one rule of 21st century data management: If a problem can be solved with an instance of MySQL, it’s going to be.
- @sarahnovotny: "there is no compression algorithm for experience" - great quote from Andy Jassy at #AWSSummit
- Steve Martin: I did stand-up comedy for eighteen years. Ten of those years were spent learning, four years were spent refining, and four were spent in wild success.
- Yossi Vardi: Revenues kill the dream.
- @AWSSummits: AdRoll's retargeting and real-time bidding operates at 6 billion impressions/day at 100ms latency on #AWS #AWSSummit
- @AWS_Partners: Nike is operating 70+ services as production loads in #aws today #AWSSummit
- @bernardgolden: S3 usage up 102% YOY, ec2 93%: #AWSSummit
- @bernardgolden: AWS growing over 40% yoy. Next earnings announcement s/b v interesting. #awssummit
- @AlexBalk: Here is my Apple Watch review: Your life is largely meaningless. No gadget can obscure its emptiness. You are dying every day.
- Jonas: Google: all apps become search. Facebook: all apps become feeds.
- @jon_moore: most scalable/fast/reliable systems follow these principles: elastic; responsive; resilient; message-driven. #phillyete
- mrmondo: NVMe [Non-Volatile Memory Express] is one of the most important changes to storage over the past decade.
- Peter Thiel: Often the smarter people are more prone to trendy, fashionable thinking because they can pick up on things, they can pick up on cues more easily, and so they’re even more trapped by it than people of average ability
- @nickstenning: The women and men who wrote the nearly bug-free code that controlled a $4Bn space shuttle and the lives of astronauts worked 8am to 5pm.

Have you been let down by miracle materials like carbon nanotubes, buckyballs, and graphene? MOFs (metal–organic frameworks) are here and they are real. This Nature podcast and article tell you all about them (about 13 minutes in). MOFs are scaffolds made of metal-containing nodes linked by carbon-based struts. They are pieces that you can plug together and build up into big networks which have spaces in between. It's those spaces that make MOFs useful. You can trap things in those holes and do things to the molecules while they are trapped. You can store gases like methane and hydrogen. You can separate mixtures of things by varying the pore sizes. Carbon capture is one big use. They can also be used as chemical sensors, maybe in some future version of your watch. Also perhaps as write-once-read-many-times memory.

Is Amazon recreating the Sun ecosystem in the cloud? We now have the Amazon Elastic File System so everything is remote mounted. WorkSpaces feels like diskless workstations. Storage is over on some NAS. The database is somewhere on the network. And so on. Let's hope NFS lock contention failures and network UI jitter don't also make a comeback. OK, I don't remember having anything like Amazon Machine Learning.

Etsy is giving Facebook's HipHop Virtual Machine (HHVM) for PHP a try. Why? Their API and web code were diverging under parallel development pressures. And they were developing many small API endpoints that used many small requests instead of larger requests that do more work per request. And instead of sharing state in an inherently shared-nothing architecture they went with the strategy of just making things faster. This is where HHVM comes in. A great part of the article is also the process they went through to introduce HHVM. It's very measured and rational, each step validating the past and setting up the future. They started with a Minimum Viable Product and were happy that HHVM was surprisingly compatible with their PHP code base. Then they ran a synthetic benchmark. Then they verified the replies were the same as the old code. Then they ran it only for employees. Then they ramped up slowly. Results: We were able to realize a greater throughput on our API cluster, as well as improved performance. Buying fewer servers also means less waste and less power consumption in our data centers.

OK, that's impressive. Migrating from Heroku to AWS (using Docker). It took two engineers about one month. Performance increased 2x and average API response time dropped from around 220ms to under 100ms, and our background task execution times dropped in half as well. Half the number of servers were needed.

I was excited to see AWS is opening up Lambda. It's close to some ideas I've been talking about for a while (Building Super Scalable Systems, What Google App Engine Price Changes Say About The Future Of Web Architecture). When it first came out I rehabbed my atrophied node.js skills and gave it a shot. Played around a bit, got some code working, but the problem was Lambda only exposed a few integration points and none of those were anything I cared about. Now they've made Lambda much more general and in the process much more useful. Worth another look. I also suspect their NFS product was necessary to generalize Lambda. Code could be instantly available on every machine via a mount point. Just like back in the day.

How Early Adopters Are Using Unikernels - With and Without Containers: The creator of MirageOS, Anil Madhavapeddy’s group is working on a new tool stack called Jitsu (Just-in-Time Summoning of Unikernels), which can start a unikernel in ~20ms in response to a network request. Also, Towards Heroku for Unikernels: Part 2 - Self Scaling Systems.

You want your company to survive? Look to the products you make not how long you've been in business. Company mortality: Researchers find patterns in the life and death of firms: using a statistical technique called survival analysis, Daepp and her mentors discovered something no one had predicted: a firm’s mortality rate — its risk of dying in, say, the next year — had nothing to do with how long it had already been in business or what kinds of products it produced.

What will the world look like when each person is constantly making a 3D model of the world and uploading it so it can be recreated later? It might look something like Smithsonian X3D.

Moore's law is for weenies. United Electrical World Smart Grid: The gross world product increased an estimated 530% in the 19th century and has increased on average by 5.3% annually from 1800 to 1900. Global wealth grew exponentially, by over 3600% in the 20th century (even taking into account the damage caused by conflict), an average of 36% per year. This amazing growth was made possible through automation, efficient use of fossil fuels, periodic replenishments, quick recovery from war, and a culture of mass consumption.
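
The per-year averages in that quote look like the century totals divided by 100. As a rough cross-check, here is a minimal sketch that computes the equivalent compound annual rate instead. It assumes steady year-over-year compounding across each full century, which the source does not claim; `compound_annual_rate` is a name made up for illustration.

```python
def compound_annual_rate(total_increase_pct: float, years: int) -> float:
    """Annual growth rate (in percent) that compounds to the given total increase.

    Assumption: growth compounds once per year at a constant rate.
    """
    growth_factor = 1 + total_increase_pct / 100   # e.g. +530% means 6.3x overall
    return (growth_factor ** (1 / years) - 1) * 100

# Century totals taken from the quote above.
rate_19th = compound_annual_rate(530, 100)
rate_20th = compound_annual_rate(3600, 100)

print(f"19th century: {rate_19th:.2f}% per year")
print(f"20th century: {rate_20th:.2f}% per year")
```

Under that assumption the compound rates land at roughly 1.9% and 3.7% per year, which is still a lot of growth to sustain for a hundred years.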

The answer is not "read Twitter." What Does an Idle CPU Do? Surprisingly, it does a lot. There are lots of nice graphics explaining everything in sufficient detail.

Upside Down Databases: Bridging the Operational and Analytic Worlds with Streams: So the trick, at least for me, is how this is all tied together. A synchronous writeable view at the front. A range of different read-only views at the back, running asynchronous to one another. An event stream tying it all together with a single journal of state. Side effect free functions that (re)generate different views from the stream. A spout for programs to listen and interact. All wrapped up in a single data platform. A single joined up unit.
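
To make the "single journal, many views" idea concrete, here is a minimal sketch in Python. The `Event` shape, the account example, and the two view functions are all hypothetical, invented for illustration; the article describes the pattern, not this code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    """One immutable journal entry (hypothetical shape)."""
    account: str
    amount: int  # positive = deposit, negative = withdrawal


def balances_view(journal: Iterable[Event]) -> Dict[str, int]:
    """Side-effect-free function: fold the journal into balances per account."""
    view: Dict[str, int] = {}
    for event in journal:
        view[event.account] = view.get(event.account, 0) + event.amount
    return view


def activity_view(journal: Iterable[Event]) -> Dict[str, int]:
    """A second, independent read-only view: number of events per account."""
    view: Dict[str, int] = {}
    for event in journal:
        view[event.account] = view.get(event.account, 0) + 1
    return view


# The journal is the only thing ever written to, and it is append-only.
journal: List[Event] = []
journal.append(Event("alice", 100))
journal.append(Event("bob", 50))
journal.append(Event("alice", -30))

# Views can be thrown away and regenerated from the stream at any time.
balances = balances_view(journal)
activity = activity_view(journal)
print(balances, activity)
```

Because each view is just a function of the journal, adding a new read model later means writing one more function and replaying the stream.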

I did not know this. Postgres’s publish-subscribe features made better with JSON: You may not have known this, but Postgres has Publish-Subscribe functionality in the form of NOTIFY, LISTEN, UNLISTEN. This is commonly used for sending notifications that table rows have changed.
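
Here is a small sketch of the sending side, assuming a JSON payload and Postgres's built-in `pg_notify(channel, payload)` function. The channel name, the payload layout, and the `build_notification` helper are made up for illustration, and the `%s` placeholders assume a psycopg2-style driver. Nothing here connects to a database; it only builds the statement and its parameters.

```python
import json
from typing import Any, Dict, Tuple

# In the default Postgres configuration a NOTIFY payload must be shorter than 8000 bytes.
MAX_PAYLOAD_BYTES = 7999


def build_notification(channel: str, table: str, operation: str,
                       row: Dict[str, Any]) -> Tuple[str, Tuple[str, str]]:
    """Return (sql, params) describing a row change as a JSON notification.

    Hypothetical helper: the payload layout is an assumption, not a standard.
    """
    payload = json.dumps({"table": table, "op": operation, "row": row}, sort_keys=True)
    if len(payload.encode("utf-8")) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload too large for NOTIFY; send a key and look the row up")
    return "SELECT pg_notify(%s, %s)", (channel, payload)


sql, params = build_notification("row_changes", "users", "UPDATE", {"id": 42, "name": "Ada"})
print(sql, params)
```

A listener would issue `LISTEN row_changes` on its own connection and `json.loads` each payload it receives.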

Very detailed instructions on Building a high performance SSD SAN - Part 1. The cost is less than $9.5K USD per node and offers 450,000 IOP/s read performance on tier 1 storage and .5GB/s read performance and 1.5GB/s write performance on tier 1 storage.

Lambda Complexity: Why Fast Data Needs New Thinking. Great story by John Hugg of the evolution of requirements for batch processing as embodied by HDFS to low latency response times as embodied by Storm, Lambda, and of course VoltDB as John works for them. Storm has high functionality but that comes with a lot of complexity. The "Speed Layer" and the "Batch Layer" of Lambda are also complex. At the Speed Layer are also Samza, Spark, and Millwheel. For Fast Data John has VoltDB winning on simplicity, integration, and speed. The article is well written, informative, and balanced.

We're still in search of increased utilization. Utilisation and High Availability analysis: Containers for Microservices: Containerisation (or Docker if you will) is a must if you are considering Microservices. It helps you with increasing utilisation, bringing down cloud costs and above all, improves your availability.

How big is a microservice? How long is a piece of string?

Here's a log processing pipeline for analytics. Scaling out PostgreSQL for CloudFlare Analytics using CitusDB: An Nginx web server running Lua code handles the request and generates a binary log event in Cap’n Proto format; A Go program akin to Heka receives the log event from Nginx over a UNIX socket, batches it with other events, compresses the batch using a fast algorithm like Snappy or LZ4; Another Go program (the Kafka shim) receives the log event stream, decrypts it, decompresses the batches, and produces the events into a Kafka topic with partitions replicated on many servers; Go aggregators (one process per partition) consume the topic-partitions and insert aggregates (not individual events) with 1-minute granularity into the CitusDB database.
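
The batch, compress, and aggregate steps can be sketched in a few lines of Python. This is a toy stand-in, not CloudFlare's code: JSON replaces Cap'n Proto, `zlib` from the standard library replaces Snappy/LZ4, and the event fields (`zone`, `ts`) are invented for illustration. Kafka and CitusDB are left out entirely.

```python
import json
import zlib
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def compress_batch(events: List[dict]) -> bytes:
    """Batch events together and compress the batch (zlib stands in for Snappy/LZ4)."""
    return zlib.compress(json.dumps(events).encode("utf-8"))


def decompress_batch(blob: bytes) -> List[dict]:
    """Reverse of compress_batch, as the receiving shim would do."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def aggregate_per_minute(events: Iterable[dict]) -> Dict[Tuple[str, int], int]:
    """Count events per (zone, minute); the minute is epoch seconds floored to 60."""
    counts: Counter = Counter()
    for event in events:
        minute = event["ts"] - event["ts"] % 60
        counts[(event["zone"], minute)] += 1
    return dict(counts)


events = [
    {"zone": "example.com", "ts": 1428624000},
    {"zone": "example.com", "ts": 1428624030},
    {"zone": "example.com", "ts": 1428624065},
    {"zone": "example.org", "ts": 1428624010},
]

blob = compress_batch(events)
aggregates = aggregate_per_minute(decompress_batch(blob))
print(aggregates)
```

Only the aggregates would be inserted into the database, so four raw events collapse into three rows here, and the ratio gets far better at real traffic levels.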

I've always thought we have to be able to do better than two states. Development of ternary computers at Moscow State University.

It's not easy to get all this Docker and other stuff working. Here's a good how-to: Play With Kubernetes Quickly Using Docker. Comments on Hacker News indicate we may have a ways to go before we've reached a product level of stability.

Premature Scalability and the Root of All Evil: These days, scalability is better achieved with a super-optimized and compact single tier web application that is then deployed to some cloud infrastructure. When, and if, it faces high traffic levels, it can then be promptly fine-tuned to cope with the traffic levels it faces.

Connectivity creates a dependency and dependency has a price. Relying on server connections is ruining video games: When used sparingly, it’s no big deal, but developers and publishers seem totally willing to sacrifice the user experience for online hooks. And unsurprisingly, consumers aren’t happy with the situation. So, when are developers going to get the picture, and stop demanding online participation?

Edward Capriolo is going strong; he's on Part 13 of his excellent series: Cleanup compaction - Building a NoSQL store.

Excellent code examples. Designing a Purely Functional Data Structure: Purely functional data structures are (surprisingly) built out of those constraints. They are persistent (FP implies that both old and new versions of an updated object are available) and backed by immutable objects (FP doesn't support destructive updates). Needless to say, it's a challenge to design a purely functional data structure that meets performance requirements of its imperative sibling.
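
For a feel of what "persistent and immutable" means in practice, here is a minimal sketch of the simplest such structure, a persistent stack built from immutable cells. It is not taken from the linked article; the names `Node`, `push`, `pop`, and `items` are made up for illustration.

```python
from typing import Iterator, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    """One immutable cell of a persistent stack (hypothetical name)."""
    head: int
    tail: Optional["Node"]


def push(stack: Optional[Node], value: int) -> Node:
    """Return a new stack; the old one is untouched and reused as the tail."""
    return Node(value, stack)


def pop(stack: Node) -> Tuple[int, Optional[Node]]:
    """Return the top value and the remaining stack without modifying anything."""
    return stack.head, stack.tail


def items(stack: Optional[Node]) -> Iterator[int]:
    """Walk the stack from top to bottom."""
    while stack is not None:
        yield stack.head
        stack = stack.tail


v1 = push(push(None, 1), 2)   # old version
v2 = push(v1, 3)              # new version, shares all of v1's cells
top, rest = pop(v2)

print(list(items(v1)), list(items(v2)))
```

Both versions stay valid after the update, and `v2` costs one extra cell because it shares the rest with `v1`. The hard part the article points at is getting the same property for structures where updates are not at the head.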

ClickOS: a high-performance, virtualized software middlebox platform. It consists of the Click modular router software running on top of MiniOS (a minimalistic OS available with the Xen sources), plus optimizations to network I/O in order to drive 10 Gb/s throughput for almost all packet sizes. These virtual machines are small (6MB), boot quickly (in about 30 milliseconds) and add little delay (45 microseconds).

White Paper: Shifting Live-To-VOD Media Processing To The Edge: Data shows that when considering a steady audience with consistent consumption of time-shifted linear content, the five-year total cost of ownership (TCO) of JITP infrastructure is nearly twice that of the JITT alternative.

What were different decades in the 1800s like? An interesting historical sidenote: Perkin actually received a patent on his dye (mauveine) when he was only 18 years old, and it made him a very wealthy man as fashion around the country adopted mauve. In a lot of ways he was like a dot-com millionaire, and his success actually led to a huge surge in commercial chemists – as opposed to more academic chemists. People saw the wealth that could be gained by creating a commercially successful chemical product, and jumped on the bandwagon.

Fun interview with Greg Linden on his early days at Amazon. Interview on early Amazon personalization and recommendations. Greg tells a really good story. You may not believe it but the once mighty Amazon was once just a trickle.
