Stuff The Internet Says On Scalability For October 9th, 2015

High Scalability

09 Oct 2015 — 11 min read

Hey, it's HighScalability time:

Best selfie ever? All vacation photos taken by Apollo astronauts are now online. Fakes, obvi.
If you like Stuff The Internet Says On Scalability then please consider supporting me on Patreon.

millions: # of Facebook users who have no idea they’re using the internet; 8%: total of wealth in tax havens; $7.3B: AWS revenues; 11X: YouTube bigger than Facebook; 10: days 6s would last on diesel; 65: years ago the transistor was patented; 80X: reduction in # of new drugs approved per billion US dollars spent since 1950; 37 trillion: cells in the human body; 83%: accuracy of predicting activities from pictures.

Quotable Quotes:
- @Nick_Craver: Stack Overflow HTTP, last 30 days: Bytes 128,095,601,184,645 Hits 5,795,253,218 Pages 1,921,499,030 SQL 19,229,946,858 Redis 11,752,754,019
- @merv: #reinvent Amazon process for creating new offerings: once decision is made "write the press release and the FAQ you’ll use - then build it."
- @PaulMiller: @monkchips to @ajassy, “One of your biggest competitors is stupidity.” Quite. Or inertia. #reInvent
- @DanHarper7: If SpaceX can publish their pricing for going to space, your little SaaS does NOT need "Contact us for pricing"
- @adrianco: Nice. 2TB of RAM coming per instance next year. From Microservices to Teraservices #reinvent
- @etherealmind: If you haven't implemented 10GbE yet, start thinking about 25GbE instead. Cost per port is roughly 1.4x for 2.5x performance.
- @g2techgroup: Some of the most expensive real estate in the world was being used for data storage...We should not be in the data center business #reinvent
- The microservices cargo cult: the biggest advantage a microservice architecture brings to the table that is hard to get with other approaches is scalability. Every other benefit can be had by a bit of discipline and a good development process.
- findjashua: the new 'best practice' is to have a universal app - that renders on the server on first load, and runs as a js app subsequently. This way crawlers and browsers w js disabled still get raw markup.
- Instagram: Do the simple thing first.
- erikpukinskis: Generic containers are an awkward mid-way point between special-purpose containers (a Wordpress instance or a rails app on heroku) and an actual machine. You get the hassle of maintaining your own instances, without the flexibility or well-defined performance characteristics of an actual box.
- @AWSreInvent: Showing off the Amazon Snowball - a 47lb, 50TB device for transporting data to the AWS cloud #reInvent
- @merv: #reinvent “There is no compression algorithm for experience” - Andy Jassy. Well said.
- Alexander von Zitzewitz: I know that about 90% of software systems are suffering from severe architectural erosion, i.e. there is not a lot of the original architectural structure left in them, and coupling and dependencies are totally out of control.
- Haunted By Data: But information about people retains its power as long as those people are alive, and sometimes as long as their children are alive. No one knows what will become of sites like Twitter in five years or ten. But the data those sites own will retain the power to hurt for decades.

Data is valuable, especially if you can turn it into your own private wire. Scandal Erupts in Unregulated World of Fantasy Sports. How many other data archipelagos are being used as private opaque oracles?

Cool idea, using drones as an exponential technology to spread seeds, countering deforestation with industrial scale reforestation. BioCarbon Engineering. It's precision forestry. A mapping drone system is used to generate high quality 3D maps of an area. Then drones follow a predetermined planting pattern derived from the mapping phase to air fire biodegradable seed pods onto the ground from a height of 1-2 meters. A problem not unlike dropping a Mars rover. Clever pod design shields the seeds from impact while giving them the best chance at germination. This approach recapitulates the batch to real-time transformation that we are seeing everywhere. The current version uses a batch approach with distinct pipelined phases. One can imagine the next version using a swarm of communicating drones to coordinate both the mapping and planting in real-time; perhaps even target selection can be automated to form a continuous reactive system.

Birmingham Hippodrome shows how they use Heroku and Facebook's HHVM (HipHop Virtual Machine) to scale their WordPress system and keep it running on a modest budget. Maximum of 4 Standard-1X dynos; Peak requests: ~800/minute; Average memory use per dyno: 130MB; no downtime; Median response time: 5ms; Peak dyno load (so far): ~3.0.

Content publishers have messed up content so badly that to fix it we now have four standards. There's a full desktop HTML version with all the bells and whistles. There's a Google version (Accelerated Mobile Pages (AMP)). There's a Facebook version (Facebook Instant Articles). There's an Apple version (a news format that we don't know about yet). In this new vision content goes to where the users are instead of users going to creators. Platform providers are in control. They own the relationship with your users. They own the data. They own the presentation. Have content creators become serfs? It will be difficult for independent producers to create content for all these platforms. Survival might require partnering with a Great Lord. Great discussion on this replacement web in This Week in Google with Kevin Marks, Leo Laporte, and Jeff Jarvis.

Charles Nim with a good overview of #MESOSCON. Some highlights: Docker is here; there's a battle for the private cloud and Mesos is winning; worry less about avoiding costs, focus on speeding up the development cycle using Continuous Delivery; DevOps provides platform infrastructure and focuses on how systems interact; alerts should go first to the developer that can fix it.

The Snowden Effect: "The European Court of Justice has found that U.S. surveillance breaches the fundamental rights of European citizens." Thanks NSA.

Very unlikely: Volkswagen America's CEO blames software engineers for emissions cheating scandal. Volkswagen pulled off a full stack deception. It couldn't be just a cabal of privates.

Not a lot of details, but here's a different stack to consider: The stack we choose: Erlang, SmartOS, Clojure. Looks interesting.

Epic explanation of Why Intel Added Cache Partitioning: "How is it possible to get 2x to 9x better utilization on the same hardware?...To do better than that across a wide variety of latency-sensitive workloads with tight SLAs, we need some way to schedule low priority work on the same machines, without affecting the latency of the high priority work." Good comments on Hacker News.

Ouch, I remember programming a chess game on the Lisa. Pascal was a lot better than Objective-C. Unknown facts about Apple: Around 2,700 of Apple’s Lisa computers are buried in a landfill in Utah, after the product failed to be successful when it launched in 1983.

Here's a good explanation of how to set up Static websites with TLS/SSL/HTTPS using Google Cloud Platform & StartSSL. It uses Google App Engine and is fairly straightforward.

6 Rules of thumb to build blazing fast web server applications: Avoid premature optimization; Do the minimum amount of work to solve the problem; Defer the work you don't need to do immediately; Use cache when you can; Understand and avoid the N+1 query problem with relational databases; Prepare your app for horizontal scalability when possible.
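The N+1 query problem in that list is easier to see in code. Here is a minimal sketch using Python's built-in sqlite3 module. The authors/posts schema and the function names are invented for illustration and are not from the linked article.

```python
import sqlite3

def make_db():
    """Build a tiny in-memory database; the schema is invented for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        INSERT INTO posts VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b1');
    """)
    return conn

def titles_n_plus_one(conn):
    """One query for the authors, then one more query per author: N+1 round trips."""
    queries = 1
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries

def titles_single_join(conn):
    """The same data fetched in a single round trip with a LEFT JOIN."""
    result = {}
    rows = conn.execute("""
        SELECT a.name, p.title
        FROM authors a LEFT JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """)
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors without posts produce a NULL title
            result[name].append(title)
    return result, 1
```

Both functions return the same mapping of author to titles. The difference is the number of round trips, which grows with the number of authors in the first version and stays at one in the second.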

On Monoliths and Microservices. otto.de with a thoughtful meditation on their microservices in a vertical-style architecture. They use Edge-Side Includes as a way to "integrate a distributed system, so that the customers do not realize the distributed nature of our architecture." They separate verticals and deal with data inconsistencies. They do not try to apply DRY principles across independent services. To prevent dependencies arising they have a rather radical policy of not sharing code between services (unless it's open source).

Brent Ozar on EC2 Dedicated Hosts: This is gonna be huge. You can license one AWS host, and then run as many SQL Server VMs on there as you want. Use the same cool AWS management tools, and dramatically cut your costs.

Awesome look into a complex database repartitioning process fraught with danger. Looks like Airbnb did it right. How We Partitioned Airbnb’s Main Database in Two Weeks: We tend to agree with our friends at Asana and Percona that horizontal sharding is bitter medicine, and so we prefer vertical partitions by application function for spreading load and isolating failures. For instance, we have dedicated databases, each running on its own dedicated RDS instance...we opted to make use of MySQL replication in order to minimize the engineering complexity and investment needed...Should the op have failed, we would have reverted the database host entries in Zookeeper and the message inbox functionality would have been restored almost immediately...End-to-end, this project took about two weeks to complete and incurred just under 7 1/2 minutes of message inbox downtime and reduced the size of our main database by 20%. Most significantly, this project brought us significant database stability gains by reducing the write queries on our main master database by 33%.

Patterns of behaviour. If you are wondering what all those silos of data containing your digital essence can be used for, it's establishing patterns of behavior for use in predictive policing. The Minority Report TV show is exploring the idea of algorithmic policing through an all too real sounding Hawk-Eye program. What happens when the algorithm targets you as a suspect for a future crime? The TV show has both habeas corpus and doctor-patient privilege being suspended; and the firmware in your car is told not to let you drive. If the mass shootings continue, is this something people would be willing to accept? Also, Researchers Develop Deep-Learning Method to Predict Daily Activities.

When you create a new programming language you are also creating a new mythology. @Aseas_words: "Mythology and language are united; you cannot express one without another" Dr. Flieger #Midmoot

Resiliency in the brain. Genomes within every individual are not identical. Perhaps code diversity is the right idea? The Surprising Genealogy of Your Brain: Why such a complicated pattern? Imagine if that wasn’t the case, and that a single early cell could give rise to all the neurons in a specific chunk of brain. If that cell developed a mutation in a critical gene, the resulting brain region would be in serious trouble.

Irreversible Failures: Lessons from the DynamoDB Outage. Good summary of the problem. The irreversible failures idea is that: "there is no simple compensating action you can make. Fixing the network didn’t eliminate the overload. Amazon’s engineers were forced to invent a lengthy procedure, on the fly, to steer the system back to a stable state." Random fluctuations will likely reveal the hint of the problem before it happens so monitor and investigate anything out of the norm. Have extra capacity to short circuit cascading failures. Implement load-shedding mechanisms. Make it easy to reduce load on the fly, by making parameters such as timeouts, retry intervals, and cache expiration times adjustable through a simple configuration change.
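To make the load-shedding advice concrete, here is a minimal sketch of a shedder whose limit can be changed on the fly. The class name, the `max_in_flight` key, and the plain dict standing in for a reloadable configuration are all assumptions for illustration; nothing here describes how DynamoDB works.

```python
import threading

class LoadShedder:
    """Reject work once too many requests are in flight.

    The limit lives in a plain dict so it can be changed at runtime,
    standing in for a reloadable configuration source (an assumption).
    """

    def __init__(self, config):
        self.config = config
        self.in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        with self._lock:
            # The limit is re-read on every call, so a config change
            # takes effect without a restart.
            if self.in_flight >= self.config["max_in_flight"]:
                return False  # shed: the caller should fail fast
            self.in_flight += 1
            return True

    def release(self):
        with self._lock:
            self.in_flight -= 1

def handle(shedder, work):
    """Run work() if there is capacity, otherwise shed the request."""
    if not shedder.try_acquire():
        return "shed"
    try:
        return work()
    finally:
        shedder.release()
```

Lowering `config["max_in_flight"]` during an incident starts rejecting new requests right away, which is the kind of simple knob the article argues for.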

IFTTT shares an Apple-centric approach for Developing with Docker: IFTTT is currently in the process of moving our infrastructure to a containerized architecture. We have a large collection of microservices, and containers are the next logical step for us in cleanly managing such a complex system. Before moving our production infrastructure over however, we decided that we wanted to start developing with them locally first.

Videos are available from Container Camp LDN 2015.

Aerobatic shows an interesting use of Lambda to sync static websites directly from Bitbucket. The flow goes: git push to Bitbucket; a git hook is fired; the tar.gz is downloaded from Bitbucket to S3; this triggers Lambda to deploy assets to an S3 Web Host Bucket. Lambda has a hard 60 second timeout, which effectively limits the max repo size Aerobatic can currently support to around 25MB.

Can going lock free improve performance? Yep. Job System 2.0: Lock-Free Work Stealing – Part 3: Going lock-free: Compared to our first implementation, going lock-free and using thread-local allocators gave us almost 7x the performance. Even on top of using thread-local allocators, the lock-free implementation performs between 1.8x and 3.3x times faster.

What's the relation between sharding and distributed systems? Siddharth Anand gives a great answer: Sharding is another way to say partitioning. Given a data set, partition it into N non-overlapping data sets. You can call them N shards or N partitions...
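The "N non-overlapping data sets" idea can be sketched with simple hash partitioning. This is one common scheme, not necessarily the one the answer has in mind, and the function names are made up for illustration.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to one of num_shards shards using a stable hash.

    hashlib is used instead of the built-in hash() because hash() on
    strings is randomized per process and would move keys between runs.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def partition(keys, num_shards):
    """Split keys into num_shards non-overlapping lists."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards
```

Every key lands in exactly one shard, so the shards never overlap and together they cover the whole data set. The known weakness of plain modulo hashing is that changing `num_shards` moves most keys, which is why real systems often reach for consistent hashing or range partitioning instead.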

Martin Thompson on a new Spin Loop Hint proposal by Gil Tene: This is one of the major advantages to C for concurrent algorithms. The reduction in latency by preventing speculative execution is great. What is really nice to see is how much better the server runs on the whole due to enabling the benefits of hyper threading and reduction in power usage so that turbo boost can work better. For me this widens the usage to general high throughput applications, such as real-time stream processing, and not just low-latency finance applications. With each generation of x86 processors we are seeing hyper threading improve and without the use of instructions like PAUSE in concurrent algorithms then we are less likely to enjoy the benefits.

Many find the traditional methods of teaching probability lacking. Understanding probability through code might work better if that's the case for you. Peter Norvig explores Probability, Paradox, and the Reasonable Person Principle: In this notebook, we cover the basics of probability theory, and show how to implement the theory in Python. (You should have a little background in probability and Python.) Then we show how to solve some particularly perplexing paradoxical probability problems.
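For a taste of the approach, here is a small sketch in the spirit of the notebook rather than a copy of it: probability as the fraction of equally likely outcomes in which an event occurs. The function and variable names are chosen here for illustration.

```python
from fractions import Fraction

def probability(event, space):
    """Probability of an event, given a sample space of equally likely outcomes."""
    return Fraction(len(event & space), len(space))

# One fair six-sided die.
die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}

# Two fair dice: the sample space is every ordered pair of faces.
two_dice = {(a, b) for a in die for b in die}
sum_is_seven = {(a, b) for (a, b) in two_dice if a + b == 7}
```

Using sets and `Fraction` keeps the answers exact: `probability(even, die)` is 1/2 and `probability(sum_is_seven, two_dice)` is 1/6, with no floating point rounding to muddy the paradoxes.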

A sort of cloud native treatise. AWS Well-Architected Framework. Some principles: Stop guessing your capacity needs; Test systems at production scale; Lower the risk of architecture change; Automate to make architectural experimentation easier; Allow for evolutionary architectures; Apply security at all layers; Enable traceability; Automate responses to security events; Focus on securing your system; Automate security best practices; Test recovery procedures; Automatically recover from failure; Scale horizontally to increase aggregate system availability; Stop guessing capacity; Democratize advanced technologies; Go global in minutes; Use server-less architectures; Experiment more often; Transparently attribute expenditure; Use managed services to reduce cost of ownership; Trade capital expense for operating expense; Benefit from economies of scale; Stop spending money on data center operations.

If you are wondering how to make decisions based on data then read this: Make Decisions With Data, Not Anecdotes. Learn how to collect the stats, plot graphs, use the Theory of Constraints to guide actions, and make useful dashboards. An excellent look at the process.

Netflix shows how they use Apache Spark at Petabyte Scale. It was not a problem free experience, but they found workarounds.

How do you know what is going on in your distributed system/cloud platform/microservices deployment? By looking dapper of course. From The Morning Paper: Dapper, A Large Scale Distributed Systems Tracing Infrastructure.

SuperChief: From Apache Storm to In-House Distributed Stream Processing: SuperChief is now in production handling all of Librato’s time series aggregations. SuperChief is currently processing around 200K messages/sec and we expect that number to grow significantly as we replace all instances of Storm across our infrastructure.

If you are down for a total nerd fest on The Martian movie then here you go: SPOILERCAST - Ridley Scott's The Martian - Still Untitled: The Adam Savage Project - 10/6/2015.

Machine Learning: The High Interest Credit Card of Technical Debt: Machine learning offers a fantastically powerful toolkit for building complex systems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning. The goal of this paper is highlight several machine learning specific risk factors and design patterns to be avoided or refactored where possible.

GraphChi: Large-Scale Graph Computation on Just a PC: In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer.

keyvi: a key value index, an in-memory FST-based data structure, optimized for size and lookup performance.

Existential Consistency: Measuring and Understanding Consistency at Facebook: our analysis shows that 0.0004% of reads to vertices would return different results in a linearizable system. This in turn gives insight into the benefits of stronger consistency; 0.0004% of reads are potential anomalies that a linearizable system would prevent. We directly study local consistency models—i.e., those we can analyze using requests to a sample of objects—and use the relationships between models to infer bounds on the others.

Baker Street: "a service discovery and routing system designed for microservice architectures." Here's an excellent article with thorough explanation: Baker Street: Avoiding Bottlenecks with a Client-Side Load Balancer for Microservices.
