Stuff The Internet Says On Scalability For March 20th, 2015

Hey, it's HighScalability time:


What a view! The solar eclipse at sunrise from the International Space Station.

  • 60 billion: rows in DynamoDB; 18.5 billion: BuzzFeed impressions
  • Quotable Quotes:
    • @postwait: Hell is other people’s APIs.
    • @josephruscio: .@Netflix is now 34% of US Internet traffic at night. 2B+ hours of streaming a month. #SREcon15
    • Geo Curnoff: Everything he said makes an insane amount of sense, but it might sound like a heresy to most people, who are more interested in building software cathedrals rather than solving real problems.
    • Mike Acton: Reality is not a hack you're forced to deal with to solve your abstract, theoretical problem. Reality is the actual problem.
    • @allspaw: "The right tool for the job!" said someone whose assumptions, past experience, motivations, and definition of "job" aren't explicit.
    • Sam Cutler: Mechanical ignorance is, in fact, not a strength.
    • @Grady_Booch: Beautiful quote from @timoreilly “rms is sort of like an Old Testament prophet, with lots of ‘Thou shalt not.’”
    • @simonbrown: "With event-sourcing, messaging is back in the hipster quadrant" @ufried at #OReillySACon
    • @ID_AA_Carmack: I just dumped the C++ server I wrote last year for a new one in Racket. May not scale, but it is winning for development even as a newbie.
    • @mfdii: Containers aren't going to reduce the need to manage the underlying services that containers depend on. Exhibit A: 
    • @bdnoble: "DevOps: The decisions you make now affect the quality of sleep you get later." @caitie at #SREcon15
    • @giltene: By that logic C++ couldn't possibly multiply two integers faster than an add loop on CPUs with no mul instruction, right?
    • @mjpt777: Aeron beats all the native C++ messaging implementations and it is written in Java. 
    • @HypertextRanch: Here's what happens to your Elasticsearch performance when you upgrade the firmware on your SSDs.
    • @neil_conway: Old question: "How is this better than Hadoop?" New question: "How is this better than GNU Parallel?"
    • @evgenymorozov: "Wall Street Firm Develops New High-Speed Algorithm Capable Of Performing Over 10,000 Ethical Violations Per Second"

  • And soon the world's largest army will have no soldiers. @shirazdatta: In 2015 Uber, the world's largest taxi company, owns no vehicles; Facebook, the world's most popular media owner, creates no content; Alibaba, the most valuable retailer, has no inventory; and Airbnb, the world's largest accommodation provider, owns no real estate.

  • Not doing something is still the #1 performance improver. Coordination Avoidance in Database Systems: after looking at the problem from a fresh perspective, and without breaking any of the invariants required by TPC-C, the authors were able to create a linearly scalable system with 200 servers processing 12.7M tps – about 25x the next-best system.
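The paper's key idea (it calls the property invariant confluence) is that many invariants can be preserved without servers agreeing on anything, as long as merging independently produced states cannot violate them. A hypothetical illustration of the principle, not the paper's TPC-C code: uniqueness of order IDs holds without a shared counter if each replica draws IDs from its own disjoint space.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A hypothetical server that issues IDs without talking to its peers."""
    replica_id: int
    _counter: itertools.count = field(default_factory=itertools.count)

    def next_order_id(self) -> tuple[int, int]:
        # (replica_id, local sequence) is unique cluster-wide because no two
        # replicas share a replica_id, so no coordination is needed.
        return (self.replica_id, next(self._counter))


def merge(*id_sets: set) -> set:
    """Merging replica states is a plain set union; uniqueness still holds."""
    merged: set = set()
    for ids in id_sets:
        merged |= ids
    return merged


replicas = [Replica(r) for r in range(3)]
issued = [{rep.next_order_id() for _ in range(1000)} for rep in replicas]
all_ids = merge(*issued)
print(len(all_ids))  # 3000: no collisions, and no replica ever waited on another
```

A stricter invariant, such as gap-free sequential IDs, is not preserved by this kind of merge and would still need coordination.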

  • Tesla and the End of the Physical World. Tesla downloading new software to drastically improve battery usage is cool, but devices have been doing this forever. Routers, switches, set-top boxes, phones: virtually every higher-end connected device knows how to update itself. Cars aren't any different. Cars are just another connected device. It's also interesting that Tesla is feature-flagging its new automatic steering capability.
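In its most basic form, a feature flag is a runtime check that decides, per device, whether a code path is enabled. A minimal sketch, assuming a percentage rollout keyed on a stable hash of the device ID; the flag name and device IDs are made up, and nothing here reflects Tesla's actual mechanism.

```python
import hashlib


class FeatureFlags:
    """Hypothetical in-process flag store with percentage rollouts."""

    def __init__(self) -> None:
        self._rollout: dict[str, int] = {}  # flag name -> percent enabled (0-100)

    def set_rollout(self, flag: str, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[flag] = percent

    def is_enabled(self, flag: str, device_id: str) -> bool:
        percent = self._rollout.get(flag, 0)  # unknown flags default to off
        # Hash flag + device so each device lands in a stable bucket 0-99,
        # and different flags roll out to different subsets of devices.
        digest = hashlib.sha256(f"{flag}:{device_id}".encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100
        return bucket < percent


flags = FeatureFlags()
flags.set_rollout("auto_steer", 10)
enabled = sum(flags.is_enabled("auto_steer", f"car-{i}") for i in range(10_000))
print(enabled)  # roughly 1,000 of 10,000 cars
```

Because each device's bucket never changes, raising the percentage only adds devices: a car that already has the feature keeps it.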

  • The Apple Watch is technology fused with fashion and ecosystem in a way we've never seen before. Which is a fascinating way of routing around slower moving tech cycles. Cycles equal money. Do you need a new phone or tablet every year? Does the technology demand it? Not so much. But fashion will. Fashion is a force that drives cycles to move for no reason at all. And that's what you need to make money. Crazy like a fox.

  • About time. How Google created a Reliable Cron across the Planet. It's more involved than you might think: The solution requires strong consistency guarantees in the distributed environment. The core of the distributed Cron implementation is therefore Paxos, a common algorithm for reaching consensus in an unreliable environment. The use of Paxos and correct analysis of new failure modes of Cron jobs in a large-scale, distributed environment allowed us to build a robust Cron service that is heavily used at Google.
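Part of the difficulty is that a new leader taking over mid-launch must not start the same scheduled run twice. One common technique is to give every scheduled run a deterministic name and record intent in replicated state before launching, so a retry after failover becomes a no-op. A simplified sketch, assuming a plain dict stands in for the Paxos-replicated state and a set stands in for the cluster scheduler; the names are hypothetical and this is not Google's implementation.

```python
from datetime import datetime


def run_name(job: str, scheduled_at: datetime) -> str:
    # Deterministic: any replica that becomes leader computes the same name
    # for the same scheduled run.
    return f"{job}@{scheduled_at.isoformat()}"


class ClusterScheduler:
    """Stand-in for the system that actually runs jobs; names are unique."""

    def __init__(self) -> None:
        self.running: set[str] = set()

    def start(self, name: str) -> bool:
        if name in self.running:
            return False  # already started: launching again is a no-op
        self.running.add(name)
        return True


class CronLeader:
    def __init__(self, state: dict, scheduler: ClusterScheduler) -> None:
        self.state = state  # stands in for consensus-replicated state
        self.scheduler = scheduler

    def launch(self, job: str, scheduled_at: datetime) -> bool:
        """Return True only if this call started a new instance of the run."""
        name = run_name(job, scheduled_at)
        if self.state.get(name) == "done":
            return False
        self.state[name] = "started"  # persist intent before the side effect
        started = self.scheduler.start(name)
        self.state[name] = "done"     # persist completion
        return started


state, scheduler = {}, ClusterScheduler()
run_at = datetime(2015, 3, 20, 6, 0)

# Old leader records intent, starts the job, then crashes before marking "done".
state[run_name("nightly-backup", run_at)] = "started"
scheduler.start(run_name("nightly-backup", run_at))

# New leader sees an unfinished launch and retries it safely.
new_leader = CronLeader(state, scheduler)
print(new_leader.launch("nightly-backup", run_at))  # False: no duplicate run
print(len(scheduler.running))                       # 1
```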

  • Caching and async, the performance strategies for champions. Chrome is introducing two techniques, script streaming and code caching, to speed up page loading. With script streaming, scripts are parsed on a separate thread while they download, using more CPU in parallel but improving page load times by 10%.

  • Something different. Uber Unveils its Realtime Market Platform: Uber wants this new system to handle one million writes a second and a much higher read rate, so it needed to shard its data...To achieve that kind of scale, Uber chose to use Google's S2 Geometry Library. S2 is able to split a sphere into cells...The dispatch system is mostly built with NodeJS...To manage cluster membership and failure detection, ringpop uses SWIM...TChannel supports "backup requests with cross-server cancellation"...Uber uses the drivers' phones to distribute the data.
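Ringpop places keys on a consistent hash ring so each worker owns a slice of the keyspace, and S2 turns a location into a cell ID that can serve as such a key. A minimal sketch of that combination, assuming a crude fixed-size lat/lng grid as a stand-in for S2 cells (S2 itself maps the sphere onto cube faces and orders cells along a Hilbert curve) and MD5 as the ring hash; the node names are hypothetical.

```python
import bisect
import hashlib
import math


def grid_cell(lat: float, lng: float, cell_deg: float = 0.01) -> str:
    """Crude stand-in for an S2 cell ID: a fixed-size lat/lng grid square."""
    return f"{math.floor(lat / cell_deg)}:{math.floor(lng / cell_deg)}"


def _hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        # Each node gets several points on the ring to smooth out the load.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._points]

    def owner(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._points)
        return self._points[idx][1]


ring = HashRing(["dispatch-1", "dispatch-2", "dispatch-3"])
cell = grid_cell(37.7749, -122.4194)  # a point in San Francisco
print(cell, "->", ring.owner(cell))
```

The payoff of consistent hashing is that when a node leaves, only the cells it owned move; every other cell keeps its owner.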

  • Video from the O'Reilly Software Architecture conference is now available here.

  • Still flying high. Project Loon: Launched from New Zealand, our globe-connecting balloon made the first leg of its journey travelling 9000 km over the Pacific Ocean. Approaching our test location in Chile at a speed of 80 km/h, a command was sent for the balloon to rise into a wind pattern that slowed it down to a quarter of its speed, allowing it to drift over members of the Loon operations team who were able to connect to the balloon via smartphones on our test-partner mobile network.

  • Interesting look at Different Approaches for MVCC used in well known Databases [Oracle, PostgreSQL, SQL Server].
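What the three have in common is that writers create new versions instead of overwriting, and a reader sees the newest version committed before its snapshot. A minimal in-memory sketch of that idea, assuming a single global commit counter as the timestamp source; it ignores write-write conflicts and garbage collection of old versions, as well as the very different ways Oracle (undo segments), PostgreSQL (old tuple versions in the heap), and SQL Server (a version store in tempdb) physically keep versions.

```python
import itertools
from collections import defaultdict


class MVCCStore:
    def __init__(self) -> None:
        self._clock = itertools.count(1)    # monotonically increasing commit timestamps
        self._versions = defaultdict(list)  # key -> [(commit_ts, value), ...] in ts order
        self._last_ts = 0

    def snapshot(self) -> int:
        """A reader's snapshot is just the latest commit timestamp it may see."""
        return self._last_ts

    def commit(self, writes: dict) -> int:
        ts = next(self._clock)
        for key, value in writes.items():
            self._versions[key].append((ts, value))  # append, never overwrite
        self._last_ts = ts
        return ts

    def read(self, key, snapshot_ts: int):
        # Newest version committed at or before the snapshot.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.commit({"balance": 100})
snap = store.snapshot()         # a reader starts here
store.commit({"balance": 50})   # a concurrent writer commits
print(store.read("balance", snap))              # 100: the reader's view is stable
print(store.read("balance", store.snapshot()))  # 50: a new snapshot sees the update
```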

  • If Google uses your apps as content farms, how do you make money? An Open Google Now Is About to Make Android Super Smart: Google announced at SXSW that Google Now will eventually open its API to all app developers.

  • The opposite of microservices. Network theory suggests consciousness is global in the brain: Consciousness appears to be an emergent property of how information that needs to be acted upon gets propagated throughout the brain.

  • Is this being developer friendly? Twitter Chokes Off Meerkat’s Access To Its Social Network.

  • Maybe quantum this or that won't make everything better? Quantum search slows unexpectedly on highly connected data structures. Data Structures Influence Speed of Quantum Search in Unexpected Ways: “We turned an intuition on its head,” Wong said. “Searching with a quantum particle, we showed the opposite, giving an example where searching in a city with low connectivity yields fast search, and an example where searching in a city with high connectivity yields slow search. Thus the quantum world is much richer than our classical intuitions might lead us to believe.”

  • I can't make heads nor tails of this, but it sure looks cool. @adrianco: Results of a Spigo simulation of 6 region/18 zone @cassandra service. Live D3 version here: http://rawgit.com/adrianco/spigo/master/migration.html?9

  • Follow your own path. Premature Scalability and the Root of All Evil: When you're designing an application, there is a temptation to build it to a super-scalable future-proof architecture, even when the immediate requirements can be met by a simple single-tier application that can exploit the pure power of IIS and SQL Server. Dino recounts the painful story of what happened when the gurus got their way.

  • If networks are your thing, here's an excellent Networking @Scale Recap.

  • Good to see some new file system action. Distributed filesystems were the original NoSQL distributed databases. Once again, this time with a memory twist? Tachyon: A File and Storage System for the Future of Computing: Tachyon is a memory-centric, fault-tolerant, distributed storage system, which enables reliable data sharing at memory-speed across a datacenter. Tachyon’s memory-centric architecture represents the future of storage. In fact, it’s the first new file system that was built from the ground up to take advantage of memory-centric compute architectures.

  • Choosing a database is always tricky. Here's a thoughtful comparison: Why I Choose PostgreSQL Over MySQL/MariaDB: I have to confess to favoring PostgreSQL. There’s less hassle with licensing, custom data types, table inheritance, a rules system, and database events.

  • We stand on the shoulders of giants. Interesting bits from “Why Do Computers Stop and What Can Be Done About It?”: Availability is doing the right thing within the specified response time; Hierarchically decompose the system into modules; Design the modules to have MTBF in excess of a year; Make each module fail-fast — either it does the right thing or stops; Detect module faults promptly by having the module signal failure or by requiring it to periodically send an I AM ALIVE message or reset a watchdog timer; Configure extra modules which can pick up the load of failed modules. Takeover time, including the detection of the module failure, should be seconds. This gives an apparent module MTBF measured in millennia; and lots more.
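The "millennia" figure falls out of a back-of-the-envelope calculation. If each of two modules fails independently with mean time between failures MTBF, and a failed module is repaired within MTTR, the pair is lost only when the survivor also fails during the repair window: first failures arrive at rate 2/MTBF, the survivor fails within MTTR with probability of roughly MTTR/MTBF, so the pair's MTBF is about MTBF²/(2·MTTR), ignoring takeover time. A sketch of that arithmetic plus a minimal I AM ALIVE timeout detector; the 4-hour repair time and the class name are assumptions for illustration, not figures from the paper.

```python
import time
from typing import Callable

HOURS_PER_YEAR = 24 * 365


def pair_mtbf(module_mtbf_h: float, mttr_h: float) -> float:
    """Approximate MTBF of a duplexed pair of independent modules.

    First failure arrives at rate 2/MTBF; the pair is lost only if the
    survivor also fails within MTTR (probability ~ MTTR/MTBF).
    """
    return module_mtbf_h ** 2 / (2 * mttr_h)


class HeartbeatMonitor:
    """Declares a module failed if it misses its I AM ALIVE deadline."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen: dict[str, float] = {}

    def i_am_alive(self, module: str) -> None:
        self.last_seen[module] = self.clock()

    def failed_modules(self) -> list[str]:
        now = self.clock()
        return sorted(m for m, t in self.last_seen.items() if now - t > self.timeout_s)


years = pair_mtbf(1 * HOURS_PER_YEAR, mttr_h=4) / HOURS_PER_YEAR
print(f"{years:.0f} years")  # 1095 years from modules that each fail about yearly
```

`HeartbeatMonitor` takes an injectable clock so its timeout logic can be tested without sleeping.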

  • Cool attempt to answer the age-old interview question "What happens when you type google.com into your browser and press enter?" Looks fairly complete, but it leaves out what's happening to the person pressing enter.

  • Nice. The Smartphone as Application Server: The point of the design is that I'm treating the Pi as the display server (i.e. the client, or X11). The smartphone is the app-server because it hosts the relevant state. The reason it's structured this way is that we want the ability to push code, logic, and data to the endpoint and have the endpoint execute it.

  • Quite an interesting approach to search. Forget complex indexes, use brute force. Searching 20 GB/sec: Systems Engineering Before Algorithms: Brute force works if you have a brute problem (and a lot of force)...freeing all 8 cores for the search. The search usually completes in a fraction of a second...In the near future, we’ll be spreading data across servers in such a way that all of our servers can participate in every non-trivial query...Brute-force search, on the other hand, will run at more or less the same speed for any query.
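The core of the approach is easy to sketch: split the data into one chunk per core, scan every chunk in parallel, and let each scan run slightly past its chunk boundary so matches that straddle two chunks aren't missed. A minimal Python sketch under those assumptions; the function names and chunking policy are illustrative, and in CPython a thread pool demonstrates the partitioning more than it delivers a true 8-core speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def _scan_chunk(data: bytes, pattern: bytes, start: int, end: int) -> list[int]:
    """Find every match that *starts* in [start, end)."""
    # Extend the window by len(pattern) - 1 so a match beginning just before
    # `end` is still found; matches starting at or after `end` cannot fit in
    # this window and are left to the next chunk.
    window_end = min(end + len(pattern) - 1, len(data))
    hits = []
    i = data.find(pattern, start, window_end)
    while i != -1:
        hits.append(i)
        i = data.find(pattern, i + 1, window_end)  # i + 1 allows overlapping matches
    return hits


def brute_force_search(data: bytes, pattern: bytes, workers: int = 8) -> list[int]:
    if not pattern:
        raise ValueError("pattern must be non-empty")
    chunk = max(1, math.ceil(len(data) / workers))
    bounds = [(s, min(s + chunk, len(data))) for s in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: _scan_chunk(data, pattern, *b), bounds)
        # pool.map preserves chunk order, so the combined hits are sorted.
        return [pos for part in parts for pos in part]


print(brute_force_search(b"abcabcabc", b"abc", workers=4))  # [0, 3, 6]
```

The appeal the post describes holds here too: the cost is proportional to the bytes scanned, not to how selective or unusual the query is.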

  • 100% agreement. Incuriosity Will Kill Your Infrastructure: Not once have I regretted spending unbounded amounts of time investigating "something fishy."

  • It's difficult to extract a lot of details from this slide deck, but Architecting & Launching the Halo 4 Services - SRE CON 15 has some surprises. Halo 4 went from 0 to 1 million users on day 1, and 4 million users within the first week. It uses: Azure Worker Roles, Azure Table, Azure Blob, Azure Service Bus. Challenges: load patterns, always available, low latency, high concurrency. The architecture uses the Actor model driven by Orleans, which is a runtime and programming model for building distributed systems.
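The attraction of the actor model for a service like this is that each actor owns its state and handles one message at a time, so there are no locks around, say, a player's stats even under high concurrency. A minimal asyncio sketch of that idea; Orleans adds much more (virtual actors activated on demand, placement across a cluster, persistence), and the `PlayerStats` actor and its messages here are hypothetical.

```python
import asyncio


class PlayerStats:
    """A hypothetical actor: private state, a mailbox, one message at a time."""

    def __init__(self, player_id: str) -> None:
        self.player_id = player_id
        self.kills = 0
        self.mailbox: asyncio.Queue = asyncio.Queue()

    async def run(self) -> None:
        while True:
            msg, reply = await self.mailbox.get()
            if msg == "stop":
                reply.set_result(None)
                return
            if msg == "kill":
                self.kills += 1  # only this task ever touches self.kills
                reply.set_result(None)
            elif msg == "get":
                reply.set_result(self.kills)
            else:
                reply.set_exception(ValueError(f"unknown message: {msg}"))

    async def ask(self, msg: str):
        reply = asyncio.get_running_loop().create_future()
        await self.mailbox.put((msg, reply))
        return await reply


async def main() -> int:
    actor = PlayerStats("player-42")
    task = asyncio.create_task(actor.run())
    # Many concurrent senders, but updates are applied one at a time.
    await asyncio.gather(*(actor.ask("kill") for _ in range(1000)))
    kills = await actor.ask("get")
    await actor.ask("stop")
    await task
    return kills


print(asyncio.run(main()))  # 1000
```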

  • Good look at Cache and cache mapping techniques from Algorithms and Me. 
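Cache mapping comes down to how an address is split into a tag, a set index, and a byte offset. A small sketch that performs the split for a given geometry; the 32 KiB cache and 64-byte lines are just example parameters. A direct-mapped cache is the 1-way case, and a fully associative cache is the case with a single set.

```python
def split_address(addr: int, cache_bytes: int, line_bytes: int, ways: int) -> tuple[int, int, int]:
    """Return (tag, set_index, offset) for a set-associative cache.

    ways=1 gives a direct-mapped cache; ways=cache_bytes // line_bytes gives
    a fully associative cache (one set, so the index is always 0).
    All sizes must be powers of two.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (line_bytes - 1)
    set_index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, set_index, offset


addr = 0x12345678
print([hex(x) for x in split_address(addr, 32 * 1024, 64, ways=1)])  # ['0x2468', '0x159', '0x38']
print([hex(x) for x in split_address(addr, 32 * 1024, 64, ways=8)])  # ['0x12345', '0x19', '0x38']
```

Going from direct-mapped to 8-way shrinks the index by 3 bits and grows the tag by 3 bits: fewer sets, but each address now has 8 slots to choose from.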

  • Interested in Analytics on the Cheap? The basic workflow: Incoming message passes through our CDN to pick up geolocation headers; Message has its session authenticated (this happens at our routing layer in Nginx/OpenResty); Message is routed to an ingest server; Ingest server transforms message and headers into a single character-delimited querystring value; Ingest server makes an HTTP GET to a 0-byte file on S3 with that querystring; The bucket on S3 has S3 logging turned on; We ingest the S3 logs directly into Redshift on a daily basis.
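The clever part is that the "database write" is just a GET whose querystring ends up in S3's access logs. A sketch of the ingest-side transform and its inverse, assuming a hypothetical bucket URL, a `|` delimiter, a parameter named `d`, and simplified header names; the post doesn't say which delimiter or field order it uses, and the actual HTTP request and Redshift load are left out.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode, urlsplit

BEACON_URL = "https://example-analytics.s3.amazonaws.com/b.gif"  # hypothetical 0-byte object
FIELDS = ["event", "session", "country", "city"]                 # hypothetical field order
DELIM = "|"


def to_beacon_url(message: dict, headers: dict) -> str:
    """Flatten message + CDN geo headers into one delimited querystring value."""
    record = {**message, **headers}
    # Percent-encode each field first so a stray "|" can't break the record.
    value = DELIM.join(quote(str(record.get(f, "")), safe="") for f in FIELDS)
    return f"{BEACON_URL}?{urlencode({'d': value})}"


def from_beacon_url(url: str) -> dict:
    """What a later step might do with the request URI found in an S3 log line."""
    value = parse_qs(urlsplit(url).query)["d"][0]
    return dict(zip(FIELDS, (unquote(p) for p in value.split(DELIM))))


url = to_beacon_url({"event": "signup", "session": "abc123"},
                    {"country": "US", "city": "San Francisco"})
print(from_beacon_url(url))
# {'event': 'signup', 'session': 'abc123', 'country': 'US', 'city': 'San Francisco'}
```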

  • Optimizing Go: From 3k req/s/core to 480k req/s/core: copy pointers not memory; keep elements sorted; reduce the size of memory needed to be copied; reduce the number of times copying is needed; prune the map of ultra low values; use sort from standard library; string manipulation produces garbage; cannot use slices as keys in maps, use arrays; use lock-free algorithms where possible to reduce the load on CPU.

  • Adventures in message queues. Antirez talks about Disque, his new messaging system, not based on Redis interestingly enough. But what I really liked is how he broke down the features in detail using a Q&A format.
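A feature that separates a job queue like this from a plain list is acknowledgement: a consumer takes a message, and if it doesn't acknowledge it within a retry period, the message is delivered again, giving at-least-once delivery. A minimal single-process sketch of that behavior, assuming an injected clock; the class and method names are hypothetical rather than Disque's commands, and replication across nodes is not modeled.

```python
import itertools
import time
from collections import deque
from typing import Callable, Optional


class AckQueue:
    def __init__(self, retry_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.retry_s = retry_s
        self.clock = clock
        self._ids = itertools.count(1)
        self._ready: deque = deque()                        # (job_id, body) awaiting a consumer
        self._in_flight: dict[int, tuple[str, float]] = {}  # job_id -> (body, redeliver_at)

    def add(self, body: str) -> int:
        job_id = next(self._ids)
        self._ready.append((job_id, body))
        return job_id

    def get(self) -> Optional[tuple[int, str]]:
        self._requeue_expired()
        if not self._ready:
            return None
        job_id, body = self._ready.popleft()
        self._in_flight[job_id] = (body, self.clock() + self.retry_s)
        return job_id, body

    def ack(self, job_id: int) -> None:
        self._in_flight.pop(job_id, None)  # acked jobs are never redelivered

    def _requeue_expired(self) -> None:
        now = self.clock()
        for job_id, (body, due) in list(self._in_flight.items()):
            if now >= due:  # the consumer went quiet: deliver again
                del self._in_flight[job_id]
                self._ready.append((job_id, body))


now = [0.0]
q = AckQueue(retry_s=30, clock=lambda: now[0])
q.add("resize image 7")
first = q.get()   # a consumer takes the job... and crashes without acking
now[0] = 31.0
second = q.get()  # the same job is handed out again
q.ack(second[0])
print(first == second, q.get())  # True None
```

The flip side of at-least-once delivery is that consumers may see a job twice, so job handlers should be idempotent.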

  • Confluent with a good post on Putting Apache Kafka To Use: A Practical Guide to Building a Stream Data Platform.

  • Why Are Geospatial Databases So Hard To Build?: If your data model is inherently non-scalar, you enter an algorithm wasteland in the computer science literature. Paths, vectors, polygons, and other elementary aggregations of scalar coordinates used in spatial analysis are non-scalar data types. Computational relationships are topological instead of graph-like.
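A small taste of the problem: even the most basic spatial predicate, "is this point inside this polygon?", is a computation over a whole vertex list rather than a comparison of two scalars, so it doesn't reduce to a simple ordering the way an integer key does. A sketch of the classic ray-casting (even-odd) test on planar coordinates; real geospatial systems also have to handle the sphere, holes, and floating-point precision, all of which this ignores.

```python
Point = tuple[float, float]


def point_in_polygon(p: Point, polygon: list[Point]) -> bool:
    """Ray casting: count how many edges a ray from p toward +x crosses."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line (y1 != y2 here).
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
print(point_in_polygon((0.5, 3), l_shape))  # True
print(point_in_polygon((3, 3), l_shape))    # False: inside the bounding box, outside the shape
```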

  • Scaling Square Register. This article talks about the process rather than their architecture. It's a good source for ideas if you are looking to create a release process, engineering process, code review process, development process, and QA process. Good stuff. 

  • Interested in how radio networks work? Then you might like Episode 25 of Software Gone Wild on TCP Optimization. It goes deep into what's going on in the network. Very interesting. Mobile networks are scary places that run according to their own rules.

  • gun: A distributed, embedded, graph database engine.

  • At the risk of a little self-promotion, here's an iPhone app that I made. Nothing special, but I'm happy with it. Take Me Home Now gets you there and back again. Take a picture of any location, then when you need to go back to that same spot, just tap the picture and you navigate back using Apple's turn-by-turn navigation. A must-have for the directionally or map-app challenged.