Stuff The Internet Says On Scalability For January 16th, 2015

Hey, it's HighScalability time:


First people to free-climb the Dawn Wall of El Capitan using nothing but stone knives and bearskins (pics).

  • $3.3 trillion: mobile revenue in 2014; ~10%: the difference between a good SpaceX landing and a crash; 6: hours for which quantum memory was held stable 
  • Quotable Quotes:
    • @stevesi: "'If you had bought the computing power found inside an iPhone 5S in 1991, it would have cost you $3.56 million.'"
    • @imgurAPI: Where do you buy shares in data structures? The Stack Exchange
    • @postwait: @xaprb agreed. @circonus does per-second monitoring, but *retain* one minute for 7 years; that plus histograms provides magical insight.
    • @iamaaronheld: A single @awscloud datacenter consumes enough electricity to send 24 DeLoreans back in time
    • @rstraub46: "We are becoming aware that the major questions regarding technology are not technical but human questions" - Peter Drucker, 1967
    • @Noahpinion: Behavioral economics IS the economics of information. via @CFCamerer 
    • @sheeshee: "decentralize all the things" (guess what everybody did in the early 90ies & why we happily flocked to "services". ;)
    • New Clues: The Internet is no-thing at all. At its base the Internet is a set of agreements, which the geeky among us (long may their names be hallowed) call "protocols," but which we might, in the temper of the day, call "commandments."

  • Can't agree with this. We Suck at HTTP. HTTP is just a transport. It should deliver only transport-related error codes. Application errors belong in application messages, not spread all over the stack. 

  • Apple has lost the functional high ground. It's funny how microservices are hot, and one of their wins is the independent evolution of services, yet Apple's software releases now tie everything together. It's a strategy tax. The watch just extends the rigidity of the structure. But this is a huge upgrade. Apple is moving to a cloud multi-device sync model, which is a complete revolution. It will take a while for all this to shake out. 

  • This is so cool. I'd never heard of Cornelis Drebbel (1620s) or his amazing accomplishments before. The Vulgar Mechanic and His Magical Oven: His oven is one of the earliest devices that gave human control away to a machine, and thus can be seen as a forerunner of the smart machine, the self-deciding automaton, the thinking robot.

  • Do you think there's a DevOps identity crisis, as Baron Schwartz suggests? Does DevOps have a messaging and positioning problem? Is DevOps just old wine in a new skin? Is DevOps made up of echo chambers? I don't know, but it's an interesting analysis.

  • How does Hyper-threading double your CPU throughput?: So if you are optimizing for higher throughput – that may be fine. But if you are optimizing for response time, then you may consider running with HT turned off.

  • Underdog.io shares what's Inside Datadog’s Tech Stack: Python, JavaScript, and Go; the front-end is built with D3 and React; the data stores are Kafka, Redis, Cassandra, S3, Elasticsearch, and PostgreSQL; DevOps tooling includes Chef, Capistrano, Jenkins, Hubot, and others.

  • Is software ultimately built on faith? Is that faith misplaced? The Future of Computing: Logic or Biology. When people who can’t think logically design large systems, those systems become incomprehensible. And we start thinking of them as biological systems. And since biological systems are too complex to understand, it seems perfectly natural that computer programs should be too complex to understand. We should not accept this. That means all of us, computer professionals as well as those of us who just use computers. If we don’t, then the future of computing will belong to biology, not logic. We will continue having to use computer programs that we don’t understand, and trying to coax them to do what we want. Instead of a sensible world of computing, we will live in a world of homeopathy and faith healing.

  • If your mental model of a processor is stuck in the 80s then here's a great update: What's New in CPUs Since the 80s and How Does It Affect Programmers?: x86 chips have picked up a lot of new features and whiz-bang gadgets. For the most part, you don’t have to know what they are to take advantage of them. As a first-order approximation, making your code predictable and keeping memory locality in mind works pretty well. The really low-level stuff is usually hidden by libraries or drivers, and compilers will try to take care of the rest of it. The exceptions are if you’re writing really low-level code, in which case the world has gotten a lot messier.
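A toy illustration of the locality point (in Python, so interpreter overhead dominates and the cache effect is muted; the access pattern is the idea, not the timing):

```python
# Row-by-row traversal touches memory in the order it is laid out;
# column-by-column jumps between row objects on every step. In C or
# NumPy the first pattern is dramatically friendlier to the cache.
N = 512
grid = [[1] * N for _ in range(N)]

def sum_row_major(g):
    # Sequential access: each inner loop walks one contiguous row.
    return sum(v for row in g for v in row)

def sum_col_major(g):
    # Strided access: each step hops to a different row.
    return sum(g[r][c] for c in range(len(g[0])) for r in range(len(g)))

assert sum_row_major(grid) == sum_col_major(grid) == N * N
```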

  • HTTP2 is coming. What does it mean? Architecting Websites For The HTTP/2 Era: HTTP/2 introduces multiplexing, which allows one TCP/IP connection to request and receive multiple resources, intertwined; HTTP/2 actively discourages the use of compression for secure websites; In HTTP/2, the server can send along extra resources together with the first HTTP request, thus avoiding additional network round-trips for follow-up HTTP requests; Each request can be given a priority.
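A back-of-envelope model of why multiplexing matters. The connection count, RTT, and resource count below are made-up illustrative numbers, and the model ignores bandwidth, TLS setup, and TCP slow start:

```python
import math

def page_load_rtts_http1(resources, parallel_conns=6, rtt_ms=50):
    """Crude HTTP/1.1 model: requests queue on a fixed number of
    parallel connections, one resource per round trip."""
    return math.ceil(resources / parallel_conns) * rtt_ms

def page_load_rtts_http2(resources, rtt_ms=50):
    """With multiplexing, all requests share one connection and can be
    in flight at once: roughly a single round trip."""
    return rtt_ms

# 30 resources over a 50 ms RTT link:
print(page_load_rtts_http1(30))  # 250 ms of serialized round trips
print(page_load_rtts_http2(30))  # ~50 ms
```

Server push improves on this further by skipping the follow-up request round trip entirely.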

  • Sometimes more is better, sometimes smarter is better. I (Don't) like Big Buffers: My point is, overall, to cure the incast problem wholly and completely, we need dynamic path diversity along with data-driven workload placement to fully optimize the distributed compute platforms that we’ll be dealing with in the future.

  • I would add: if a rewrite worked then it was a great idea, otherwise, how could you have done such a stupid thing? Lessons learned from the big rewrite: 1. The rewrite takes forever; 2. Migrating the database is hard; 3. Test the new site for performance; 4. Tell your users about the relaunch. 

  • If you are in the bay area you might like a new Bay Area Stream Processing Meetup.

  • Good stuff. Loving a Log-Oriented Architecture: A software application’s database is better thought of as a series of time-ordered immutable facts collected since that system was born, instead of as a current snapshot of all data records as of right now.
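A minimal sketch of the idea (the fact schema here is invented for illustration): the log of immutable facts is primary, and "current state" is just a fold over it.

```python
log = []  # append-only list of (entity, field, value) facts

def record(entity, field, value):
    log.append((entity, field, value))

def current_state(entity):
    # Replay every fact about the entity, in time order, to derive the
    # snapshot a conventional database would store directly.
    state = {}
    for e, field, value in log:
        if e == entity:
            state[field] = value
    return state

record("user:1", "name", "Ada")
record("user:1", "plan", "free")
record("user:1", "plan", "paid")  # an update is just a newer fact

print(current_state("user:1"))  # {'name': 'Ada', 'plan': 'paid'}
```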

  • How do you make a connector for HTTP/2.0? Two good summaries. politician: Enqueue work items into per-connection queues and process them sequentially; use a pool of work-stealing threads to harvest work from the per-connection queues to optimize the case where slow work items block. derefr: In other words, treat HTTP like the de-facto OSI-layer-5 protocol it is, with packets being decoded into a ring buffer and various drivers at higher layers then poll()ing or select()ing them into their processes.
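politician's scheme can be sketched roughly like this (a simplified toy, not a real connector; `ConnectionQueues` and its policy are invented for illustration). Items for one connection stay in order, while the thread pool harvests work across connections so one slow connection can't block the rest:

```python
import threading
from collections import deque

class ConnectionQueues:
    def __init__(self):
        self.lock = threading.Lock()
        self.queues = {}     # conn_id -> deque of callables
        self.active = set()  # connections currently being drained

    def submit(self, conn_id, item):
        with self.lock:
            self.queues.setdefault(conn_id, deque()).append(item)

    def steal(self):
        # Grab the next pending item from any connection no one else
        # is working on, preserving per-connection ordering.
        with self.lock:
            for conn_id, q in self.queues.items():
                if q and conn_id not in self.active:
                    self.active.add(conn_id)
                    return conn_id, q.popleft()
        return None

    def done(self, conn_id):
        with self.lock:
            self.active.discard(conn_id)

def worker(cq, results):
    while True:
        job = cq.steal()
        if job is None:
            return  # no stealable work left
        conn_id, item = job
        results.append((conn_id, item()))
        cq.done(conn_id)

cq = ConnectionQueues()
for i in range(3):
    cq.submit("conn-a", lambda i=i: f"a{i}")
    cq.submit("conn-b", lambda i=i: f"b{i}")

results = []
threads = [threading.Thread(target=worker, args=(cq, results)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```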

  • Spotify has 60 million active users and over 30 million songs. Here's more on Personalization at Spotify.  Kafka for log collection. Storm for real-time event processing. Crunch for running batch map-reduce jobs on Hadoop. Cassandra to store user profile attributes and metadata about entities like playlists, artists, etc.

  • Lock-free Sequence Locks: Lock-free algorithms aren’t always (or even regularly) more efficient than well thought out locking schemes; however, they are more robust and easier to reason about. When throughput is more than adequate, it makes sense to eliminate locks, not to improve the best or even the average case, but rather to eliminate a class of worst cases – including deadlocks.
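The sequence-lock idea, sketched in toy form (CPython's GIL hides the memory-ordering issues a real C implementation must handle with barriers, so treat this as shape, not substance): the writer bumps a version counter to odd before writing and back to even after, and readers retry until they see the same even version on both sides of the read.

```python
import threading

class SeqLock:
    def __init__(self):
        self.version = 0
        self.write_lock = threading.Lock()  # writers still serialize
        self.data = (0, 0)

    def write(self, a, b):
        with self.write_lock:
            self.version += 1   # odd: write in progress
            self.data = (a, b)
            self.version += 1   # even: write complete

    def read(self):
        while True:
            v1 = self.version
            snapshot = self.data
            v2 = self.version
            if v1 == v2 and v1 % 2 == 0:
                return snapshot  # no writer interleaved: consistent pair

lock = SeqLock()
lock.write(1, 2)
assert lock.read() == (1, 2)
```

Note that readers never block writers; they just pay with an occasional retry, which is the worst-case trade the article is talking about.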

  • eBay reduces storage costs using tiered storage. HDFS Storage Efficiency Using Tiered Storage: Data is divided into HOT, WARM, COLD, FROZEN, and ARCHIVE tiers depending on usage frequency. The win: storage without computing is cheaper than storage with computing, so the temperature of the data can be used to make sure that storage with computing is used wisely. Because each block of data is replicated a few times (the default is three), some replicas can be moved to low-cost storage based on the temperature of the data.
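A sketch of the temperature idea (the thresholds and replica policy below are invented for illustration; eBay's actual policy will differ): classify blocks by recent access count, then decide how many of the three replicas can live on cheap, compute-free storage.

```python
TIERS = [            # (name, min accesses in the last month)
    ("HOT", 20),
    ("WARM", 5),
    ("COLD", 1),
    ("FROZEN", 0),
]

def temperature(accesses_last_month):
    for name, threshold in TIERS:
        if accesses_last_month >= threshold:
            return name

def replica_placement(accesses_last_month, replication=3):
    """Hypothetical policy: the colder the data, the more replicas
    are pushed to archive-class storage without compute."""
    cheap = {"HOT": 0, "WARM": 1, "COLD": 2, "FROZEN": replication}[
        temperature(accesses_last_month)]
    cheap = min(cheap, replication)
    return {"compute_storage": replication - cheap, "archive_storage": cheap}

print(replica_placement(50))  # hot: all replicas stay near compute
print(replica_placement(0))   # frozen: everything on cheap storage
```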

  • How does the Enigma machine work? Steve Gibson explains all. Leo Laporte and Steve also talk about some of the liberties taken with the Turing movie. Movies are at best only a game of imitation.

  • Microsoft Azure Launches Monster Cloud Instances: The new G-series instances go up to 32 cores, 448 GiB of RAM, and 6,596 GB of local SSD storage. (GB is 1000³ bytes and GiB is 1024³ bytes.)
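The GB/GiB distinction is not a rounding error at these sizes; a quick conversion:

```python
GB = 1000**3   # decimal gigabyte
GiB = 1024**3  # binary gibibyte

ram_bytes = 448 * GiB
ssd_bytes = 6596 * GB

print(ram_bytes / GB)   # 448 GiB is about 481 decimal GB
print(ssd_bytes / GiB)  # 6,596 GB is about 6,143 GiB
```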

  • Unikernels: Library Operating Systems for the Cloud: A unikernel is a library OS. By building on top of the hypervisor, the usual pain point of hardware compatibility can be delegated to it. The Mirage team have chosen to ‘eschew backwards compatibility’ (you won’t find any POSIX here) which liberates them to consider new points in the design space.

  • Deep Image: Scaling up Image Recognition: We present a state-of-the-art image recognition system, Deep Image, developed using end-to-end deep learning. The key components are a custom-built supercomputer dedicated to deep learning, a highly optimized parallel algorithm using new strategies for data partitioning and communication, larger deep neural network models, novel data augmentation approaches, and usage of multi-scale high-resolution images.

  • Life beyond Distributed Transactions: an Apostate’s Opinion. In general, application developers simply do not implement large scalable applications assuming distributed transactions. When they attempt to use distributed transactions, the projects founder because the performance costs and fragility make them impractical. Natural selection kicks in…
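The paper's alternative, in toy form: talk to one entity at a time and make message handling idempotent, so at-least-once delivery replaces the distributed transaction (the `Account`/dedup scheme here is a made-up illustration; a real system would persist the dedup state atomically with the entity using a single-node transaction):

```python
class Account:
    def __init__(self, balance=0):
        self.balance = balance
        self.seen = set()  # message ids already applied (dedup table)

    def apply(self, msg_id, amount):
        if msg_id in self.seen:  # redelivery is harmless
            return self.balance
        self.seen.add(msg_id)
        self.balance += amount
        return self.balance

a = Account(100)
a.apply("m1", -30)
a.apply("m1", -30)  # duplicate delivery: ignored
assert a.balance == 70
```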

  • The Case for Offload Shaping: When offloading computation from a mobile device, we show that it can pay to perform additional on-device work in order to reduce the offloading workload. We call this offload shaping, and demonstrate its application at many different levels of abstraction using a variety of techniques. We show that offload shaping can produce significant reductions in resource demand, with little loss of application-level fidelity.
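A toy version of the idea (the sharpness metric and threshold are invented stand-ins): spend a little on-device work filtering out frames that server-side recognition would waste effort on, so only useful data gets offloaded.

```python
def sharpness(frame):
    # Stand-in metric: variance of pixel values (a blurry frame is flat).
    mean = sum(frame) / len(frame)
    return sum((p - mean) ** 2 for p in frame) / len(frame)

def frames_to_offload(frames, threshold=100.0):
    # Cheap on-device check: only ship frames worth recognizing.
    return [f for f in frames if sharpness(f) >= threshold]

sharp = [0, 255, 0, 255]        # high-contrast "frame"
blurry = [128, 129, 128, 129]   # nearly flat "frame"
kept = frames_to_offload([sharp, blurry])
print(len(kept))  # 1: only the sharp frame is offloaded
```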