
Stuff The Internet Says On Scalability For April 26, 2013

Hey, it's HighScalability time:


  • 100 Billion -  Neurons in The Human Brain, As Many Cells as Stars in the Milky Way; 10TB - Tumblr memcache
  • Quotable Quotes:
    • @thoward3: OH: "We make scalability a possibility.. You know, we make 'scalapossibilty'. "
    • Tesla: When wireless is perfectly applied the whole earth will be converted into a huge brain, which in fact it is, all things being particles of a real and rhythmic whole. We shall be able to communicate with one another instantly, irrespective of distance. Not only this, but through television and telephony we shall see and hear one another as perfectly as though we were face to face, despite intervening distances of thousands of miles; and the instruments through which we shall be able to do this will be amazingly simple compared with our present telephone. A man will be able to carry one in his vest pocket.
    • @ADTELLIGENCE: Data on the internet: Data of all of 1993 = Data of 1 second in 2013
    • Nassim Taleb: Man-made complex systems tend to develop cascades and runaway chains of reactions that decrease, even eliminate, predictability and cause outsized events. So the modern world may be increasing in technological knowledge, but, paradoxically, it is making things a lot more unpredictable.
    • The Bw-Tree: A B-tree for New Hardware Platforms: We believe that latch free techniques and state changes that avoid update-in-place are the keys to high performance on modern processors.
    • @rvirding: WhatsApp "Bigger Than Twitter" With Over 200M Monthly Active Users, 8B Inbound And 12B and they use #erlang
    • Jasper Fforde: There’s a lot to be said about merely having a hazy idea of what’s going on but generally reaching the right outcome by following broad policy outlines. In fact, I’ve a sneaky suspicion that it’s the only way of getting things done. Once the horror and unpredictability of unintended consequences gets a hold, even the best-intentioned and noblest of plans generally descend to mayhem, confusion and despair.
    • @enygma: I'm starting to think the Twitter unfollow bug is actually their way to handle scalability
    • @ndubaz: Spent last 2 days training with the Army's latest virtual trainers. More skeptical than ever of scalability and utility for light forces.
    • @bernardgolden: Airbnb workflow control system was 10K (!) lines of bash script.

  • Scaling Deployment at Etsy by Daniel Schauenberg. 1.49 billion page views, 4,215,169 items sold, $94.7 million of goods sold, 22+ million members, 800,000+ active shops. LAMMP + Monolithic App + No Branching + Frequent deployment + lots more.

  • Gotta love his optimism: Nikola Tesla’s Amazing Predictions for the 21st Century. This is not a guy who made a lot of predictions and got a few obvious/lucky hits. His insights came from a deeply observant and synthetic mind. Just stunning.

  • Oh this looks like fun! Bioengineers Build Open Source Language for Programming Cells: The BIOFAB project is still in the early stages. Endy and the team are creating the most basic of building blocks — the “grammar” for the language. Their latest achievement, recently reported in the journal Science, has been to create a way of controlling and amplifying the signals sent from the genome to the cell. Endy compares this process to an old fashioned telegraph.

  • You know you've entered a more mature stage of development when certification starts: Open Cloud Academy

  • If you have to be on pager duty, Tumblr's Rob Ewaschuk has written a common-sense guide to writing clean alerts and keeping a sane on-call rotation: My Philosophy On Alerting. For example, pages should be urgent, important, actionable, and real.

  • Messaging at Scale at Instagram by Rick Branson. Don't use that very wasteful SQL query, keep a per-account bounded list of media-ids in Redis.
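
To make the "per-account bounded list" idea from the Instagram item concrete, here is a minimal in-memory sketch. It is not Instagram's code and it does not talk to Redis; the class name `BoundedFeeds`, the `max_items` parameter, and the account and media ids are all hypothetical. In Redis itself, the same effect is commonly achieved by pushing onto a list with LPUSH and then capping it with LTRIM.

```python
from collections import defaultdict, deque


class BoundedFeeds:
    """In-memory stand-in for a per-account capped list of media ids."""

    def __init__(self, max_items=3):
        # deque(maxlen=N) silently drops the oldest entry once N is reached,
        # which mimics an LPUSH followed by LTRIM 0 N-1.
        self._feeds = defaultdict(lambda: deque(maxlen=max_items))

    def push(self, account_id, media_id):
        # Newest media id goes to the front of the account's list.
        self._feeds[account_id].appendleft(media_id)

    def recent(self, account_id):
        # Return newest-first; unknown accounts yield an empty list.
        return list(self._feeds.get(account_id, ()))


feeds = BoundedFeeds(max_items=3)
for media_id in [1, 2, 3, 4, 5]:
    feeds.push("acct-42", media_id)

print(feeds.recent("acct-42"))  # [5, 4, 3]
```

The point of the bound is that reading an account's recent media becomes a fixed-size list read rather than a query whose cost grows with the account's history.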

  • jkff on Tokutek being open sourced: See also: cache-oblivious algorithms, of which fractal trees are an example. They are basically algorithms that optimally utilize the different levels of memory caches without knowing their sizes. There's cache-oblivious algorithms for various different things: binary search, sorting etc. - though some of them are so complex that the asymptotic speedup dies by the death of the constant factor. There is also an MIT open video course about these algorithms.

  • Excellent Cloud Tech IV – Cloud Computing Conference Summary by Ramana Lokanathan. Lots of good talks, including ones from Amazon, Airbnb, Facebook, and Percona.

  • James Urquhart with a superb what and why devops reading list centering around deep thinking on complex systems. Some of the recommendations: The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win; Devops, complexity and anti-fragility in IT: An introduction;  Anti-Fragile: Things that Gain from Disorder; and many more.

  • An animated look at a few search algorithms. 

  • Have to say, this distinction has never made sense to me: Concurrency is not parallelism: concurrency is the composition of independently executing processes, while parallelism is the simultaneous execution of (possibly related) computations. Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.

  • CAS support in Cassandra. It's always interesting to see how even the simplest of ideas require intricate implementations. 
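
For readers who have not met compare-and-set before, the "simple idea" in the Cassandra item looks roughly like the single-process sketch below. This is only the semantics of the operation on one machine, guarded by a lock; it says nothing about how Cassandra implements it across replicas, which is where the intricacy comes from. The `Register` class and its method names are hypothetical.

```python
import threading


class Register:
    """A single value that can only be changed via compare-and-set."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_set(self, expected, new):
        # Atomically: write `new` only if the current value equals `expected`.
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True

    def get(self):
        with self._lock:
            return self._value


owner = Register("nobody")
print(owner.compare_and_set("nobody", "alice"))  # True: value matched
print(owner.compare_and_set("nobody", "bob"))    # False: alice got there first
print(owner.get())                               # alice
```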

  • Seeking Alpha says Google Vs. Apple: It's About Scalability: Google never stops to consider anything without scalability being the primary requirement. Like Amazon, Google's primary product required scalability just to be viable. Each system Google builds must be built with efficiency of both human and silicon resources. 

  • YAMI4 vs. ZeroMQ. Not vendor neutral, but still a useful comparison. "While ZeroMQ seems to be largely concerned with business-type systems, YAMI4 offers features that are more relevant in real-time control and monitoring messaging systems. The contrasting features are compared in the sections that follow."

  • The world is flat so why shouldn't our UIs be flat too?

  • Wikipedia is adopting MariaDB. They like the optimizer enhancements, Percona’s XtraDB, and add-ons such as the ability to save the buffer pool LRU list. 

  • Reddit's "best" comment scoring algorithm as a multi-armed bandit task: the resulting ranking algorithm is quite straightforward, each new time the comments page is loaded, the score for each comment is sampled from a Beta(1+U,1+D), comments are then ranked by this score in descending order.
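
The ranking rule quoted in the Reddit item is short enough to sketch directly: on each page load, draw one sample from Beta(1+U, 1+D) per comment and sort by the samples. The function name `rank_comments`, the tuple layout, and the sample data are assumptions made for illustration; this is not Reddit's production code.

```python
import random


def rank_comments(comments, rng=random):
    """Rank comments by one Beta(1 + upvotes, 1 + downvotes) draw each.

    comments: iterable of (comment_id, upvotes, downvotes) tuples.
    rng: the random module or a random.Random instance (for reproducibility).
    Returns comment ids, highest sampled score first.
    """
    scored = []
    for comment_id, upvotes, downvotes in comments:
        # A fresh draw per page load: well-voted comments get stable scores,
        # while comments with few votes vary a lot and so get explored.
        score = rng.betavariate(1 + upvotes, 1 + downvotes)
        scored.append((score, comment_id))
    scored.sort(reverse=True)
    return [comment_id for _, comment_id in scored]


comments = [
    ("mostly-downvoted", 2, 40),
    ("brand-new", 0, 0),
    ("mostly-upvoted", 40, 2),
]
print(rank_comments(comments))  # order varies from call to call
```

Because the scores are sampled rather than computed, a brand-new comment (which draws from the uniform Beta(1, 1)) sometimes lands at the top and gets a chance to collect votes.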

  • A good thing to know - Complex Queries in a multidimensional database: The use of multidimensional hash tables allows HyperDex to perform SEARCH operations roughly two orders of magnitude faster than any other database software.

  • Looks interesting - Systems Programming in the Distributed, Multicore World with Go, Rust, and ParaSail by S Tucker Taft: This talk will describe the challenges these languages are trying to address, and the various similar and differing choices that have been made to solve these challenges.

  • Alex Boisvert with an interesting discussion on Efficiency & Scalability: I want to explore the relationship between system efficiency and scalability in distributed systems; they are to some extent two sides of the same coin.  We’ll consider specifically two common system architecture traits:  replication and routing.  Some of this may seem obvious to some of you but it’s always good to back intuition with some additional reasoning.

  • Adaptive Parallelism for Web Search: In this paper, we describe the issues that make the parallelization of an individual query within a server challenging, and we present a parallelization approach that effectively addresses these challenges. Since each server may be processing multiple queries concurrently, we also present an adaptive resource management algorithm that chooses the degree of parallelism at run-time for each query, taking into account system load and parallelization efficiency. As a result, the servers now execute queries with a high degree of parallelism at low loads, gracefully reduce the degree of parallelism with increased load, and choose sequential execution under high load.
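
The behaviour described in the Adaptive Parallelism abstract (high parallelism at low load, less as load rises, sequential under high load) can be illustrated with a toy policy. This is not the paper's algorithm, which also accounts for parallelization efficiency; the fair-share rule, the function name `choose_parallelism`, and the `max_degree` cap are assumptions made purely for illustration.

```python
def choose_parallelism(active_queries, total_cores, max_degree=4):
    """Pick how many threads to give one query under the current load.

    Toy policy: give each query an equal share of the cores, capped at
    max_degree and never below 1 (sequential execution).
    """
    if active_queries < 1:
        raise ValueError("active_queries must be at least 1")
    fair_share = total_cores // active_queries
    return max(1, min(max_degree, fair_share))


# On a hypothetical 12-core server:
for load in (1, 4, 6, 20):
    print(load, "queries ->", choose_parallelism(load, total_cores=12), "threads each")
# 1 queries -> 4 threads each
# 4 queries -> 3 threads each
# 6 queries -> 2 threads each
# 20 queries -> 1 threads each
```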

Reader Comments (2)

This might be helpful for understanding the difference between concurrency and parallelism:

Or maybe not -- let me know in the comments.

April 29, 2013 | Unregistered Commenter Baron

I think it's something like e=mc2 where there's a relationship between the two things so to define one you also have to define the other.

April 29, 2013 | Registered Commenter Todd Hoff
