Stuff The Internet Says On Scalability For November 7th, 2014

Hey, it's HighScalability time:


Google's new hyper efficient, hyper secure datacenters. (keynote, large, more)

  • 2 billion: containers started every week at Google; 86%: Apple's share of handset profits; 50: number of parallel processes in the brain; $12 million: cost for every additional second it takes customers to pay at a Walmart store; 800 terabytes: rendering for the movie Interstellar
  • Quotable Quotes:
    • @IFTTTRecipe: http://Highscalability.com  to email @ http://ift.tt/1omSSgs Send new highscalability.com articles to my mailbox
    • @chadfowler: I've decided to break up tomorrow's monolithic talk about microservices into a series of microtalks #gotober
    • Emily Balcetis: Attention can “be thought of as what you allow your eyes to look at.”
    • Adam Alter: researchers have shown that our names take root deep within our mental worlds, drawing us magnetically toward the concepts they embody
    • @mjpt777: Monitoring needs to reflect an issue within 10 seconds of it happening. Poll at 1 second granularity. - @adrianco #gotober
    • @adyranov: "Trust only benchmarks you've made yourself. All others are benchmarketing" #highload2014
    • @adrianco: #GOTOber @mjpt777 Aeron performance on a laptop - 20 million 40byte messages/sec. 90%ile latency of 7us. 100% 37us.
    • @mjpt777: Have your infrastructure optimised for speed of delivery rather than cost, stuff then gets done so fast you reduce cost. @adrianco #gotober
    • Seth Newburg: This is a first in mobile! This is a device running over an internal network, rather than just everything being connected to a CPU
    • @swardley: OpenStack vs AWS - IMHO OpenStack should have dominated but pisspoor strategic play makes it unlikely for 10-15yrs.
    • Rudiger Moller: There is another strong value in going off heap (even when using advanced collectors like Zing's): Datastructures expressed in Java frequently have a redundancy of 80-90%.
    • @zooko: Novice engineers have not yet grokked this: the number of modes or options in your system is the *exponent* in how hard it is to maintain.

  • OK Google...give me more cool functionality and lower prices. And they did. Google Cloud Platform Live: Introducing Container Engine, Cloud Networking and much more. This may take a while to process and put in some context. What Google is continuing to do is consumerize their internal services. The biggest example of this is opening up their container scheduling, management, and autoscaling services, while making them Docker compatible. If you want to run code as close as possible to your users you can now peer directly with any of Google's more than 70 points of presence in 33 countries around the world. That's impressive. AWS has 14 and Azure has 11. Firebase is still behind Parse in features, but with Google's support look for it to blossom. The distributed debugger is truly impressive work. Game changing really. There's also a new improved Pub/Sub system. Slowly but surely the worm turns.

  • A great list. I'd like to see what a similar list for the message dispatch and handlers would look like. Martin Thompson design principles for the Aeron messaging system: garbage free in steady state running; smart batching in the message path; wait-free algos in the message path; non-blocking IO in the message path; no exceptional cases in message path; apply the single writer principle; prefer unshared state; avoid unnecessary data copies. 

  • Nicely done André Staltz. The introduction to Reactive Programming you've been missing. If insight meditation has not been enough to guide your conversion from the object obsessed to the functionally enlightened then this article might be just what you are looking for. Or not.

  • This is cool, using Cassandra as an embedded database. Russian Facebook Ok.ru Utilizing Cassandra for 1 Million Operations per Second, Over 80TB in Largest Cluster: The way we use Cassandra is somewhat unusual – we don’t use thrift or netty based native protocol to communicate with Cassandra nodes remotely. Instead, we co-locate Cassandra nodes in the same JVM with business service logic, exposing not generic data manipulation, but business level interface remotely. This way we avoid extra network roundtrips within a single business transaction and use internal calls to Cassandra classes to get information faster. Also this helps us to do many little hacks on its internals, making huge gains on efficiency and ease of distributed servers development.

  • I have no idea why I think of pancakes when I hear this name. Flafka: Apache Flume Meets Apache Kafka for Event Processing: The new integration between Flume and Kafka offers sub-second-latency event processing without the need for dedicated infrastructure.

  • How Buffer built their new data architecture. Redshift, Looker, and Jenkins were the weapons of choice. 

  • A wonderfully detailed article on a problem many people face: wanting to move to multiple datacenters, wanting to shard, all while handling distributed failures. Introducing Dynomite - Making Non-Distributed Databases, Distributed: Design goal is to turn those single-server datastore solutions into peer-to-peer, linearly scalable, clustered systems while still preserving the native client/server protocols of the datastores, e.g., Redis protocol. < One weakness is it doesn't handle consistency problems well, but using the native clients while preserving your current data store is genius.

  • Here's a hot database tuning tip. Disable transparent huge pages. Redis latency spikes and the Linux kernel: a few more details

  • Not everything is as it seems when money is involved. New Study From M-Lab Sheds Light On Widespread Harm Caused By Netflix Routing Decisions...The data shows that the Comcast/Netflix connection agreement increased performance for other ISPs at the same time. Between this and the MIT/CAIDA findings, it's now clear that Netflix has more control over their quality than they suggest and is using calls for greater net neutrality to simply drive down the prices they pay.

  • Is it time to replace % with a bitwise AND against (power of 2) - 1 everywhere your code has the opportunity? The Mythical Modulo Mask investigates this optimization strategy. Honestly, I'm not sure what the answer is, but the discussion is elevated.
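
For readers who have not seen the trick, here is a minimal sketch of the equivalence under discussion. It assumes the divisor is a power of two and the operand is non-negative; the function names are made up for illustration and do not come from the linked article. It says nothing about whether the mask is actually faster, which depends on the language, compiler, and JIT.

```python
def is_power_of_two(n: int) -> bool:
    """True when n has exactly one bit set (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def mask_index(value: int, capacity: int) -> int:
    """Return value % capacity using a bitwise AND.

    Only valid when capacity is a power of two, because then
    capacity - 1 is a run of low one-bits (e.g. 8 - 1 = 0b111).
    """
    if not is_power_of_two(capacity):
        raise ValueError("capacity must be a power of two")
    return value & (capacity - 1)


if __name__ == "__main__":
    capacity = 1024  # hypothetical ring buffer size
    for value in (0, 1, 1023, 1024, 123456789):
        # For non-negative values both forms agree.
        assert mask_index(value, capacity) == value % capacity
    print("mask and modulo agree for the sampled values")
```

One caveat when porting the idea: in languages where % keeps the sign of the dividend (Java and C, for example), the two forms disagree for negative operands, so the swap is not always a drop-in replacement.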

  • Wouldn't it be nice to be able to create a C++ web application? facebook/proxygen: A collection of C++ HTTP libraries including an easy to use HTTP server.

  • Videos from RICON2014 are available. You might like Consul: Service Oriented at Scale; Mesos: The Operating System for your Cluster; Multicore and Distributed Systems: Sharing Ideas to Build A Scalable Database.

  • bennyg on Let's Talk About Beacons:  I really, really want them to be good. But they die too quickly, and don't provide granular signal data. Hold the phone between you and the beacon and get a decent signal, then turn 180 degrees so that you are between the phone and the beacon. Apparently you've moved 45ft.

  • Ah, the future is so bright I have to wear shades. The Amazons of the dark net: Moreover, the deep web’s denizens will continue to adapt. Jamie Bartlett, author of “The Dark Net”, predicts: “The future of these markets is not centralised sites like Silk Road 2.0, but sites where…listings, messaging, payment and feedback are all separated, controlled by no central party”—and thus impossible to close.

  • A good way to look at a Comparison : Aerospike vs Cassandra

  • Cache Eviction: When Are Randomized Algorithms Better Than LRU?: So we’ve seen that this works, but why would anyone think to do this in the first place? The Power of Two Random Choices: A Survey of Techniques and Results by Mitzenmacher, Richa, and Sitaraman has a great explanation. The mathematical intuition is that if we (randomly) throw n balls into n bins, the maximum number of balls in any bin is O(log n / log log n) with high probability, which is pretty much just O(log n). But if (instead of choosing randomly) we choose the least loaded of k random bins, the maximum is O(log log n / log k) with high probability, i.e., even with two random choices, it’s basically O(log log n) and each additional choice only reduces the load by a constant factor.
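
The balls-into-bins intuition in the quote is easy to try out. What follows is a rough simulation sketch, not code from the article or the survey: the function name, the seed, and the problem size are all assumptions chosen for illustration. It throws n balls into n bins, once with a single random choice and once picking the least loaded of two random bins, and reports the fullest bin in each case.

```python
import random


def max_load(n_balls: int, n_bins: int, choices: int, rng: random.Random) -> int:
    """Throw n_balls into n_bins and return the load of the fullest bin.

    For each ball, sample `choices` bins uniformly at random and place
    the ball in the least loaded of them. choices=1 is the plain random
    throw; choices=2 is the "two random choices" scheme.
    """
    bins = [0] * n_bins
    for _ in range(n_balls):
        candidates = [rng.randrange(n_bins) for _ in range(choices)]
        target = min(candidates, key=bins.__getitem__)
        bins[target] += 1
    return max(bins)


if __name__ == "__main__":
    n = 20_000  # arbitrary size, small enough to run quickly
    rng = random.Random(42)  # fixed seed so runs are repeatable
    print("one choice :", max_load(n, n, 1, rng))
    print("two choices:", max_load(n, n, 2, rng))
```

Raising `choices` beyond two should shave the maximum only slightly, which is the "each additional choice only reduces the load by a constant factor" point in the quote.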

  • How L1 and L2 CPU caches work, and why they’re an essential part of modern chips: So why add continually larger caches in the first place? Because each additional memory pool pushes back the need to access main memory and can improve performance in specific cases.

  • One Billion Documents, Testing the Limits of Nuxeo: We decided the best way to push the Nuxeo architecture to reach 1 billion documents is to plug the same Nuxeo platform instance into several repositories. Having several repositories on the same application is a simple way to shard the Nuxeo data across several database engines. To make that happen efficiently we duplicated the PostgreSQL database and the Elasticsearch shards, resulting in a single Nuxeo instance with 10 PostgreSQL databases (each of them storing 100 million documents), 10 binary stores (each of them storing 100 million blobs), and 1 unified Elasticsearch index (using several shards).

  • How Gravity Explains Why Time Never Runs Backward: Instead of using entropy, the researchers describe their system with a quantity they call complexity, which they define as roughly the ratio of the distance between the two particles farthest from each other to the distance between the two particles closest to each other. When the particles are clumped together, complexity is at its lowest.
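
The "complexity" quantity is simple enough to compute by hand. Below is a minimal sketch of the rough definition quoted above, the largest pairwise distance divided by the smallest; the function name and the sample points are assumptions for illustration, and the researchers' precise definition may differ in detail.

```python
from itertools import combinations
from math import dist


def complexity(points):
    """Ratio of the largest to the smallest pairwise distance.

    `points` is a sequence of coordinate tuples of equal dimension.
    This follows the rough description in the quote, not necessarily
    the exact quantity used in the paper.
    """
    if len(points) < 2:
        raise ValueError("need at least two particles")
    distances = [dist(p, q) for p, q in combinations(points, 2)]
    smallest = min(distances)
    if smallest == 0:
        raise ValueError("two particles coincide; the ratio is undefined")
    return max(distances) / smallest


if __name__ == "__main__":
    # Three particles on a line at x = 0, 1 and 3:
    # pairwise distances are 1, 3 and 2, so the ratio is 3 / 1.
    print(complexity([(0, 0), (1, 0), (3, 0)]))
```

Evenly spread configurations keep the ratio near 1, while a tight pair sitting far from the rest drives it up.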

  • A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services: Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by a factor of 95% for a fixed latency distribution or, while maintaining equivalent throughput, reduces the tail latency by 29%.

  • Geek out with another grand set of Greg's Quick Links