
Stuff The Internet Says On Scalability For April 22nd, 2016

Hey, it's HighScalability time:

A perfect 10. Really stuck that landing. Nadia Comaneci approves.


If you like this sort of Stuff then please consider offering your support on Patreon.
  • $1B: Supercell’s Clash Royale projected annual haul; 3x: Messenger and WhatsApp send more messages than SMS; 20%: of big companies pay zero corporate taxes; Tens of TB's RAM: Netflix's Container Runtime; 1 Million: People use Facebook over Tor; $10.0 billion: Microsoft raining money in the cloud; 

  • Quotable Quotes:
    • @nehanarkhede: @LinkedIn's use of @apachekafka:1.4 trillion msg/day, 1400 brokers. Powers database replication, change capture etc
    • @kenkeiter~ Full-duplex on a *single antenna* -- this is huge.  (single chip, too -- that's the other huge part, obviously) 
    • John Langford: In the next few years, I expect machine learning to solve no important world issues.
    • Dan Rayburn: By My Estimate, Apple’s Internal CDN Now Delivers 75% Of Their Own Content
    • @BenedictEvans: If Google sees the device as dumb glass, Apple sees the cloud as dumb pipes & dumb storage. Both views could lead to weakness
    • @JordanRinke: We need less hackathons, more apprenticeships. Less bootcamps, more classes. Less rockstars, more mentors. Develop people instead of product
    • @alicegoldfuss: Nagios screaming / Data center ablaze? No / Cable was unplugged
    • Mark Bates: As I was working on the software part time, I was keen to minimise the [cognitive] scope required when making changes. A RoR monolith was the best choice in this case.
    • Google: Our tests have shown that AMP documents load an average of four times faster and use 10 times less data than the equivalent non-amp’ed result.
    • @stevesi: In earning's call @sundarpichai says going “from mobile-first to AI-first world" emphasizing AI and machine learning across services.
    • Rex Sorgatz: Unfortunately, the entire thesis of my story is that having the history of recorded music in your pocket dictates that you will develop tastes outside “the usual.”
    • Newzoo: Clash Royale has rocketed to such quick success because of its strong core gameplay elements combined with some serious pressure to spend real money to keep up with your friends
    • vgt:  I'm going to plug Google Cloud's Preemptible VMs as a simpler alternative to Spot Instances: - Preemptible VMs are sold at a fixed 70% off discount, removing pricing volatility entirely
    • @mfdii~ "Cloud Native" is code words for "rewrite the entire f*cking app"
    • Danny Hillis: Unlike the Enlightenment, where progress was analytic and came from taking things apart, progress in the Age of Entanglement is synthetic and comes from putting things together.
    • Reid J. Robison: What this means is that we’d all better brace ourselves for a major flood of genomic data. The 1000 genomes project data, for example, is now available in the AWS cloud and consists of >200 terabytes for the 1700 participants. 
    • @rkoutnik: I bought my boss two copies of The Mythical Man Month so he could read it twice as fast
    • @JackMarshall: The average Web page size is now the same size as Doom. 
    • @stilkov: The correct label for a system with an unreliable, distributed, cascaded, RPC-style network call-stack is “badly designed”, not “RESTful”.
    • Scales dropped: Mr Mensch reckons that his clients will do “fine” from Spotify. But none of them will earn two dollars a record, as in the days when music could be sold by the pallet.
    • Jeff Foust: Milner’s Breakthrough Starshot project will work on technology that would allow chip-sized spacecraft to be accelerated by laser light sails and travel at up to 20 percent the speed of light.
    • @Spacekatgal: Reprimanding abusive LoL players in under 10 minutes cut abuse by 50 percent. Providing evidence made it 70 percent
    • @conniechan: 6/ THIS is what a WeChat transaction actually looks like (ordering McDonalds); not a chat but an app-within-an-app.
    • Lab126: We spent so much time trying to anticipate what Jeff would do or say, and read into little words he would say in meetings. ... It would lead to so much additional work.
    • Priyamvada Natarajan: All that we know of the universe we get from observing photons
    • @tommorris: How to get serverless architecture: write your application to rely heavily on Amazon's servers. Which aren't servers because we say so.
    • @michellebrush: "developers are bureaucrats by nature. We have a tendency to solve problems by creating standard processes" from The Art of Business Value
    • Chris Messina: The biggest point of contention on this podcast — which is the right one from a product design perspective — is whether the conversation is necessary to commerce, or whether embedded webviews are sufficient. I don't care about apps and bots. I care about behavior and user acceptance.
    • @eetimes: WiFi Finally Ready for 60 GHz - The 60 GHz version of WiFi is finally starting to gain traction
    • @BrianRoemmele: As predicated ~37% of transactions at #Coachella is Apple Pay. •Food = Highest use w/ higher avg tkt vs 💵 •Line speed = 87% faster!

  • Imperfection as a strategy. Why a Chip That’s Bad at Math Can Help Computers Tackle Harder Problems: In a simulated test using software that tracks objects such as cars in video, Singular’s approach [computer chips are hardwired to be incapable of performing mathematical calculations correctly] was  capable of processing frames almost 100 times faster than a conventional processor restricted to doing correct math—while using less than 2 percent as much power.

  • You have to fight magic with magic, super-villains with super-heroes, and algorithms with algorithms. How I Investigated Uber Surge Pricing in D.C. Also, Investigating the algorithms that govern our lives.

  • Mitchell Hashimoto in The Cloudcast #246  on some cloud trends. Seeing a lot of interest in non-Amazon clouds right now. A lot of interest in Azure is coming from more boring successful companies, not hot Silicon Valley startups.  This is not a clean market segmentation, but there are three flavors of cloud: Google Compute for the green field startup crowd, Amazon for enterprise, and Azure for super-enterprise. One enterprise attractor for Azure is Azure Stack, an on-premises solution. Mitchell is seeing a broad adoption of the cloud across industries you may not expect to be using the cloud. Also seeing a transition to a multi-cloud strategy to create pricing leverage. The idea seems to be to rehearse and plan to move to another cloud, though they may not actually do it, but when pricing negotiations come up there's a lot of leverage saying you can move to a completely different platform. The cloud is not so much a pay as you go model for this use case, it's more about trying to lock-in long term cost savings. International companies are interested in price, but also what features are available in different regions and when they become available.

  • So, Facebook is worried about users sharing less. Since Facebook has turned its feed into a Hunger Game, why should people post? Facebook has disincentivized posting by making it improbable that the people you want to see your posts will actually ever see them. Obvi.

  • #GIFEE (Google's Infrastructure For Everyone). It's the idea that in the future software will run after the Google fashion: containers, datacenter level schedulers, services, etc. If you are not Google how do you make this happen? That's where CoreOS wants to help. There's a good interview with Alex Polvi, CEO at CoreOS, on Datanauts 032  that talks all about it. Also, Introducing open source DC/OS: The best way to run containers

  • Amazon EBS SC1 Initial Impressions. Localytics tests out SC1, one of Amazon's new Cold Storage options, designed for less frequently accessed high-throughput MapReduce, Kafka, ETL, log processing, and data warehouse workloads; $0.025 / gigabyte / month. They really liked it: there is about a 500ms penalty added to this set of queries, but the top end of query time ends up in the same bands as standard volume types, which is well within our performance expectations...running this configuration is expected to shave 10% off our platform operating costs, which is a huge win.

  • Cliff Click with an informed takedown on the never ending war of Java vs. C Performance. Where C/C++ beats Java: Very Small Footprint, Very deterministic or fast (re)boot times, Very big problems, Value types, Direct Machine Access, Direct Code Generation, Destructors vs finalizers and try/finally. Where Java Beats C/C++: Most Programs - profiling pays off, Very large programs > 1MLOC, GC is easier to get right than malloc/free, GC is efficient, GC allows concurrent algorithms, Single CPU speed stalled, Better multi-threading support, Tools for parallel coding and debugging, Vast library collection.

  • Good takeaways from the O’Reilly Software Architecture Conference (day one and two) by Daniel Bryant. As someone who spends way too much money at Home Depot it's interesting to learn they are moving to a cloud & microservices approach. Keep this quiet, but Meta42 Labs started with Go-based microservices and an Angular.js UI and ended up building a Ruby on Rails monolith.

  • The only treadmill you don't want to get off. The Hedonic Treadmill.

  • Smart storage for big data: The key is using machine learning to determine data value...As the system watches human interaction with the data set, it learns what is important and tiers, protects and stores data according to user needs...So enabling machine intelligence to trash data is probably the most essential issue and value of cognitive storage.

  • Useful list of tips and Heroku commands to figure out what's going on. 1 to 1000 - Scaling a Rails App on Heroku: One of the first things you can do to help increase read performance is add better indices...the “Metrics” tab can help you quickly identify slow queries or queries...Another issue that to watch for is a low cache hit rate...To learn more about your database’s cache hit rate, try the pg:diagnose...we began to explore using 1 or 2 of the new PX dynos rather than a large number of smaller dynos. We instantly saw performance improvements...we set up to automatically restart dynos based on the swap memory output in our logs...we currently use an addon called Adept Scale. Adept Scale gives you the ability to scale dynos up/down automatically...we did run into some issues with some long running background jobs being killed.

  • 32c3 YBTI videos are now available online. There are talks on Axolotl, I2P, Taler, SecuShare, net2o, and GNUnet's Byzantine Set Consensus.

  • How is it we can recognize patterns in time and space? Why Neurons Have Thousands of Synapses, a Theory of Sequence Memory in Neocortex: We conclude that pyramidal neurons with thousands of synapses, active dendrites, and multiple integration zones create a robust and powerful sequence memory.

  • YCSB testing reveals Scylla latency and performance advantages: A 3 node Scylla cluster executes 4.6X more OPS than a similar Cassandra cluster. Only a 30 node Cassandra cluster can level the throughput of the Scylla cluster of 1/10th the size. Yet the 1:10 gain is not the end of it. Latency measurement reveals that Scylla has 4X-10X better P99-latency advantage.

  • This explains why there are so many dysfunctional software teams. It's much harder to make people care and support each other than it is to twist other more objective knobs. What Google Learned From Its Quest to Build the Perfect Team. What matters is not who is on the team, but how the team interacts. What's important is not who leads it, who is on it, how many people are on it, or where it is. What's most important is almost antithetical to the hyper-geek ethos: psychological safety. Everyone feels like they have the opportunity to speak up and everyone feels like they are being listened to. Team members should be sensitive to non-verbal cues. Team members feel like they can fail openly. Next is dependability: do people do what they say they will do? Structure and clarity. Have a shared understanding of what everyone's job is. Meaning. The work should be personally meaningful to every person on the team. Impact. Team members need to think that their work matters. Those last two are trouble. Meaning and Impact are deep existential considerations and most development jobs have no chance of reaching so high a bar.

  • There's a weekly Kubernetes newsletter you might be interested in. 

  • Here's a helpfully detailed description of the Kafka Ecosystem at LinkedIn. They like its strong durability and low latency and find uses for its highly scalable messaging prowess in lots of places. Their ecosystem consists of a number of parts to work together: Brokers, Mirror-Maker, Schema Registry, REST Proxy, Nuage (self-service portal for online data-infrastructure resources), and much more. 

  • Maybe that burner phone isn't as secure as you think. Location Data on Two Apps Enough to Identify Someone: The team developed an algorithm that compares geotagged posts on Twitter with posts on Instagram and Foursquare to link accounts held by the same person. It works by calculating the probability that one person posting at a given time and place could also be posting in a second app, at another time and place. The Columbia team found that the algorithm can also identify shoppers by matching anonymous credit card purchases against logs of mobile phones pinging the nearest cell tower. This method, they found, outperforms other matching algorithms applied to the same data sets.

  • Intel’s Ken LeTourneau found for Percona Server for MySQL, if you are IO bottlenecked at all, more expensive SSDs can cost about 15% more than hard drives but offer more than 60% better performance. At least that's what I think he said.

  • Great interview with James Gosling on Triangulation 245. A gracious and interesting guy. His epitaph will no doubt read "father of the Java programming language" but that would be doing him an injustice. He's done a lot. James is now working at Liquid Robotics on these really cool autonomous solar powered ocean faring data collecting robots that can carry out long distance sensing missions without human guidance. Something came up during the interview that I had repressed, that your SIM card contains a JVM. The Secret Life of SIM Cards

  • More on Kafka. How We Monitor and Run Kafka At Scale. SignalFx moves 70+ billion messages per day on 27 Brokers, 1000 (approx) active partitions, 20 (approx) active topics. Collecting Metrics: GenericJMX. Important metrics to track: Log flush latency (95th percentile),  Under Replicated Partitions, Messages in / sec per broker and per topic, Bytes in / sec per broker, Bytes in / sec per topic, Bytes / message. Metrics actually mean something: log flush latency and under replicated partitions to be the leading indicators that we need to pay attention to what’s going on and prepare to investigate a new bug or regression; Log flush latency is important, because the longer it takes to flush log to disk, the more the pipeline backs up; Under-replicated partitions tells us that replication is not going as fast as configured; Messages and bytes in tell us how well balanced our traffic is; A sudden, positive change in the message size could indicate a bubble in the pipeline or a fault in an upstream component; A trend of larger message sizes over time suggests an unintended architectural change. From our experience: we’ve found that it’s most useful to notify on alerts for the two leading indicators: Log Flush Latency (95P) and Under Replicated Partitions.

  • Good game advice: Quit my full time corporate job. Built an iOS game. It became #1 in the App Store. Here are revenue numbers and what I learned.

  • If you are interested in Serverless computing then Serverless Single Page Apps might be a good book to take a look at.

  • New ideas for cluster controllers. Simple genetic circuit controls pattern formation and scale in bacterial colonies: “In our experiment, we get a spatial cue from an unsuspected source. We sort of get it for free from the timing of the genetic circuit,” said You. “These two diffusible molecules aren’t dictating at what positions cells are going to stop or start producing proteins. Instead, they’re telling the cells when to start or stop producing proteins. That’s enough to both produce a pattern and to control its scaling, and it’s a fundamentally new mechanism.”

  • Should You Use a Rowstore or a Columnstore?: The main takeaway is pretty straightforward. Rowstores excel at random reads/writes and column stores excel at sequential reads/writes. You can look at almost any workload and determine which bucket it falls into. 

  • Didn't Google say they wouldn't do this? Google gives speedier AMP-powered web pages a top spot in Google News: Google announced today it’s giving higher placement to articles in Google News that come from those publishers who have embraced the Google-backed “Accelerated Mobile Pages Project” (or, AMP). 

  • What makes a good paper? Good writing, Makes you think, Clear message, Have confidence, Have an opinion, Define your end point, Think about where and how to use data to back up your point, Set the research in a disciplinary context. Make papers accessible, Take advantage of impact/contribution statements, Make people do things/act, Be part of a series of ‘stepping stones’ of evidence, Have a good title. Also, Snuggling Up To Papers We Love - What's Your Favorite Paper? How Can We Spark The Movement Of Research Out Of The Ivory Tower And Into Production?

  • Steve Lerner: Yeah but remember that you are talking about large object delivery only... this is an increasingly small portion of the margin (and importance) of CDN... commodity bandwidth isn't a business anymore for pure-play CDNs. Their value is in the full edge service suite: http2, IPV6, firewall, complex rule processing, log streaming, edge HTTPs hosting, etc etc etc....If you NSlookup,, etc its all still Akamai... where the transactions are, the pure play CDNs still add value.

  • In praise of 40 Years of Suffix Trees: Over the years, such structures have held center stage in text searching, indexing, statistics, and compression as well as in the assembly, alignment, and comparison of biosequences. Their range of scope extends to areas as diverse as detecting plagiarism, finding surprising substrings in a text, testing the unique decipherability of a code, and more. Their impact on computer science and IT at large cannot be overstated. 

  • Wangle (Facebook): a full featured, high performance C++ futures implementation. Going forward, Wangle will also provide a set of common client/server abstractions for building services in a consistent, modular, and composable way.

  • facebook/fresco: a powerful system for displaying images in Android applications. Fresco takes care of image loading and display, so you don't have to. 

  • amznlabs/ion-java: Java streaming parser/serializer for Ion. jcrites: Ion's advantage is that it's both strongly-typed with a rich type system, as well as self-describing.

  • The Linux Scheduler: a Decade of Wasted Cores: Cores may stay idle for seconds while ready threads are waiting in runqueues. In our experiments, these performance bugs caused many-fold performance degradation for synchronization-heavy scientific applications, 13% higher latency for kernel make, and a 14-23% decrease in TPC-H throughput for a widely used commercial database.

  • Consistency in Non-Transactional Distributed Storage Systems: In this paper we aim to fill the void in literature, by providing a structured and comprehensive overview of different consistency notions that appeared in distributed systems, and in particular storage systems research, in the last four decades. We overview more than 50 different consistency notions, ranging from linearizability to eventual and weak consistency, defining precisely many of these, in particular where the previous definitions were ambiguous.

  • Spatial Structure and Scaling of Agricultural Networks: We suggest that treating agricultural land cover as spatial networks can provide a straightforward way of characterizing the connectivity of complex spatial distributions of agriculture across a wide range of landscapes and at spatial scales relevant for practical agricultural applications.

  • aphyr/distsys-class: This outline accompanies a 12-16 hour overview class on distributed systems fundamentals. The course aims to introduce software engineers to the practical basics of distributed systems, through lecture and discussion. Participants will gain an intuitive understanding of key distributed systems terms, an overview of the algorithmic landscape, and explore production concerns.
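
For anyone who wants to try the SignalFx advice from the Kafka item above (notify on just two leading indicators, 95th percentile log flush latency and under replicated partitions), here is a minimal sketch of that check. Everything in it is an assumption made for illustration: the `BrokerSnapshot` shape, the function names, and both threshold values are hypothetical and do not come from the SignalFx post or from Kafka itself. It also assumes you already collect the samples some other way, for example over JMX.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical thresholds, chosen only for illustration.
FLUSH_P95_LIMIT_MS = 500.0
MAX_UNDER_REPLICATED = 0


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile (pct is an integer 1..100) of a non-empty sample set."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Ceiling division keeps the rank computation in integer arithmetic.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


@dataclass
class BrokerSnapshot:
    """Hypothetical container for metrics already collected from one broker."""
    broker_id: int
    log_flush_ms: List[float]
    under_replicated_partitions: int


def leading_indicator_alerts(snap: BrokerSnapshot) -> List[str]:
    """Return one alert message per leading indicator that is out of bounds."""
    alerts = []
    p95 = percentile(snap.log_flush_ms, 95)
    if p95 > FLUSH_P95_LIMIT_MS:
        alerts.append(
            f"broker {snap.broker_id}: log flush p95 {p95:.0f} ms exceeds {FLUSH_P95_LIMIT_MS:.0f} ms"
        )
    if snap.under_replicated_partitions > MAX_UNDER_REPLICATED:
        alerts.append(
            f"broker {snap.broker_id}: {snap.under_replicated_partitions} under replicated partitions"
        )
    return alerts


if __name__ == "__main__":
    slow = BrokerSnapshot(2, [10.0] * 18 + [900.0, 900.0], 3)
    for message in leading_indicator_alerts(slow):
        print(message)
```

The other metrics in that item (messages in, bytes in, bytes per message) would stay on dashboards rather than paging anyone, which seems to be the spirit of the original advice.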

Reader Comments (3)

this is great!

please continue doing this.

April 23, 2016 | Unregistered Commentermut


Thanks for sharing all of this! It's good to know what is going on in the internet recently and what they're saying about scalability.

May 16, 2016 | Unregistered CommenterJordan
