Stuff The Internet Says On Scalability For February 17th, 2017

Hey, it's HighScalability time:

Gorgeous satellite images of a thawing Greenland (NASA).
If you like this sort of Stuff then please support me on Patreon. Sorry for the delay, can't write while driving.

  • 1 cubic millimeter: computer with deep learning; 1,600: data on nearby stars; 40M: users for largest Parse app; 58x: TensorFlow 1.0 speedup on 64 GPUs; 46%: ecommerce controlled by Amazon; 60%: IT growth in public cloud; 200 TB: one TV episode

  • Quotable Quotes:
    • @krishnan: Serverless will not be around in 5 years. It will be AI coding AI coding Ai....... Serverless or not doesn't matter #RunForrestRun
    • user5994461: Amazon: Create usual services and sell them. Google: Make unique products that push the boundaries of what was previously thought possible. Amazon: Don't care about inefficiencies and usage. Inefficiencies can be handled by charging more to the clients, usage doesn't matter because the users are mostly the clients and they don't feel their pain. Google: Had to make all their core technologies efficient, performant, scalable and maintainable or they couldn't sustain their business.
    • Hans Rosling: To me, the impressive thing is that people succeed at all.
    • @littleidea: Google Spanner didn't beat CAP, just mitigated the hell out of P
    • @jordw: Cloud Spanner is a very well-engineered CP database that is also very good at being available.
    • Cade Metz: The AI Threat Isn’t Skynet. It’s the End of the Middle Class
    • hosh: Four years ago, I determined that while development work might seem to be near the top of the food chain, there will at some point where my work will be replaced by AIs.
    • mi100hael:  I found Go's "simplicity" to be limiting and frustrating when it came to building production applications. Things like the weird split between functions returning errors but occasionally panicking, lack of inheritance, and poor dependency management through github links make Go a poor choice for applications within a business setting. 
    • @NathanTippy: New #Java web server clearing 1 million HTTP requests per second on 4 core box.  Can run in < 100MB of memory.
    • @kellabyte: It doesn’t matter what the founder or developer of a database tells you. It’s about the true properties it guarantees.
    • @swardley: Private cloud starting to drop, public cloud a three horse race - AWS 1st, MSFT 2nd, GooG 3rd ... sensible stuff 
    • @ollekullberg: Kullberg's law: when we increase the size of a microservice we increase the benefit of static typing for this microservice.
    • @swardley: ... it's not lack of engineering capability or finance or market or marketing or branding, the real story of cloud is executive failure.
    • katied: Trophic cascade is a process that starts at the top of a food chain and works its way to the bottom of it. So, even though as predators wolves survive by taking life, they also have the ability to create it.
    • @swardley: Cloud wars in IaaS - oh, please. War was well over in 2012, yes there will be price cuts as constraints are reduced but there is no battle.
    • @HenryR: 1. CAP has always said only one thing: that there is always a particular network failure that forces you to give up either C or A. 2. It has nothing at all to do with how likely that failure mode is. The failure is system-specific. 
    • throwawaydbfif: The movement from ownership to renting on the web is absolutely terrifying to me. Within the span of a few years we've gone from owning our technology to renting it out from a big players for monthly fees that we cannot completely predict or control.
    • computerex: People use cloud computing because it already is massively impractical to run your own servers. Hardware is hard to run and scale on your own and experiences economies of scale. This principle is seen everywhere and can hardly be viewed as something controversial. 
    • stuckagain: You did not ever own your own globally consistent, massively scalable, replicated database. The fact that you can now rent one by the hour is strictly an improvement for you, if you need that kind of thing
    • tedd4u: Aurora is very cool but won't help you much after you vertically scale your master and still need more write capacity. With Cloud Spanner you get horizontal write scalability out of the box. Critical difference.
    • @koivimik: REST != CRUD via HTTP #microXchg @olivergierke
    • Linus: It's almost boring how well our process works. All the really stressful times for me have been about process. They haven't been about code. When code doesn't work, that can actually be exciting ... Process problems are a pain in the ass. You never, ever want to have process problems ... That's when people start getting really angry at each other.
    • @littleidea: Almost every task run under Borg contains a built-in HTTP server that publishes information about the health of the task...
    • W. Daniel Hillis: For Richard [Feynman], figuring out these problems was a kind of a game. He always started by asking very basic questions like, “What is the simplest example?” or “How can you tell if the answer is right?” He asked questions until he reduced the problem to some essential puzzle that he thought he would be able to solve.
    • @ewolff: "Every hackathon uses Lambda. They build really complicated, production-ready systems in 12h" @adrianco at @microXchg
    • Daniel Bryant: The term "microservices" itself will probably disappear in the future, but the new architectural style of functional decomposition is here to stay.
    • @rbranson: The NoSQL movement might be a disappointment, but emerging from the rubble is the log-based (i.e. Kafka) model that actually works.
    • Chip Overclock: Surprisingly, GPS satellites actually know nothing about position. What they know about is time.
    • @codinghorror: I look at my old blog posts and think... there was a time when I believed 24GB was a lot of RAM
    • vidarh: Depending on your workloads, DO servers can come out cheaper or more expensive than AWS, but bandwidth at DO is so much cheaper than AWS that for bandwidth intensive stuff I can't serve entirely out of Europe (where Hetzner is vastly cheaper than DO again), DO is often a much cheaper alternative. Sometimes we use it as a cost-cutting do-it-yourself CDN in front of AWS for clients that insist on S3 for storage (and again where we can't just cache everything in Europe for latency reasons). For bandwidth heavy applications, you can pay for significant numbers of Droplets from the AWS bandwidth savings alone.
    • lobster_johnson: we use Google Container Engine (hosted Kubernetes), with Salt for the non-GKE VMs. This is needed because K8s is not mature enough to host all the things. In particular, stateful sets are still in beta. 
    • anonymous: The overall impact [algorithms] will be utopia or the end of the human race; there is no middle ground foreseeable. I suspect utopia given that we have survived at least one existential crisis (nuclear) in the past and that our track record toward peace, although slow, is solid.
    • keenio: In conclusion, the TCO is probably significantly lower for Kinesis. So is the risk. And in most projects, risk-adjusted TCO should be the final arbiter.
    • Adem Efe Gencer: the weekly [Bitcoin] mining power of a single miner has never exceeded the 30% of the overall mining power in 2016. Moreover, in the second half of the year, the highest mining power has consistently been under the 20% range.
    • David Rosenthal: The security downside of Postel's Law is even more fundamental. The law requires the receiver to accept, and do something sensible with, malformed input. Doing something sensible will almost certainly provide an attacker with the opportunity to make the receiver do something bad.
    • douche: That's pretty much the way it has always been. You can go back at least to the Civil War and find politics has had more to do with procurement than performance of the weapon systems in question.
    • Jonathan Suen: While the brain and the Internet clearly operate using very different mechanisms, both use simple local rules that give rise to global stability. I was initially surprised that biological neural networks utilized the same algorithms as their engineered counterparts, but, as we learned, the requirements for efficiency, robustness, and simplicity are common to both living organisms and the networks we have built.
    • Bruce Johnson: Code reviews set the tone for the entire company that everything we do should be open to scrutiny from others, and that such scrutiny should be a welcome part of your workflow rather than viewed as threatening.
    • codingmyway: I think some miners are against any increase because it will lower fees. Without a blocksize limit fees tend to zero, which is fine while there is the block reward but they still want to milk the congestion fees. To say they are pro segwit or pro unlimited is bluffing. They are pro status quo and congestion and high fees.
    • edejong: Many engineers I have worked with like to throw around terms like: "CQRS", "Event sourcing", "no schema's", "document-based storage", "denormalize everything" and more. However, when pushed, I often see that they lack a basic understanding of DBMSes, and fill up this gap by basically running away from it. For 95% of the jobs, a simple, non-replicated (but backed-up) DBMS will do just fine.
    • adamu__: If China were to shut down bitcoin mining, my understanding is that the worst case scenario is much more dire. The network only adjusts the 'difficulty' relative to current network hash power every 2,016 blocks. Depending on the severity of the overall hash power reduction, new block discovery might slow down significantly. This would also delay a recalculation of the new difficulty accommodating the reduction in hash power. The network could be severely throttled for weeks.
    • boulos: Slightly off-topic, but EC2 doesn't really scale independently if you compare it to GCE. We let you combine 24 vcpus with 39 GB of RAM, 3 partitions of Local SSD and a few GPUs, all independently (though the ratio of RAM to vcpu is currently bounded between .9 and 6.5).
    • Veratyr: Personally, I settled with colocation. I pay $60/mo + $2k one-off for the initial hardware + say $150/5y/4TB HDD, which, for 80TB of storage over 5y comes out to a total of ~$88/mo, or $0.001/GBmo. 
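The retargeting arithmetic behind adamu__'s comment above can be sketched in a few lines. This is a toy model built on two protocol constants (a 10 minute target block spacing and a retarget every 2,016 blocks); the function name and the `hash_power_fraction` input are hypothetical, and real block arrival is random, so these are expected values only.

```python
# Toy model: how long until the next difficulty retarget if hash power drops.
TARGET_BLOCK_MINUTES = 10   # protocol target spacing between blocks
RETARGET_INTERVAL = 2016    # blocks between difficulty adjustments


def days_until_retarget(blocks_remaining, hash_power_fraction):
    """Expected days to mine blocks_remaining blocks when only
    hash_power_fraction (0 < f <= 1) of the previous hash power is left.
    Difficulty stays fixed until the retarget, so blocks slow down by 1/f."""
    if not 0 < hash_power_fraction <= 1:
        raise ValueError("hash_power_fraction must be in (0, 1]")
    minutes_per_block = TARGET_BLOCK_MINUTES / hash_power_fraction
    return blocks_remaining * minutes_per_block / (60 * 24)


# Full period at full hash power: the familiar two weeks.
print(days_until_retarget(RETARGET_INTERVAL, 1.0))   # 14.0
# Hypothetical: half the hash power disappears at the start of a period.
print(days_until_retarget(RETARGET_INTERVAL, 0.5))   # 28.0
```

The slowdown compounds with how early in the period the drop happens: the fewer blocks already mined, the longer the network runs at the old difficulty.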

  • Now this is object oriented programming. New software for increasingly flexible factory processes: new software that allows each individual component to tell the machine what has to be done. By breaking away from central production planning, factories can achieve unprecedented agility and flexibility...
    Everything would go much faster if production and the requisite machines were not rigidly set by a control program, but if every component itself knew the best way for it to be moved quickly through the process chain. 

  • Relax. Videos from TensorFlow Dev Summit 2017 are now available. Also, Learn TensorFlow and deep learning, without a Ph.D. Also also, Deep Learning book.

  • Google is Introducing Cloud Spanner: a global database service for mission-critical applications. It will be interesting to see if Spanner, as a unique hard to duplicate feature, becomes a Google Cloud differentiator. Will it make the delta between the clouds significant enough that developers choose Google? Quizlet, already running on GCP, really likes Spanner, but it's not a drop in replacement for MySQL. Like with NoSQL there's special care and feeding to make it work, but that's the sacrifice high QPS requires. Performance: "Cloud Spanner queries have higher latency at low throughputs compared with a virtual machine running MySQL. Spanner's scalability, however, means that a high-capacity cluster can easily handle workloads that stretch our MySQL infrastructure." And p90s are consistently lower than 50 ms. Cost: "For very small or low-throughput databases Cloud Spanner is overkill [min ~$8,000/yr]...Cloud Spanner comparable or slightly cheaper based on the performance in our testing."  With Spanner hitting the market maybe that will help CockroachDB? Some older articles: Spanner - It's About Programmers Building Apps Using SQL Semantics At NoSQL Scale; Google Spanner's Most Surprising Revelation: NoSQL Is Out And NewSQL Is In; F1 And Spanner Holistically Compared; How Google Invented An Amazing Datacenter Network Only They Could Create.

  • What happens when you dump the App Store? More revenue as a whole without much damage to sales.

  • Or why companies don't dominate forever. Amazon cloud leader Andy Jassy sizes up the competition in rare public remarks about rivals: I think there’s a second category of large technology companies that just took an ungodly long time to get there. These are companies like Oracle and IBM, some of those folks. I think for them the model that we were pursuing, in the cloud, was so disruptive to their core businesses. The margin structures are radically different. The pricing structures are dramatically different. The delivery model is radically different. The way you take care of customers is radically different. So different that I think you have that dilemma at a lot of large companies: do you really want to try and accelerate the cannibalization of an existing … I think that they fought as long and hard as they could. They pooh-poohed it and they said, first no one will use it, then maybe only startups will use it, and they won’t use it for anything real, then enterprises will never use it, then enterprises will never use it for anything mission-critical. Companies and developers voted with their workloads, and so now they’re in this spot of trying to spin something up. It’s six, seven years late.

  • So many fascinating details. The AWS and MongoDB Infrastructure of Parse: Lessons Learned: 1 million apps were deployed to Parse; Original reason for Facebook to acquire Parse was to push their mobile SDKs and to create synergies with mobile ads; Pricing model measured in guaranteed requests per second did not work well; Business problem: people tended to remain in the free-tier; Backend was completely on Amazon Web Services; The Mongo Write Concern was 1 (!), i.e. writes were confirmed before they were replicated. Some people complained about lost data and stale reads; providers should be open about their infrastructure and trade-offs, which Parse only was after it had already failed.

  • Will your AI do this? Man purposely crashes his Tesla to save another driver. Yay humans!

  • Could it be the high cost of renting in the cloud? Ben: The biggest problem here is Snap’s cost of revenue per user: it’s going up, and it has been for a while. This isn’t quite as bad as it seems, because Snap’s cost of revenue includes revenue sharing payments to publishers; once you back that out, though, the company has gone from paying ~$0.47/user in 2015 to ~$0.66/user in 2016, a 40% increase, and an amount (per user) that well exceeds that of Facebook or Twitter at the time of their IPOs. That means that to become profitable Snap has to not only grow users, it has to grow them faster than its costs are increasing, or grow revenue per users by that much more.

  • Yes indeed, every line of code is someone else's legacy system. Legacy systems are everywhere

  • Serverless, not just for greenfield. But I thought we weren't supposed to rewrite? Evolution of Business Logic from Monoliths through Microservices, to Functions: The real point I’m [adrian cockcroft] making is that the ROI threshold for whether existing monolithic applications should be moved unchanged into the cloud or rewritten depends a lot on how much work it is to rewrite them. A typical datacenter to cloud migration would pick out the highly scaled and high rate of change applications to re-write from monoliths to microservices, and forklift the small or frozen applications intact. I think that AWS Lambda changes the equation, is likely to be the default way new and experimental applications are built, and also makes it worth looking at doing a lot more re-writes.

  • Recommend a development stack for small company: go with a boring stack;  C#/.NET; React, Node, NoSQL; Laravel;  VueJS; Ruby on Rails;  ghost.org; Linux, Nginx, PostgreSQL, Python.

  • Can technology bring us together? Algorithm can create a bridge between Clinton and Trump supporters: “When applied on Twitter discussions around the US election results, the algorithm suggests that creating a bridge between @hillaryclinton and @breitbartnews would reduce polarization the most. However, taking into account how likely such a bridge is to materialize, the algorithm suggests that other bridges between less prominent Twitter accounts, for instance liberal journalist @mtracey and conservative activist @rightwingangel, show better potential”, describes researcher Kiran Garimella.

  • Blocking a DDoS Upstream: You should not just rely on technical tools to block a DDoS attack upstream. If you can figure out where the DDoS is coming from, or track it down to a small set of source autonomous systems, you should find some way to contact the operator of the AS and let them know about the DDoS attack. 

  • Data Shows Traditional CDNs Are Losing Competitive Edge in US Mobile App Arena: Akamai is leading the market with 35.3% market share (no surprise) with vendors such as Fastly, Verizon and Amazon following a 3:1 ratio. In addition multiple smaller players indicate it’s already a mature and saturated market...Amazon is leading with a 40% market share [for apps], most likely due to its strong developer relationships...The key takeaway from this data is that the mobile app market is a new world that’s very different from the commoditized CDN market and one that is growing faster than anyone had predicted.

  • Videos from Machine Learning @Scale 2017 are now available

  • RethinkDB versus PostgreSQL: my personal experience: PostgreSQL was faster, usually 5x faster, sometimes only 2x faster, and often even 10x faster; the act of writing queries in SQL was much faster for me than writing ReQL; Backups were an order of magnitude faster with PostgreSQL; The total disk space usage was an order of magnitude less (800GB versus 80GB); PostgreSQL tends to use (at least) an order of magnitude less RAM to do the same thing; there simply is no need for a connection pool for my application, since PostgreSQL is so fast; things started to fail spectacularly - Using EXPLAIN I found that with full production data the query planner was doing something idiotic.


  • Joining a billion rows 20x faster than Apache Spark: The improvement seen in this particular case is due to a more generic approach to optimizing joins on contiguous values. While Spark uses a single column “dense” vector to optimize the single column join case, the SnappyData’s hash-join implementation uses per-column MIN/MAX tracking to quickly reject streamed rows if any of the join keys lie beyond the limits. Thus while Spark’s optimization works only for specialized single column cases, the approach in SnappyData works for a much wider range of queries. Beyond this specific optimization, the hash grouping and join operators in SnappyData are themselves tuned to work much better with its column store. 
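The MIN/MAX rejection idea described above can be illustrated with a small sketch. This is not SnappyData's code: it is a plain Python hash join over lists of dicts, with hypothetical names, that records the per-column minimum and maximum of the build side's join keys and discards streamed (probe) rows that fall outside those bounds before touching the hash table.

```python
def hash_join_with_minmax(build_rows, probe_rows, key_cols):
    """Inner join of two lists of dicts on key_cols, with a cheap
    per-column range check applied to each probe row first."""
    if not build_rows:
        return []

    # Build phase: hash table plus MIN/MAX of every join-key column.
    table = {}
    lo = {c: build_rows[0][c] for c in key_cols}
    hi = dict(lo)
    for row in build_rows:
        key = tuple(row[c] for c in key_cols)
        table.setdefault(key, []).append(row)
        for c in key_cols:
            lo[c] = min(lo[c], row[c])
            hi[c] = max(hi[c], row[c])

    # Probe phase: reject out-of-range rows without hashing them.
    joined = []
    for row in probe_rows:
        if any(row[c] < lo[c] or row[c] > hi[c] for c in key_cols):
            continue
        key = tuple(row[c] for c in key_cols)
        for match in table.get(key, ()):
            joined.append({**match, **row})
    return joined


build = [{"a": 1, "b": 10, "x": "p"}, {"a": 2, "b": 20, "x": "q"}]
probe = [
    {"a": 1, "b": 10, "y": "m"},  # in range and matches
    {"a": 5, "b": 10, "y": "n"},  # a=5 is outside [1, 2]: rejected early
    {"a": 2, "b": 10, "y": "o"},  # in range, but no such key in the table
]
print(hash_join_with_minmax(build, probe, ["a", "b"]))
```

Because the bounds are tracked per column, the check works for multi-column keys, which is the generality the excerpt contrasts with a single-column dense-vector optimization.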

  • We have some results back from big data science carried out using the Apple Watch. Which workout routine works best? Data from 34,369 New Year’s Resolutions measured on Apple Watch. Do: run, elliptical, weight lift. Walking, not so much.

  • Chip Overclock with a cool way to learn about GPS. Better Never Than Late: From just parts I had on hand, I threw together a little portable battery-powered remote unit that reads data from a GPS receiver and transmits it to a base station in my office. The base station feeds the data to the Google Earth application. The Google Earth app produces a moving map display showing the path of the portable unit. All this occurs in real-time, the map updating on the display in my office as I drive around the neighborhood with the remote unit.

  • dccorona: I do a ton of work with Kinesis, and as we continue to scale our streaming workflows further and further, I find more and more to love about it. It's to the point where I'm pretty much sold on it (or, more specifically, the concept of append-only, checkpoint-based queuing systems) being a better fit for nearly everything than a simpler queue like SQS (something the article seems to disagree with, more on that in a sec). This is a particularly important design distinction because that basic approach lends itself really well to the super-powerful, next gen stream processing frameworks that are around these days. Apache Flink is the one that really comes to mind for me...you can do a ton of stuff that seems impossible elsewhere when you have a stream that is checkpoint-based, combined with a streaming framework that supports checkpointing internal system state alongside stream progress markers. This plus the ability to easily add multiple listeners are the big two for me, and I suspect that a lot of other people will benefit from choosing a stream platform (be it Kinesis, Kafka, or something similar) that offers these kinds of flexibilities. The only time I've found a queue like SQS to be ultimately a better fit than Kinesis is when round-trip latencies need to be measured in the single-digit milliseconds. In all other cases, Kinesis seems as good or better to me.
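To make the "append-only, checkpoint-based" distinction concrete, here is a toy in-memory model. It is not the Kinesis or Kafka API; the class and method names are hypothetical. Records are never removed when read, and each named consumer advances its own checkpoint, which is what lets multiple listeners share one stream and lets a crashed consumer replay from its last checkpoint.

```python
class AppendOnlyLog:
    """Toy stream: an immutable sequence plus per-consumer checkpoints."""

    def __init__(self):
        self._records = []
        self._checkpoints = {}  # consumer name -> next offset to read

    def append(self, record):
        """Add a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, consumer, max_records=10):
        """Return (start_offset, records) from the consumer's checkpoint.
        Reading does not advance the checkpoint."""
        start = self._checkpoints.get(consumer, 0)
        return start, self._records[start:start + max_records]

    def checkpoint(self, consumer, next_offset):
        """Record that everything before next_offset has been processed."""
        self._checkpoints[consumer] = next_offset


log = AppendOnlyLog()
for event in ("a", "b", "c"):
    log.append(event)

start, batch = log.read("billing", max_records=2)
log.checkpoint("billing", start + len(batch))  # processed "a" and "b"

print(log.read("billing"))  # resumes after the checkpoint
print(log.read("audit"))    # a second listener still sees everything
```

With a delete-on-consume queue, the second listener would need its own copy of every message; here it only needs its own checkpoint.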

  • What does a BGP hijack look like? It looks like this: A BGP Hijacking Technical Post-Mortem: This incident is another example of BGP’s technical limitations being exploited to restrict internet access. It’s also a unique instance of one country’s laws spilling outside their jurisdiction. To protect their respective IP space, providers should consider implementing RPKI.
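For readers unfamiliar with what RPKI checks, here is a minimal sketch of route origin validation in the style of RFC 6811, using only Python's standard `ipaddress` module. The function name and the tuple layout for ROAs are assumptions made for illustration, and the prefixes and AS numbers are documentation-reserved values, not data from the incident.

```python
import ipaddress


def validate_origin(prefix, origin_asn, roas):
    """Classify a BGP announcement against a list of ROAs.
    roas: iterable of (roa_prefix, max_length, asn) tuples.
    Returns "valid", "invalid", or "not-found"."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # Skip ROAs of the other IP version or that do not cover the route.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # A covering ROA matches if the origin AS agrees and the
        # announcement is no more specific than max_length allows.
        if asn == origin_asn and route.prefixlen <= max_length:
            return "valid"
    return "invalid" if covered else "not-found"


roas = [("192.0.2.0/24", 24, 64500)]
print(validate_origin("192.0.2.0/24", 64500, roas))     # legitimate origin
print(validate_origin("192.0.2.0/24", 64501, roas))     # wrong origin AS
print(validate_origin("192.0.2.0/25", 64500, roas))     # too specific
print(validate_origin("198.51.100.0/24", 64501, roas))  # no covering ROA
```

A hijack that announces someone else's prefix from a different AS, or a more specific slice of it, lands in the "invalid" bucket, which is what lets networks that enforce validation drop it.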


  • A long and thoughtful exploration with input from many many experts. Code-Dependent: Pros and Cons of the Algorithm Age. It's divided into 7 themes: Algorithms will continue to spread everywhere; Good things lie ahead; Humanity and human judgment are lost when data and predictive modeling become paramount; Biases exist in algorithmically-organized systems; Algorithmic categorizations deepen divides; Unemployment will rise; The need grows for algorithmic literacy, transparency and oversight. 

  • It's coming! And it's fast. First 5G Network end-to-end demonstration by Ericsson: end-to-end data rates of more than 1Gbps and roundtrip latencies of about 4 milliseconds.

  • Khan Academy has a good section on Algorithms. This also looks cool: The Art of Storytelling

  • lyft/ratelimit: Go/gRPC service designed to enable generic rate limit scenarios from different types of applications.
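As a rough illustration of what a generic rate limit check involves, here is a fixed-window counter in Python. It is a generic technique, not code or an API taken from lyft/ratelimit; the class name, the key format, and the injectable clock are all hypothetical choices for the sketch.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` hits per key in each window of
    `window_seconds`. Old windows are never evicted in this sketch."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock   # injectable so the limiter can be tested
        self.counts = {}     # (key, window index) -> hits so far

    def allow(self, key):
        bucket = (key, int(self.clock() // self.window))
        hits = self.counts.get(bucket, 0)
        if hits >= self.limit:
            return False
        self.counts[bucket] = hits + 1
        return True


now = [0.0]  # fake clock so the example is deterministic
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: now[0])
print([limiter.allow("user:42") for _ in range(3)])  # third call is refused
now[0] = 61.0
print(limiter.allow("user:42"))                      # new window, allowed
```

A shared service would keep the counters in an external store rather than a local dict so that every application instance sees the same counts.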

  • openzipkin/zipkin-sparkstreaming: a streaming alternative to Zipkin's collector. 

  • Bizur: A Key-value Consensus Algorithm for Scalable File-systems: Bizur does exactly that, by reaching consensus independently on independent keys. This independence allows Bizur to handle failures more efficiently and to scale much better than other consensus algorithms, allowing the file-system that utilizes Bizur to scale with it.

  • Software Engineering at Google: The aim of this paper is to catalogue and briefly describe Google’s key software engineering practices. 

  • How the Brain Might Work: Statistics Flowing in Redundant Population Codes: We propose that the brain performs approximate probabilistic inference using nonlinear recurrent processing in redundant population codes. Different overlapping patterns of neural population activity encode the brain’s estimates and uncertainties about latent variables that could explain its sense data. Nonlinear processing implicitly passes messages about these variables along a graph that determines which latent variables interact according to an internal model of the world.