
Stuff The Internet Says On Scalability For February 2nd, 2018

High Scalability

02 Feb 2018 — 19 min read

Hey, it's HighScalability time:

Are silicon device designers also artists? Of course. (DAC Silicon/Technology Art Show)

If you like this sort of Stuff then please support me on Patreon. And I'd appreciate it if you would recommend my new book, Explain the Cloud Like I'm 10, to anyone who needs to understand the cloud (who doesn't?). I think they'll learn a lot, even if they're already familiar with the basics.

2 billion: Siri requests per week; 1 trillion: semiconductor unit shipments, 9.1% compound annual growth rate over a 40 year span; 150 million: IPv4 addresses recirculated over 7 years; $100 Billion: value lost in cryptocurrency markets in 24 hours; $1: microcontroller; 16 Gb/s: GDDR6 SGRAM; 1/3: IPv4 addresses registered to US entities; eight minutes and thirty-five seconds: breath-hold record; $32.32 billion: Google revenue, up 24%; 1.3 billion: active Apple devices, up 30% in 2 years; 500 petabytes: Backblaze; 6th or 7th: HTTP version of hypertext on the internet; $2.1 billion: Amazon's Q4 2017 operating income, up 69%; 45%: jump in Amazon cloud revenue; 9%: global smartphone market drop; 3 billion: photos uploaded to Google on New Year's Eve; 1.5 billion: monthly YouTube users; $530 million: stolen by hackers in the biggest cryptocurrency theft yet; 9,500: computers forced to be reinstalled by ransomware;

Quotable Quotes:
- ARM 2 106: Tell my lord: Your servant Yakim-Addu sends the following message: A short time ago I wrote to my lord as follows: "A lion was caught in the loft of a house in Akkaks. My lord should write me whether this lion should remain in that same loft until the arrival of my lord, or whether I should have it brought to my lord." But letters from my lord were slow in coming and the lion has been in the loft for five days.
- Brice Morrison: I'm predicting by 2020 there will be a billion dollar game where the primary way to play is with your voice. Right now Amazon Alexa and Google Home are simple utilities. The games on each of them are just toys - experiments to round out playing music and turning on smart lightbulbs. But each month signs are growing stronger that voice is becoming the next major growth platform. And as technology goes, games follow.
- Geoff Huston: The days when the Internet was touted as a poster child of disruption in a deregulated space are long since over, and these days we appear to be increasingly looking further afield for a regulatory and governance framework that can continue to challenge the increasing complacency of the newly-established incumbents.
- @jckarter: Reminder: running 32 bit processes on a 64 bit CPU for prolonged periods can lead to burn-in of the unused high bits. Degauss your CPU regularly
- Ted Nelson~ We had visions of democratization, citizen participation, great vistas of possible participation for artistic expression in software. Software is an artform, though not generally recognized as such: exactly how you select the keys, exactly how you position things on the screen, has an impact. In the old days there was a greater shared citizen vision of the personal computing movement.
- @n_srnck: Uber is buying 24,000 cars. Facebook is spending $1 billion on original TV shows. Alibaba is spending $2.6 billion for physical stores. Airbnb is opening branded apartment buildings.
- @etherealmind: This is a big deal. Open sourcing the SDK changes whitebox market.
  
  Link: Broadcom Expands Ethernet Switch Software Suite with Industry’s First Fully Open Source Software Development Kit. Enterprises can now easily develop their own Network Operating System. Open source projects are now free to flourish. Vendors based on open source can easily expand their feature sets. Obviously a reaction to SAI/Sonic.
- Geoff Huston: Time and time again we are lectured that NATs are not a good security device, but in practice NATs offer a reasonable front-line defence against network scanning malware, so there may be a larger story behind the use of NATs and device-based networks than just a simple conservative preference to continue to use an IPv4 protocol stack.
- @Tanvim: This guy has been biking really slow outside the FCC to protest its decision to repeal net neutrality, and charging $5 to have vehicles pass him. Lol.
- npz: The days of ASICs are long past. I guarantee you that NO ONE in the general community wants to repeat the same mistake bitcoin and subsequently litecoin made. Hence, all coins have been asic resistant since. And some modern coins / blockchain hashing algorithms are even complex enough to give GPUs a hard time, enough to allow CPUs to be competitive like Monero (XMR/cryptonight). That's why when it comes to Monero, you'll often hear about XMR-"Stacks" because now even the CPU can be used!
- Philippe Kahn: I met with all of them [Kodak, Polaroid]. Proposed our solution to no avail. They had an established business and thought that it would never go away and they could wait. They totally missed the paradigm shift. Paradigm shifts are challenges for any established player, look at the demise of Nokia for missing the smartphone.
- Jakob: it would seem that reducing the precision and making timing sources more jittery won’t really help with the core problem. It is probably a good idea to do this in JavaScript to make it harder to do exploits, but it is not a panacea. It appears that in the end, it is the side channels themselves that have to be suppressed. Which is not a particularly appealing statement to make, since side channels by definition are not designed into a system. They are discovered as side-effects of otherwise reasonable decisions. In the end, there is no replacement for an adversarial mind-set, and putting resources into thinking about how things can be broken, not just made to work in the first place.
- tw1010: There ought to be a name to the tendency that as tools get better and better, the more your time goes from having your mind in technical-space to social and news-space. It's like the authority to create goes from the individual first-principles (by necessity) maker, to the control over development being in the hands of an external group, and then all your time is spent keeping up with what they're doing. A similar thing happened with a lot of javascript frameworks. It also happened with the transition from building servers from the ground up, to it all being managed by AWS.
- @daveixd: This notion that "it's all just a guess until we ship to production" flies in the face of decades of research in HCI, psych, etc. The point isn't knowing for sure, but increasing confidence as investment increases. Deciding to ignore those opportunities to learn is reckless.
- @davidgerard: Dr Strangelove is actually a film about why immutable smart contracts that cannot be altered by human agency once they're in motion
- Mark Boyd: with serverless, the decisions that developers make increasingly have a tangible cost and business impact.
- @eliesaaab: “You’re basically seeing all of the sunrises and sunsets across the world, at once, being reflected off the surface of the moon” – The Lunar Eclipse explained by NASA
- @cloud_opinion: If this guess is correct, AWS is 50x the size of GCP - let that sink in.
- @esh: After studying Spot instance history, trying a few approaches, a tip from @analytically, and a dash of luck, I was able to get a Spot instance to be interrupted in 14 hours. The trick is to find a Spot instance type and region/availability zone that is fluctuating in a narrow range. I had no luck getting a reservation at any bid where Spot prices had been slowly rising for a while, presumably because they were in continuous high demand.
- Ed Sperling: The bottom line is the chip industry may prove to be much more consistent over the next decade than at any time in its past. And while that may make it less exciting for stock watchers, that’s not necessarily bad.
- @manisha72617183: It's unfortunate if you think that programming is just sitting at a keyboard & typing. 95% thinking/analyzing problems and 5% typing - that's programming .... and you can think just as well lying on the grass with your eyes closed as you can sitting in front of a keyboard
- @cloud_opinion: Google spent nearly $30B on Infrastructure for GCP last year. Likely made around $1B all of last year. This Cloud business is not for the faintest of heart executives.
- apenwarr: The supposed poor performance of QUIC on resource-limited mobile devices is, as they point out, because it's a user space implementation that is thus more expensive. If QUIC becomes popular, I assume there will be kernel implementations that are as resource efficient as TCP. Meanwhile, it's a lot easier to do experiments (such as tuning CUBIC parameters!) when you don't have to reboot to install a new version.
- @joehewitt: Apple tried to pull this shit on me when I was at Facebook - come in for a quick meeting... Just kidding, please camp here all weekend and code a demo for keynote. They were surprised when I said no thanks. I felt that was a pretty rude way to treat your developer partners
- @SwiftOnSecurity: US Navy begins re-learning celestial navigation with sextants in anticipation of GPS loss during Total Cyber War.
- @mims: Amazon is now the starting point for more than half of product searches, and that figure grows every year. Meanwhile, search ad revenue on the site is exploding. How long until this impacts Google?
- @johnrobb: The "right to repair" movement guards against socioeconomic brittleness. It's also a hedge against bad globalization. ~at 9 m: Apple/Microsoft lobbyist tells lawmakers they wouldn't sell products to Nebraskans if they passed this bill.
- Peter Bailis: we have found the design of post-relational data-intensive systems does not necessitate an abandonment of classical data-intensive systems techniques such as declarative interfaces or query planning. Rather, this new class of workloads presents new opportunities for applying these techniques to a broad set of statistically-informed problems. For example, we have found that predicate pushdown, cost-based optimization, and cascaded execution shine when applied in many statistical contexts. Just as relational workloads stimulated decades of research into end-to-end query optimization, systems design, and hardware-efficient execution, I believe this next wave of systems holds similar -- and perhaps even greater -- promise for the systems community.
- Geoff Huston: We are witnessing an industry that is no longer using technical innovation, openness and diversification as its primary means of propulsion. The widespread use of NATs in IPv4 limit the technical substrate of the Internet to a very restricted model of simple client/server interactions using TCP and UDP. The use of NATs force the interactions into client-initiated transactions, and the model of an open network with considerable flexibility in the way in which communications take place is no longer being sustained in today's network. Incumbents are entrenching their position and innovation and entrepreneurialism are taking a back seat while we sit out this protracted IPv4/IPv6 transition. What is happening is that today's internet carriage service is provided by a smaller number of very large players, each of whom appear to be assuming a very strong position within their respective markets. The drivers for such larger players tend towards risk aversion, conservatism and increased levels of control across their scope of operation. The same trends of market aggregation are now appearing in content provision, where a small number of content providers are exerting a completely dominant position across the entire Internet.
- SniperWulf: This is where you are mistaken. The current architectures on both Red and Green teams excel at certain algorithms and are meh at others. Nv can't be touched in equihash, Lyra and a few others, while AMD can't be touched in Cryptonight, Ethash and a few others. It's all about picking the right tool for the job. If you're planning to mine ethereum, why buy a 1070 Ti @ $449 (MSRP) to get 30Mh/s when a RX 480 @ $239 (MSRP) or RX 580 @ $229 (MSRP) can get the same job at 135W. It wouldn't make sense to buy 1080 Ti's ($779 MSRP) to mine Monero @ 800H/s when a Vega 56 ($399 MSRP) can get 1900H/s at the same 135W. On the flip side, I wouldn't buy any Radeons if Zencash, Zclassic or Verge were my coin/algo of choice. Granted that those prices mean dick in today's market, it's not about one company vs another, it's all about picking the right tool for the job.

Apple and Verizon are on the wrong side of this issue. Tractor Hacking: The Farmers Breaking Big Tech's Repair Monopoly. @jason_koebler: A year ago, I found out about a community of farmers who trade John Deere firmware hacks on forums and torrent sites. They're hacking their tractors because Deere has encryption keys locking down access to the software, preventing even simple repair. A few months later, @laragheintz and I went to Nebraska to meet some of these farmers/hackers to make a documentary about them and the ongoing right-to-repair movement. We finally released that documentary today. It's one of the cooler video projects I've ever been involved with. So happy how this turned out. Also, Why American Farmers Are Hacking Their Tractors With Ukrainian Firmware, Apple, Verizon Continue to Lobby Against The Right To Repair Your Own Devices, What is OBD II? History of On-Board Diagnostics. Keep in mind there is an open source tractor.

Videos from LaunchDarkly: Test In Production are now available. Talks with titles like: Visibility and Monitoring for Machine Learning Models and Cindy Sridharan: Testing Microservices: A Sane Approach Pre-Production & In Production.

How easy is it these days to build a multi-region multi-master application? Pretty easy. Build a multi-region, multi-master application with Serverless and DynamoDB Global Tables. The key is DynamoDB Global Tables: "This feature, announced at AWS re:Invent 2017, allows you to specify DynamoDB tables in separate regions to act as a single table. Writes in one region will be asynchronously replicated to the other regions. This allows for some powerful applications without writing custom syncing logic." Then you can top it off with a little latency-based routing using Route53: "It allows you to create multiple DNS records for the same resource. Each DNS record points to IP addresses in different regions. When the user makes a DNS query, Route53 will return the record that offers the lowest latency based on the requesting user's location." Also, alexcasalboni/serverless-multi-region-client-demo.
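To make the latency-based routing idea concrete, here is a minimal Python sketch. It is not Route53's algorithm or API, only an illustration of "return the record with the lowest latency"; the region names, hostnames, and latency figures are all made up for the example.

```python
from typing import Dict

# Hypothetical endpoints, one per region (names are illustrative only).
ENDPOINTS: Dict[str, str] = {
    "us-east-1": "api-us-east-1.example.com",
    "eu-west-1": "api-eu-west-1.example.com",
    "ap-southeast-1": "api-ap-southeast-1.example.com",
}


def pick_endpoint(latency_ms: Dict[str, float]) -> str:
    """Return the endpoint of the region with the lowest measured latency.

    Regions that are not in ENDPOINTS are ignored.
    """
    candidates = {r: ms for r, ms in latency_ms.items() if r in ENDPOINTS}
    if not candidates:
        raise ValueError("no latency data for any known region")
    best_region = min(candidates, key=candidates.get)
    return ENDPOINTS[best_region]


# Assumed measurements for a user located in Europe.
european_user = {"us-east-1": 95.0, "eu-west-1": 18.0, "ap-southeast-1": 210.0}
print(pick_endpoint(european_user))
```

With Global Tables replicating writes between regions, each regional endpoint can serve reads and writes locally, which is what makes routing purely on latency workable.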

DevConf videos from Brno, Czechia are now available.

Very enjoyable. Extended Director’s Cut: Ted Nelson on What Modern Programmers Can Learn From the Past. Most interesting: Ted's remembrance of Douglas Engelbart, who Moore said gave him the idea that everything was going to get smaller and faster. "Douglas Engelbart was a great man, I met him 1966," said Ted. In 1951 Douglas said to himself the problems of the world are escalating, getting more and more complicated, what can we do? We need new tools. New ways of supporting teams of people doing hard work. So he invented word processing, outline processing, computer graphics, hypertext, and the mouse. All of this was part of his dream to make people more powerful. I think Douglas is still right on, though it's clear tools are not enough. Not so enjoyable: blaming his team for not getting Xanadu out before HTTP won the world wide web. The buck always stops at the leader. Always.

Event driven in the cloud is a thing. Before that, event driven architectures were key to creating any kind of distributed service. Azure has released their Azure Event Grid. An advantage it seems to have is that it can handle user generated events, not just events generated by the cloud provider. Event Grid provides durable delivery. It delivers each message at least once for each subscription, so you have to implement idempotence.
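Since at-least-once delivery means the same event can arrive more than once, a handler has to tolerate duplicates. Below is a minimal sketch of one common approach, deduplicating on an event ID. It does not use the Event Grid SDK; the event shape (`id`, `data`) and the `credit` handler are assumptions for illustration, and a real system would keep the seen IDs in a durable store rather than in memory.

```python
from typing import Callable, Dict, Set


class IdempotentHandler:
    """Wraps a side-effecting handler so redelivered events are applied once."""

    def __init__(self, handler: Callable[[Dict], None]) -> None:
        self._handler = handler
        # In-memory only for the sketch; production code needs durable storage.
        self._seen_ids: Set[str] = set()

    def handle(self, event: Dict) -> bool:
        """Process the event; return False if it was a duplicate."""
        event_id = event["id"]  # assumed to be unique per event
        if event_id in self._seen_ids:
            return False
        self._handler(event)
        self._seen_ids.add(event_id)
        return True


balance = {"total": 0}


def credit(event: Dict) -> None:
    """Hypothetical side effect: add the event's amount to a balance."""
    balance["total"] += event["data"]["amount"]


handler = IdempotentHandler(credit)
deposit = {"id": "evt-1", "data": {"amount": 50}}
handler.handle(deposit)
handler.handle(deposit)  # redelivery of the same event is ignored
print(balance["total"])  # 50
```

The ID is recorded only after the handler succeeds, so a crash mid-handler leads to a retry rather than a lost event; the side effect itself still has to be safe to retry.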

Are thin clients back? Engineering Smart && Building Dumb: Building an Android Thin-Client at OkCupid: Over a year, we played whack-a-mole on all three of our platforms (Android, iOS, Desktop) plugging the bugs. In the end, we decided that the best thing to do was have the server handle the heavy-lifting while the clients handled the display. Thus was born the dumb-client mantra: “Don’t do on the client what the server can do.”

Fun and interesting read. DNA seen through the eyes of a coder: DNA is not like C source but more like byte-compiled code for a virtual machine called 'the nucleus'. It is very doubtful that there is a source to this byte compilation - what you see is all you get. The language of DNA is digital, but not binary. Where binary encoding has 0 and 1 to work with (2 - hence the 'bi'nary), DNA has 4 positions, T, C, G and A. Whereas a digital byte is mostly 8 binary digits, a DNA 'byte' (called a 'codon') has three digits. Because each digit can have 4 values instead of 2, a DNA codon has 64 possible values, compared to a binary byte which has 256. A typical example of a DNA codon is 'GCC', which encodes the amino acid Alanine. A larger number of these amino acids combined are called a 'polypeptide' or 'protein', and these are chemically active in making a living being.
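The codon arithmetic is easy to check in a few lines of Python. This is only a sketch: the lookup table is deliberately incomplete and holds just the one mapping the article mentions (GCC to Alanine), and the `translate` helper is a made-up name for illustration.

```python
from itertools import product

BASES = "TCGA"

# Every possible three-letter codon over the four bases.
codons = ["".join(triple) for triple in product(BASES, repeat=3)]
print(len(codons))  # 64 = 4 ** 3
print(2 ** 8)       # 256 values in an 8-bit byte

# Deliberately partial table; only the mapping cited in the article.
CODON_TO_AMINO_ACID = {"GCC": "Alanine"}


def translate(dna: str) -> list:
    """Split a DNA string into codons and look each one up.

    Trailing bases that do not fill a codon are ignored, and codons
    missing from the partial table are reported as "?".
    """
    usable = len(dna) - len(dna) % 3
    triples = [dna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TO_AMINO_ACID.get(t, "?") for t in triples]


print(translate("GCCGCC"))  # ['Alanine', 'Alanine']
```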

Why haven't we been compelled to use IPv6 when the internet has been growing so fast? The Internet is now a client/server network. Addressing 2017: while the Internet has grown at such amazing rates, the deployment of IPv6 continues at a far more leisurely pace. There is no common sense of urgency about the deployment of this protocol, and still there is no hard evidence that the continued reliance on IPv4 is failing us. Much of the reason for this apparent contradiction is that the Internet is now a client/server network. Clients can initiate network transactions with servers but are incapable of initiating transactions with other clients. Network Address Translators (NATs) are a natural fit to this client/server model, where pools of clients share a smaller pool of public addresses, and only require the use of an address while they have an active session with a remote server. NATs are the reason why in excess of 15 billion connected devices can be squeezed into some 2 billion active IPv4 addresses.
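A toy model shows why NAT fits client-initiated traffic and nothing else. The sketch below is a simplification, not a real NAT: it ignores protocols, remote endpoints, and mapping timeouts, and the class name, addresses, and port numbers are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip address, port)


class Nat:
    """Toy port-translating NAT: many private clients share one public IP."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self._next_port = first_port
        self._out: Dict[Endpoint, int] = {}   # private endpoint -> public port
        self._back: Dict[int, Endpoint] = {}  # public port -> private endpoint

    def outbound(self, private_src: Endpoint) -> Endpoint:
        """A client opens a session; allocate (or reuse) a public port."""
        if private_src not in self._out:
            self._out[private_src] = self._next_port
            self._back[self._next_port] = private_src
            self._next_port += 1
        return (self.public_ip, self._out[private_src])

    def inbound(self, public_port: int) -> Optional[Endpoint]:
        """Return the private endpoint for a reply, or None to drop it."""
        return self._back.get(public_port)


nat = Nat("203.0.113.7")
a = nat.outbound(("192.168.1.10", 51000))
b = nat.outbound(("192.168.1.11", 51000))
print(a, b)                # two clients, one public address, different ports
print(nat.inbound(a[1]))   # the reply reaches 192.168.1.10
print(nat.inbound(12345))  # unsolicited packet: None, no mapping exists
```

A mapping exists only after a client sends something out, which is why two clients behind NATs cannot reach each other directly, and also why unsolicited scans from outside have nothing to hit.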

Is this how Amazon handles software bugs too? @businessinsider: 'Seeing someone cry at work is becoming normal': Employees say @WholeFoods uses "scorecards" to punish employees for failing to comply with its inventory management system

Apparently there's a lot of consumer demand for talking cylinders. Alexa, print money: Even Bezos stunned by Q4 Amazon income: "So, what the heck happened? Amazon CEO Jeff Bezos added a brief statement to the report that indicated "optimistic projections for Alexa" being exceeded, along with a vague sales figure of "tens of millions" of Echo-related devices throughout all of 2017." Funny how every company in a better position to make this device did not. Funny also how the technorati poo-pooed it to begin with.

Do we have to worry about using memory efficiently again? Why RAM Prices Are Through the Roof: "Over the past two years, the price of DRAM has skyrocketed. A recent report by GamersNexus found that the cost of a specific DDR4-2400 memory kit has leaped from $81 on 2/22/2016 to $196 today ($196 on January 22, $192 on January 30)." It seems DRAM makers learned from oil producers that cutting supply raises prices: "DRAMexchange is reporting memory capacity growth is expected to be at a near-historic low of 19.6 percent in 2018, as Samsung, SK Hynix, and Micron are all cutting back on capital investments. DRAM wafer starts at all three companies are only expected to grow by 5-7 percent this year and fab expansions or new foundries take years to bring online." Also, AMD to Ramp up GPU Production, But RAM a Limiting Factor.

Is a box with 1G RAM for $8/YEAR too good to be true? Depends on what you need it for. Experience with development server 7x cheaper than Linode/DO: So far, I am satisfied with my experience with Woot’s ultra-cheap box; for my purposes (CPU-bound non-time-critical testing) it is good enough – and is darn cheap.

The Meltdown/Spectre saga: The impact across millions of cores: On January 3, we experienced an unpleasant surprise across a cluster of large Redis instances in our infrastructure. Without any clear reason, the instances suddenly started to run much hotter, which required us to scale out...Although the average impact on system.cpu.system was relatively small, accounting for an increase of less than 1 percent in total CPU utilization, the fact that the impact is clearly observable across so many cores, running dramatically varying workloads, shows how widespread the issue was...The spike in system.cpu.system was most pronounced in compute-optimized and general-purpose virtual machines, with a less significant but still clearly detectable increase in memory-optimized instances. The elevated CPU levels across instance types, especially for compute-heavy workloads, speak to the systemic effects of the security patches.

Pinterest found that, for them, the majority of smartphones worldwide are Android devices (just under 90 percent) and that more than 75 percent of Pinterest signups came from outside the U.S. Their Android app was slow. A key strategy was to make sure the app didn't get slower by testing for regressions, so they used NimbleDroid, a cloud-based continuous performance testing tool that easily integrated with their process and produced actionable results. Between NimbleDroid and experiment alerts, they detected ~30 slowdown regressions over the course of six months.

Is optimizing indexing always the key to improving MySQL query performance? MySQL Query Performance: Not Just Indexes: This wasted effort is all due to focusing on the wrong thing: figuring out how can we find all the rows that match k<1000000 as soon as possible. This is not the problem in this case. In fact, the query that touches all the same columns but doesn’t use GROUP BY runs 10 times as fast: For this particular query, whether or not it is using the index for lookup should not be the main question. Instead, we should look at how to optimize GROUP BY – which is responsible for some 90% of the query response time.

If you are looking for a blog host then WordPress Hosting Performance Benchmarks (2018) might be of interest. They divide the results up by price tiers. You definitely have to shop around. Price does not equal performance.

Lots of good details. Google Cloud vs AWS in 2018 (Comparing the Giants).

Maybe all you need is a Roomba? DO or UNDO - there is no VACUUM: What if PostgreSQL didn’t need VACUUM at all? This seems hard to imagine. After all, PostgreSQL uses multi-version concurrency control (MVCC), and if you create multiple versions of rows, you have to eventually get rid of the row versions somehow.

Five API Usability Lessons from Flutter (Dart Conference 2018): Reduce Context Switching: don't make programmers leave the IDE to look for information; Help Build Mental Models: visualize the conceptual model of the API and map the API to it; Speak Your Users' Language: if it's a list view call it a list view; Enable Programming By Example: use curated and focused examples in the documentation, also show output and illustrations of the results; Promote Recognition Rather than Recall: leverage autocomplete, use constants not numbers, preview colors in-place, the IDE integration should show, not tell.

Today's favorite new word: rantifesto.

This is simply wrong. It's like saying if you have to compile and link a library you're not really using a library. Serverless is a programming model, an abstraction layer. It matters not who or how the abstraction is provided. Serverless doesn't only exist on AWS. Container people, let’s talk about serverless: If you’re at any point responsible for running containers, even if that’s on a managed kubernetes service, you’re not serverless.

Good description with code examples. A scalable Keras + deep learning REST API.

A small-scale demonstration shows how quantum computing could revolutionize data analysis: Today, that looks set to change thanks to the work of Huang and co, who have calculated Betti numbers using a quantum computer for the first time. “Our experiment suggests that data analytics may be an important future application for quantum computing, with widespread applications in our increasingly data-centric world,” they say.

It’s Go Time: Stream 2.0 Ditches the Pokey Python in Favor of the Faster GoLang: For many kinds of app, the performance of the programming language you use doesn’t matter a whole lot; it’s just there as the glue between the app and the database. But if, like us, you are an API provider powering feed infrastructure for 500 companies and over 300 million end users, performance differences really start to matter. Python is a great language, but for some use cases like ranking, aggregation, serialization and deserialization, its performance is, well, pretty sluggish. We had been optimizing Cassandra, PostgreSQL, Redis, etc. for years, but eventually, we just reached the limit...With our API, our customers are basically building the next Facebook, the next Instagram, so features like re-ranking the feed, aggregating the feed: as soon as the data becomes a bit larger, it becomes exponentially more difficult. Our data from Cassandra would take one ms on the backend, but then transport to our customers via Python was taking 20 milliseconds. So much more time than our underlying infrastructure.

Perhaps we should just say less slow JSON serialization, it's never fast, but faster is definitely better. Fast JSON API serialization with Ruby on Rails: Performance tests indicate a 25–40x speed gain over AMS, essentially making serialization time negligible on even fairly complex models. Performance gain is significant when the number of serialized records increases.

We’ve seen dramatic improvements to the performance. Sorting myself out, extreme edition: We’ve looked at a range of advanced techniques for improving performance of critical loops of C# code, including (to repeat the list from the start): using knowledge of how signed data works to avoid having to transform between them; performing operations in blocks rather than per value to reduce calls; using Span<T> as a replacement for unsafe code and unmanaged pointers, allowing you to get very high performance even in 100% managed/safe code; investigating branch removal as a performance optimization of critical loops; vectorizing critical loops to do the same work with significantly fewer CPU operations

Another good explanation. An accessible overview of Meltdown and Spectre, Part 1.

Perhaps we need a history of digital objects? This was a fun read. Hans Peter Luhn and the Birth of the Hashing Algorithm: This is Luhn’s legacy: He helped show that computers and computation weren’t just the province of mathematics, statistics, and logic but also of language, linguistics, and literature. In his day, this was a revolutionary way to think about machines.

These errors seem mostly client related. Top 10 JavaScript errors from 1000+ projects (and how to avoid them): 1. Uncaught TypeError: Cannot read property; 2. TypeError: ‘undefined’ is not an object (evaluating; 3. TypeError: null is not an object.

Good series on compaction strategies. Scylla’s Compaction Strategies Series: Write Amplification in Leveled Compaction and Scylla’s Compaction Strategies Series: Space Amplification in Size-Tiered Compaction. Why Does Write Amplification Matter? 10% writes may sound not much, but when you combine the fact that often many reads are satisfied from the cache (reducing the amount of read I/O) and that each write request is amplified 13-fold (in our experiment) or even 50-fold, we can easily reach a situation where a majority of the disk’s bandwidth goes to the writing activity.
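The write amplification argument can be checked with a back-of-the-envelope model. The sketch below is not from the Scylla posts: it assumes every request moves the same amount of data and that cached reads cost no disk I/O, and the 50% cache hit rate is an arbitrary figure chosen for illustration.

```python
def write_share_of_disk_io(write_fraction: float,
                           write_amplification: float,
                           cache_hit_rate: float = 0.0) -> float:
    """Fraction of disk I/O spent on writes under a very simple model.

    Assumes every request moves the same amount of data and that reads
    served from cache cost no disk I/O.
    """
    read_fraction = 1.0 - write_fraction
    disk_writes = write_fraction * write_amplification
    disk_reads = read_fraction * (1.0 - cache_hit_rate)
    return disk_writes / (disk_writes + disk_reads)


# 10% of requests are writes; half of all reads are assumed to hit the cache.
for amplification in (1, 13, 50):
    share = write_share_of_disk_io(0.10, amplification, cache_hit_rate=0.5)
    print(f"amplification {amplification:>2}x -> {share:.0%} of disk I/O is writes")
```

Under these assumptions a workload that is only 10% writes goes from roughly 18% of disk I/O at no amplification to roughly 74% at 13-fold and 92% at 50-fold, which is the "majority of the disk's bandwidth" effect the post describes.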

Now when someone asks, "What do you want, a pony?" you can say that pony seems kind of slow. Why we wrote our Kafka Client in Pony: we’ve spent about 12 weeks of implementation effort and we have a fully asynchronous standalone Kafka client written in Pony...the Pony Kafka client has lived up to our expectations thanks to compiling down to native code and Pony’s zero copy message passing...Pony Kafka sends data to Kafka about 5% - 10% slower than librdkafka but reads data from Kafka about 75% slower than librdkafka.

Brendan Burns (Kubernetes co-founder) has a free book on Designing Distributed Systems for Azure.

The Art of Fuzzing: Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, or failing built-in code assertions or for finding potential memory leaks.
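A bare-bones random fuzzer fits in a few lines of Python. This sketch is far simpler than coverage-guided tools, and the target function is a made-up parser with a deliberate bug, written only so the fuzzer has something to find.

```python
import random


def parse_length_prefixed(data: bytes) -> bytes:
    """Hypothetical target: first byte is a length, the rest is the payload.

    Contains deliberate bugs: it does not handle empty input and it trusts
    the declared length.
    """
    length = data[0]  # IndexError on empty input
    payload = data[1:1 + length]
    assert len(payload) == length, "declared length exceeds payload"
    return payload


def fuzz(target, iterations: int = 1000, seed: int = 0):
    """Feed random byte strings to target and collect inputs that crash it."""
    rng = random.Random(seed)  # seeded so failures are reproducible
    crashes = []
    for _ in range(iterations):
        size = rng.randint(0, 8)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except Exception as exc:  # any exception counts as a finding
            crashes.append((data, type(exc).__name__))
    return crashes


crashes = fuzz(parse_length_prefixed)
print(f"{len(crashes)} crashing inputs, e.g. {crashes[0]}")
```

Seeding the generator matters in practice: a crash is only useful if the input that caused it can be replayed.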

Gray Failure: The Achilles’ Heel of Cloud-Scale Systems: Cloud scale provides the vast resources necessary to replace failed components, but this is useful only if those failures can be detected. For this reason, the major availability breakdowns and performance anomalies we see in cloud environments tend to be caused by subtle underlying faults, i.e., gray failure rather than fail-stop failure. In this paper, we discuss our experiences with gray failure in production cloud-scale systems to show its broad scope and consequences. We also argue that a key feature of gray failure is differential observability: that the system’s failure detectors may not notice problems even when applications are afflicted by them. This realization leads us to believe that, to best deal with them, we should focus on bridging the gap between different components’ perceptions of what constitutes failure
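Differential observability is easy to reproduce in miniature. The sketch below is not from the paper; the `Server` class and its storage dependency are assumptions invented to show a failure detector and an application disagreeing about the same node.

```python
class Server:
    """Toy server whose liveness probe and real request path can disagree."""

    def __init__(self, name: str, storage_ok: bool = True) -> None:
        self.name = name
        self.storage_ok = storage_ok

    def heartbeat(self) -> bool:
        # The process is up, so the failure detector's probe succeeds.
        return True

    def handle_request(self) -> bool:
        # Real requests also depend on storage, which the probe never touches.
        return self.storage_ok


def differential_observability(server: Server, requests: int = 100) -> bool:
    """True when the detector sees 'healthy' while the application sees errors."""
    detector_view = server.heartbeat()
    failures = sum(not server.handle_request() for _ in range(requests))
    app_view = failures == 0
    return detector_view and not app_view


healthy = Server("node-a")
gray = Server("node-b", storage_ok=False)
print(differential_observability(healthy))  # False
print(differential_observability(gray))     # True
```

One way to close the gap is to feed application-level error rates back into the detector, so its view of health matches what callers actually experience.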
