Stuff The Internet Says On Scalability For May 26th, 2017

Hey, it's HighScalability time:

Sport imitating tech. Cloud Computing chases down Classic Empire to win...the Preakness. (Daily News)
If you like this sort of Stuff then please support me on Patreon.

  • 42%: increase in US wireless traffic since 2015; 44: age of Ethernet; $18.5m: low cost of the Target data breach; 25 million: records released by the Library of Congress; 98%: WannaCry infections on Windows 7; 100 terabytes: daily Pinterest logging; 2020: when Microsoft will have DNA storage in the cloud; 220 μm: size of microbots; 2 billion: lines of code in Google's repository; 40%+: esports industry growth

  • Quotable Quotes:
    • @Werner: There is no compression algorithm for experience.
    • @colinmckerrache: We just crossed over 2m EVs on the road. So yeah, second million took just under 18 months. Next million in about 10 months.
    • @swardley: When discussing China, stop thinking cheap labour, communism & copying ... to understand changes, start thinking World's largest VC.
    • @JOTB17: "Cars generate more than 4Tb of data a day, humans are becoming irrelevant in data collection" 😳 @saleiva #JOTB17
    • Wojciech Kudla: that's why blacklisting workqueues from critical cpus should be on the jitter elimination check list. They can be affinitized just like irqs
    • @ryanhuber: Any sufficiently advanced attacker is indistinguishable from one of your developers.
    • @spolsky: "During peak traffic hours on weekdays, there are about 80 people per hour that need help getting out of Vim."
    • SrslyJosh: Basing anything on proof-of-work puts you in a perpetual race to control more compute than your adversaries.
    • gkoberger: So, in my mind, Mozilla won. It's a non-profit, and it forced us into an open web. We got the world they wanted. Maybe the world is a bit Chrome-heavy currently, but at least it's a standards-compliant world.
    • NoGravitas: The basic argument of this article seems to be that the real benefit of cryptocurrencies, other than their speculative value, is that they provide a way of enforcing artificial scarcity in the digital realm, where scarcity does not come naturally.
    • Renee DiResta: The trouble is that “high-frequency trading” is about as precise as “fake news.”
    • Silicon Valley: I mean, that Ken doll probably thinks traversing a binary search tree runs in the order of "n," instead of "log n." Idiot.
    • @__apf__: If you look at another engineer's work and think, "That's dumb. Why don't you just..." Take a breath. Find out why the problem is hard.
    • joshuacc: Because it’s a separate service, the PDF generator can scale up and down based on actual usage of that feature and it won’t bog down any of the other application features.
    • @mbleigh: Computer Science: It's the stuff you have to know so you can pass the interview and start building software. #DevDiscuss 
    • cookiecaper: All of these blockchain-based protocols are impractical. Blockchain is wasteful and slow. It may work adequately for a transaction ledger, but it doesn't work for the web's primary purpose of distributing arbitrary information ad-hoc and on-demand.
    • James Governor: When you sit down with one of the AWS engineering teams you’re sitting down with grownups. At a guess median age would be 40-45, someone like Andi Gutmans, now 41, one of the original creators of PHP, who now runs Search and New NoSQL for the firm.
    • Veritasium: One physical test is worth a thousand expert opinions.
    • @coffeetocode: I got 99 problems, but one of them is multithreading so honestly I'm not sure how many problems I actually have right now.
    • @stevesi: 15/ Not jumping on mobile even while I was using it was massive miss. Will never forget my friends in Japan trying to set me right! // EOTS
    • @tef_ebooks: people running an operating system held together by perl scripts, m4 macros, and automatically generated bash scripts, saying js is bad
    • Vint Cerf: I worry 100 years from now our descendants may not know much about us or be able to read our emails or tweets or documents because nobody saved them or the software you need to read them won’t exist anymore.
    • Chip Overclock: before you decide to tackle a project, make sure you can get the tools you need for the entire life-cycle of the product. You may find out that the economics of developing a product are substantially different from the economics of debugging, testing, validating, and supporting that product.
    • Joanne Itow: If the cost of critical input materials such as silicon wafers and the cost of 200mm manufacturing capacity increases, how will the industry meet the demand for multiple billions of $10 electronic devices for IoT?
    • Steve Yegge: I know a few other programmers who've also full-on converted to Kotlin. Most of them beat me to it by at least a year or two.  We buzz about it sometimes.  "Kotlin makes programming fun again," we tell each other.  The funny thing is, we hadn't fully grasped that programming had become non-fun until we tried Kotlin.  It takes you back to when you were first learning programming and everything seemed achievable.

  • Failing Kubernetes pods by playing whack-a-mole is an awesome idea. Funner than a barrel of chaos monkeys. You just have to see the video.
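
    You can script the whack-a-mole yourself. A minimal chaos-style sketch (the namespace and the use of kubectl via subprocess are my assumptions, not from the talk): kill one random pod and let Kubernetes reschedule it.

```python
import random
import subprocess

# List pods in a namespace (hypothetical: "default") and delete one at random.
# Kubernetes notices the missing pod and schedules a replacement.
pods = subprocess.check_output(
    ["kubectl", "get", "pods", "-n", "default", "-o", "name"],
    text=True,
).split()

victim = random.choice(pods)  # e.g. "pod/web-5f7c9d-abcde"
subprocess.run(["kubectl", "delete", "-n", "default", victim], check=True)
```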

  • There are times when specialized hardware absolutely destroys commodity hardware. TensorFlow Frontiers. The need for Google to create the TPU became urgent in 2013, when they realized that if all Android users spoke to their phone for just three minutes a day it might force Google to double its number of datacenters. That drove a crash program to develop the first TPU. The first-gen TPU was 15-30x faster than contemporary CPUs & GPUs and 30-80x more power-efficient, but it only worked for inference, not training. The second-gen TPU has up to 180 teraflops of floating point performance, 64 GB of ultra-high-bandwidth memory, works for both training and inference (simpler to use), and can be connected together using a 2-D toroidal mesh network (to tackle the largest problems). On one problem training time was reduced from 24 hours to 6 hours.

  • Another victim of stack ranking. T.J. Miller Is Leaving Silicon Valley.

  • The biggest everyday risk from the massive data surveillance panopticon run by private corporations is not storm troopers busting down your door, it's this: everything will start costing you more. Whenever an algorithm calculates that it has leverage over you, it will exploit that advantage to charge you more. A computer-mediated personalized world will anticipate your needs, but it will also invisibly shape them. What is being created is the ultimate Skinner Box. Uber Is Using AI to Charge People as Much as Possible for a Ride

  • You expect batch inserts to help with IO, but they also improved CPU usage 10x on a Postgres cluster. How Basic Performance Analysis Saved Us Millions. Lesson: validating your assumptions by profiling can really pay off. Less obvious win: as load increases, fewer servers are needed.
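
    The batching idea itself is simple. A minimal sketch with psycopg2 (the table name and DSN are invented for illustration): instead of paying per-statement overhead for every row, send many rows per statement.

```python
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=events")  # hypothetical DSN
rows = [(i, f"event-{i}") for i in range(10_000)]

with conn, conn.cursor() as cur:
    # Slow path: one round trip and one statement's worth of parse/plan
    # overhead per row.
    # for r in rows:
    #     cur.execute("INSERT INTO events (id, payload) VALUES (%s, %s)", r)

    # Batched path: execute_values expands many rows into a single INSERT.
    execute_values(cur, "INSERT INTO events (id, payload) VALUES %s", rows)
```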

  • Rules and heuristics are out. What's in? Machine learning to make decisions in complex, highly dimensional problem spaces. Two examples. The Machine Intelligence Behind Gboard: in their mobile keyboard Google replaced both their Gaussian and rule-based models with a single, highly efficient long short-term memory (LSTM) model trained with a connectionist temporal classification (CTC) criterion. How Amazon Web Services uses machine learning to make capacity planning decisions: AWS uses a forecasting model driven by machine-learning research to make capacity decisions. For example, it can pick up signals from the process its sales teams follow (enterprise sales cycles are notoriously long) to forecast demand. A lot of new customers like to start slow on AWS and then accelerate their usage as they see more benefits, which can lead to spikes in demand if they move faster than anticipated.
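
    To make the Gboard piece concrete, here's a toy sketch of an LSTM trained with a CTC criterion, written in PyTorch rather than whatever Google uses internally; the feature count, vocabulary size, and trace lengths are all invented.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 4 features per touch point, 26 output characters.
class GestureDecoder(nn.Module):
    def __init__(self, n_features=4, n_chars=26, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, n_chars + 1)  # +1 for the CTC blank

    def forward(self, x):
        out, _ = self.lstm(x)                   # (batch, time, hidden)
        return self.proj(out).log_softmax(-1)   # CTC wants log-probabilities

model = GestureDecoder()
ctc = nn.CTCLoss(blank=26)  # the blank label is the extra class

traces = torch.randn(8, 50, 4)              # 8 gesture traces, 50 points each
log_probs = model(traces).transpose(0, 1)   # CTCLoss expects (time, batch, classes)
targets = torch.randint(0, 26, (8, 10))     # the words the users meant to type
loss = ctc(log_probs, targets,
           input_lengths=torch.full((8,), 50, dtype=torch.long),
           target_lengths=torch.full((8,), 10, dtype=torch.long))
loss.backward()
```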

  • Eschewing thousands of years of tradition, Apple chose a torus shape instead of a pyramid for its memorial. One More Thing.

  • Gives a broad overview of Aurora, CitusDB, Cosmos, CockroachDB, Spanner, MongoDB, and Postgres. A Comparison of Advanced, Modern Cloud Databases. Recommendation: for most people the best choice is to start with Postgres; vertical scaling will go a long way. At the scale of Airbnb or Uber: Aurora. At the scale of Google: Spanner. Also, Scaling Amazon Aurora at Ticketea

  • Think there's no DevOps in Serverless? To disabuse yourself of that notion watch Serverless Ops for the iRobot Fleet. iRobot manages millions of robots with 0 unmanaged EC2 instances, 100+ Lambda functions, 25 AWS services, and 1000s of Lambda deploys per day, all with a low single-digit number of engineers in operations. They use Red/Black deployment. An instance of the system runs in production (API Gateway, Lambda, CloudFront, Kinesis, etc.). A proprietary tool stitches all those together and deploys using CloudFormation. For a new deployment a new instance is created and, using a service discovery mechanism, all the old red stuff moves over to the new black stuff. Some services are long lived, like DynamoDB and Cognito pools, so red and black must coexist. Monitoring is with SumoLogic: you can look up anything with its query language and generate alarms; they also use CloudWatch. iRobot has a large development footprint, and because AWS IoT, Cognito, and a few other services are singletons, it's very difficult for multiple developers to play nicely together on the same account, so each developer gets their own account; the result is account sprawl. A big task is managing all the accounts, and they found ADFS (Active Directory Federation Services) essential. Never use access keys and secret keys: they've orchestrated things to always use service accounts and AD to get temporary credentials that are only good for an hour (see the sketch below). Multi-region backup is via Data Pipeline to S3. They use S3 like Dropbox because cross-account work is easy: you can say this bucket is accessible by everyone, or just engineers, etc. Dealing effectively with multiple accounts is a big part of the job. They have tools to manage all 50+ accounts at once: run scripts, maintain ADFS and IAM roles/policies in source control, and roll out standardized logging infrastructure. Lambdas run every hour over consolidated billing data, the data is put into SumoLogic, and SumoLogic is used to access the billing information. The biggest downside of Serverless is the lack of visibility when there's a problem; metrics are your window into what's going on inside AWS. AWS Enterprise Support is invaluable and is the first tier of support for all their internal users. The Personal Health Dashboard should give a lot more visibility into what's wrong with your instances, Lambda functions, etc. A lot of the value iRobot has generated in the last year has been because of this ecosystem.
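
    The temporary-credentials pattern is plain STS. A minimal boto3 sketch (the role ARN and session name are placeholders, not iRobot's actual setup): assume a role and work with credentials that expire after an hour, never long-lived keys.

```python
import boto3

# Assumes the caller is already federated in (e.g. via ADFS) with permission
# to assume this role. The ARN below is a placeholder.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/DeveloperRole",
    RoleSessionName="dev-session",
    DurationSeconds=3600,  # temporary credentials, good for one hour
)["Credentials"]

# All further work happens through the short-lived session.
session = boto3.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(session.client("sts").get_caller_identity()["Account"])
```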

  • How hard is it to move to HTTPS? Not hard at all if you are a small shop. If you are Stack Overflow there's more to it than you might imagine. The entire process took 4 years. The story is beautifully told by Nick Craver in HTTPS on Stack Overflow: The End of a Long Road.

  • This explains why in the Matrix the AIs needed to harness humans as a new free energy source. The energy expansions of evolution: The history of the life–Earth system can be divided into five ‘energetic’ epochs, each featuring the evolution of life forms that can exploit a new source of energy. These sources are: geochemical energy, sunlight, oxygen, flesh and fire. The first two were present at the start, but oxygen, flesh and fire are all consequences of evolutionary events. Since no category of energy source has disappeared, this has, over time, resulted in an expanding realm of the sources of energy available to living organisms and a concomitant increase in the diversity and complexity of ecosystems. Also, Battery-free implantable medical device draws energy directly from human body

  • Will GraphQL replace REST? GitHub is moving to GraphQL because it "offers significantly more flexibility for our integrators. The ability to define precisely the data you want—and only the data you want—is a powerful advantage over the REST API v3 endpoints." GraphQL kind of reminds me of tie in Perl. Very good discussion on reddit. NeverSpeaks: With my frontend developer hat on, I love the idea of GraphQL. Makes a lot of things easier. With my backend hat on, implementing a GraphQL API seems to be a nightmare. turkish_gold: GraphQL doesn't automatically execute anything on your server. You have to write the server yourself, to resolve any queries given. You also have to write the client yourself. It's not a library, it is a specification. When you're server side, you write 'resolvers' to handle each part of a GraphQL query. Those resolvers can do anything: authentication, added filtering, load balancing, etc. Whatever you return is the result. You're not bound to any particular database. You can even calculate things on the fly and return that, or return static content. Also, Serverless and GraphQL: A Perfect Match for the New Cloud Paradigm
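
    To see what turkish_gold means by resolvers, here's a tiny server-side sketch using Python's graphene library (the field and return value are invented); the resolver is free to check auth, hit a database, or compute an answer on the fly.

```python
import graphene

class Query(graphene.ObjectType):
    # A field with one argument. Clients ask for exactly this field.
    repository = graphene.String(name=graphene.String(required=True))

    # The resolver runs when a query requests the field; what it does
    # internally (auth, DB lookups, computation) is entirely up to you.
    def resolve_repository(self, info, name):
        return f"metadata for {name}"

schema = graphene.Schema(query=Query)
result = schema.execute('{ repository(name: "highscalability") }')
print(result.data)  # {'repository': 'metadata for highscalability'}
```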

  • Videos are now available for DConf2017.

  • Distributed consensus as you've never seen it before. How Your Data is Stored, or, The Laws of the Imaginary Greeks: Nearly every problem in datacenter- or planet-scale computing boils down to these issues: how do you get a bunch of computers, often distant from one another, connected via unreliable links, and prone to going down at unpredictable intervals, to nonetheless agree on what information they store?
    • Single data stores (the Pseudemoxian Hermit): a single computer keeps its own copy, everyone wishing to use it must take turns, and the system is vulnerable to a single disaster; however, it is strongly consistent, dead-simple, and all other systems are built on top of it.
    • Eventually consistent replication (the Fotan system): each participant has their own (strongly consistent) store, and everyone changes and reads their own copy, distributing and receiving updates to all of their fellows later on.
    • Quorum decisions (the Paxon system — and unlike the other examples, this one is actually called "Paxos" in normal CS conversations): reads and writes involve getting a majority of the participants to agree.
    • Master election (the Siranon system): an expensive, strongly consistent store is used to decide who is in charge of any subject for a time, and then that responsible party uses their own, smaller, strongly consistent store to maintain the laws on that subject.
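
    A toy illustration of the Paxon idea in Python: majority-quorum reads and writes over in-process dicts standing in for replicas. This is not a real Paxos implementation (no proposals, no failure handling), just the core "a majority must agree" rule.

```python
from collections import Counter

class QuorumStore:
    """Each dict stands in for a replica server."""
    def __init__(self, n=5):
        self.replicas = [{} for _ in range(n)]
        self.quorum = n // 2 + 1  # a majority

    def _store(self, replica, key, value, version):
        # A real system would see timeouts and crashed replicas here.
        if version >= replica.get(key, (None, -1))[1]:
            replica[key] = (value, version)
            return True
        return False

    def write(self, key, value, version):
        acks = sum(1 for r in self.replicas
                   if self._store(r, key, value, version))
        return acks >= self.quorum  # succeed only if a majority acked

    def read(self, key):
        # An answer counts only if a majority of replicas agree on it.
        votes = Counter(r.get(key) for r in self.replicas)
        answer, count = votes.most_common(1)[0]
        return answer if count >= self.quorum else None

store = QuorumStore()
assert store.write("law", "no goats in the agora", version=1)
print(store.read("law"))  # ('no goats in the agora', 1)
```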

  • Apple has what looks to be a good course for developers. Everyone Can Code

  • More good news for cities and more bad news for small towns. Cities aren't just highly efficient machines for producing innovation; they are also more resilient to the threats of automation. Expect the class wars to intensify. Automation will have a bigger impact on jobs in smaller cities: the types of jobs that are hardest to automate become increasingly prevalent in larger cities. For example, the job of a checkout assistant is relatively easy to automate, so regardless of a city's population you would expect the proportion of residents with that job to remain the same. But the proportion of people with jobs that rely on analytical, management and organisational skills, such as computer scientists or chemists, increases with city size. Once a city becomes large enough, it can support more technical jobs than smaller cities.

  • A recap of Facebook's Networking @Scale 2017. You might like: A Close Look at Alibaba's High Performance Packet Processing Platform and Automating and Scaling the Edge and Delivering Terabits of Content: External Considerations [Netflix].

  • You're only as available as the sum of your dependencies. Deep insight here. The Calculus of Service Availability: A service cannot be more available than the intersection of all its critical dependencies. If your service aims to offer 99.99 percent availability, then all of your critical dependencies must be significantly more than 99.99 percent available. This is called the "rule of the extra 9" at Google...If you have a critical dependency that does not offer enough 9s (a relatively common challenge!), you must employ mitigation to increase the effective availability of your dependency (e.g., via a capacity cache, failing open, graceful degradation in the face of errors, and so on)...A service cannot be more available than its incident frequency multiplied by its detection and recovery time...If a service has N unique critical dependencies, then each one contributes 1/N to the dependency-induced unavailability of the top-level service, regardless of its depth in the dependency hierarchy...any critical component must be 10 times as reliable as the overall system's target, so that its contribution to system unreliability is noise.
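
    The arithmetic is worth running once. A back-of-the-envelope sketch of the two rules above (the dependency count is an example, and independent failures are assumed):

```python
# Availability is bounded by the product of the critical dependencies'
# availabilities (assuming independent failures).
deps = [0.9999] * 5          # five critical dependencies, four 9s each
service = 1.0
for a in deps:
    service *= a
print(f"best case: {service:.3%}")   # ~99.950%, below a 99.99% target

# The "rule of the extra 9": each critical dependency must be ~10x as
# reliable as the top-level target so its contribution is noise.
dep_unavailability = (1 - 0.9999) / 10
print(f"each dependency needs {1 - dep_unavailability:.3%}")  # 99.999%
```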

  • Fast to market is not always the right mantra. Backblaze took the time to build their own technology and managed to create something new in the process. Building a Competitive Moat: Turning Challenges Into Advantages. We started Backblaze thinking of ourselves as a backup company. In reality, we became a storage company with ‘backup’ as the first service we offered on our storage platform...It didn’t just change how we built the service, it changed the fundamental DNA of the company.

  • Silicon Valley ripping plots straight from the headlines. The distributed internet is a thing. Introducing the Blockstack Browser: A Gateway to a New, Decentralized Internet: muneeb: We're not reinventing anything at/below TCP/IP. That stack works fairly well and can function in a decentralized way. We are replacing things above TCP/IP like DNS, Certificate Authorities (CAs), how data is discovered, data silos, and dependence on remote servers for running your apps. muneeb: It's an open-source project with 4 years of research and development behind it. The architecture implemented here is actually inspired by David Clark (Chief Protocol Architect of the internet) and his new design principle, called trust-to-trust design, that aims to fix critical security issues with the current design of the internet. muneeb: We already have enough money and are grateful for having investors like Union Square Ventures, Naval Ravikant, Y Combinator, SV Angel, Lux Capital, and others who share our vision. Building a truly decentralized internet will take years/decades and we're looking forward to growing our open-source community that can take on this grand challenge.

  • Roboschool is open-source software for robot simulation. It provides new OpenAI Gym environments for controlling robots in simulation and makes it easy to train multiple agents together in the same environment.

  • Before you endure the complexity of microservices there are easier, cheaper steps you can take. Enough with the microservices: If you're in a growth-stage startup with the need to make some changes to your architecture and microservices aren't the answer they seem to be, what is it that you should be doing?
    • Clean up the application.
    • Refactor the application into clear modules with clear APIs.
    • Choose one module in the application and split it into its own application on the same host.
    • Take the separated module and put it on a different host system.
    • If possible, refactor the data storage system so that the module on the other host now has total responsibility for storage of data within its context.

  • The cost of resources dictates your architecture. History of storage costs and the software design impact: the cost of a single 571MB hard drive was as much as two full-time developers for an entire year [1979]...At the time of this writing, you can get a hard disk with 10 TB of storage for about $400, and a 1 TB SSD drive will cost you less than $300...The really interesting aspect of those numbers is the way they shaped the software written in that time period. It made a lot of sense to put a lot more on the user, not because you were lazy, but because it was the only way to do things. Most document databases, for example, store the document structure alongside the document itself (so property names are stored in each document). It would have been utterly insane to try to do that in a system where hard disk space was so expensive. On the other hand, decisions such as "normalization is critical" were mostly driven by the necessity of reducing storage costs, and only transitioned later on to the "purity of data model" reasoning once disk space cost became a non-issue.
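
    A quick way to feel the trade-off (field names and values invented): serialize the same record as a self-describing document and as a bare normalized row, and compare the bytes.

```python
import json

# Document databases store the structure with every document...
doc = {"first_name": "Ada", "last_name": "Lovelace", "country": "UK"}
# ...while a normalized row stores the schema once, elsewhere.
row = ["Ada", "Lovelace", "UK"]

print(len(json.dumps(doc)), len(json.dumps(row)))
# The field names alone roughly double the bytes per record here, an
# overhead that was unthinkable when a megabyte of disk cost real money.
```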

  • If you are looking for a cloud alternative, Hetzner, a low-cost hosting service, is offering a private cloud based on OpenStack. Good discussion on HackerNews. Many report good experiences with Hetzner; many don't. One concern is that Hetzner is not administering OpenStack: that's up to you.

  • Batching for the win...again. Khan Academy on Memcached-Backed Content Infrastructure. KA switched to serving all their content from memcache (because it's supported by Google App Engine). Individual gets proved to be slow, so they batched gets for better performance. Of more interest is their iterative process of making changes, testing on live traffic, and using a feature flag to switch off the new code when there was a problem. They did find problems with overloaded memcache servers, which caused slow replies, which in turn caused the front end to send even more requests.
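
    The fix is the classic multi-get. A sketch with the python-memcached client (key names invented; KA's actual stack is App Engine's memcache API, and pymemcache's get_many behaves similarly):

```python
import memcache

mc = memcache.Client(["127.0.0.1:11211"])
keys = [f"content:{i}" for i in range(100)]

# Slow: one network round trip per key.
items = {k: mc.get(k) for k in keys}

# Batched: a single round trip fetches all 100 keys at once.
items = mc.get_multi(keys)  # returns a dict of the keys that were found
```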

  • Steve Gibson, talking about the 44th anniversary of the invention of Ethernet, makes an interesting point: like the Internet, which used lossy destination-based forwarding, Ethernet was a technology that was less reliable than the technologies it replaced. We also see that in databases with eventual consistency.

  • googlecreativelab/quickdraw-dataset: The Quick Draw Dataset is a collection of 50 million drawings across 345 categories, contributed by players of the game Quick, Draw!. The drawings were captured as timestamped vectors, tagged with metadata including what the player was asked to draw and in which country the player was located. 

  • yahoo/daytona: An application-agnostic framework for automated performance testing and analysis.

  • stelligent/mu (article): a tool for managing your microservices platform.

  • raviqqe/tisp: a "Time is Space" programming language. Tisp is a functional programming language with implicit parallelism and concurrency. It aims to be simple, canonical, and practical. Every program in Tisp can run in parallel and concurrently with nothing special!

  • redox-os/tfs: a modular, fast, and feature-rich next-gen file system, employing modern techniques for high performance, high space efficiency, and high scalability. TFS was created out of the need for a modern file system for Redox OS, as a replacement for ZFS, which proved to be slow to implement because of its monolithic design.

  • Microsoft's P: A programming language designed for asynchrony, fault-tolerance and uncertainty: The P programmer writes the protocol and its specification at a high level. The P compiler provides automated testing for concurrency-related race conditions and executable code for running the protocol. P provides first-class support for modeling concurrency, specifying safety and liveness properties and checking that the program satisfies its specification using systematic search. In these capabilities, it is similar to Leslie Lamport’s TLA+ and Gerard Holzmann’s SPIN. 

  • Proofs aren't proof against bugs. An Empirical Study on the Correctness of Formally Verified Distributed Systems: This paper thoroughly analyzes three state-of-the-art, formally verified implementations of distributed systems: IronFleet, Verdi, and Chapar. Through code review and testing, we found a total of 16 bugs, many of which produce serious consequences, including crashing servers, returning incorrect results to clients, and invalidating verification guarantees. These bugs were caused by violations of a wide range of assumptions on which the verified components relied.