Stuff The Internet Says On Scalability For April 14th, 2017

High Scalability

14 Apr 2017 — 17 min read

Hey, it's HighScalability time:

After 20 years, Cassini will not go gently into that good night, it will burn and rave at close of day. (nasa)
If you like this sort of Stuff then please support me on Patreon.

10^15: synapses activated per second in human brain (2/3rds fail); $4.5B: Amazon spend on video (Netflix $6 billion); 22,000: AWS database migrations served; ~15%: Dropbox reduced CPU usage using Brotli; $3.5 trillion: IT spending in 2017; 10%: reduction in QoQ hard drive shipments; 33.3%: Nginx share of webserver market; 37.2 trillion: human cells in a Cell Atlas; 6.2 miles: journey to the center of the earth; 200: lines of code for blockchain; 95%: Wikipedia pages end up at philosophy; 1.2 billion: Messenger monthly users;

Quotable Quotes:
- Jeff Bezos: Day 2 is stasis. Followed by irrelevance. Followed by excruciating, painful decline. Followed by death. And that is why it is always Day 1.
- Bob Schmidt: If debugging is the process of removing errors from a design, then designing must be the process of putting errors into a design!
- @swardley: the gap between where the cutting edge is and where the majority are just seems to increase year on year.
- Riot Games: We need to provide resources when it's time to grow, we need to react when it gets sick, and we need to do it all as fast as possible at a global scale.
- masklinn: High-performance native code already does these specialisation, generally on a per-project basis (some projects include multiple allocators for different bits of data), and possibly using a non-OS allocator in the first place
- @erikbryn: MT: @DKThomp : there are 950k warehouse workers —6X the number of steel workers and miners combined
- Joeri: The challenge of a rewrite is not in mapping the core architecture and core use case, it's mapping all the edge cases and covering all the end user needs. You need people intimately familiar with the old system to make sure the new system does all the weird stuff which nobody understood but had good reasons that was in the corners of the old system's code.
- @redblobgames: 2016 GDC Diablo talk: let's switch from turn-based to real-time 2017 GDC Civilization talk: let's switch from real-time to turn-based
- @random_walker: Encrypted traffic has a fingerprint—enough to distinguish among 200 Netflix vids with 99.5% accuracy in < 2.5 mins.
- Sophie Wilson: You’re going to buy a 10-way, 18-way multi-core processor that’s the latest, all because we told you you could buy it and made it available, and we’re going to turn some of those processors off most of the time. So you’re going to pay for logic and we’re going to turn it off so you can’t use it.
- qq66: But is there anything more personal than a computer programmer writing a bot to send messages for him?
- Anu Hariharan: Unlike other social products, WeChat does not only measure growth by number of users or messages sent. Instead they also focus on measuring how deeply is the product engaged in every aspect of daily life (e.g., the number of tasks WeChat can help with in a day).
- @fredwilson: "The real issue here is Facebook’s market power. And we face similar market power issues in search (Google) and commerce (Amazon)"
- @ethanschoonover: "Serverless" still feels to me like a restaurant saying they are "kitchenless" so they can focus on food instead of food preparation.
- @kelseyhightower: You know you're building a PaaS when you start stitching together 1,000,000,000 other tools in order to get one click deployments.
- Lukasz: So in a nutshell with some lateral thinking we were able to build a cloud-native service which meets the customer need for around $50 per year versus roughly $75,000 per year. So tell me again how cloud is just the same old thing in a different guise?
- Ant Stanley: The reality is that FaaS platforms are not changing the way applications are written, they are simply platforms that are more suited to the way applications will be written.
- refurb: It's not to say ML has no value, but predicting molecular behavior, even in the simplest system is really dam hard.
- beat: When I see people in the startup world rolling their eyes at the "incompetence" of the enterprise world, I take it to mean they've never actually worked on a truly hard problem in their lives.
- @swardley: hmmm ... is this going to be a theme? The danger of someone who knows how to play centre of gravity games joining Amazon?
- @xaprb: Complexity is defined by the problem, not the solution. You can’t make a problem less complex. You can avoid a nightmarish solution, though.
- AsyncAwait: Most of the coverage focuses on gaming, where Intel wins hands down, because of the better single-core performance - The Ryzen i7 is really aimed at content creators who export a lot of photos or 4K video and developers who do a lot of compilation, basically tasks that require and benefit from multiple cores
- Jeff Bezos: But much of what we do with machine learning happens beneath the surface. Machine learning drives our algorithms for demand forecasting, product search ranking, product and deals recommendations, merchandising placements, fraud detection, translations, and much more. Though less visible, much of the impact of machine learning will be of this type — quietly but meaningfully improving core operations.
- @xaprb: Linear scalability is a phenomenon that occurs when queuing and crosstalk inefficiencies are eliminated by the coefficient of marketing.
- @amontalenti: Do fully remote teams scale? - 500+ headcount scale: @Automattic, @Github - 300+ headcount scale: @Elastic, @InVisionApp What do you think?
- Sex as an Algorithm: There is a mismatch between heuristics and evolution. Heuristics should strive to create populations that contain outstanding individuals. Evolution under sex seems to excel at something markedly different: creating a "good population."

Luna Duclos on Game Development and Rebuilding Microservices. Switching from PHP/Python to Go. Go is much faster and uses less CPU. As big as the switch to Go is the switch from Google App Engine to VMs. GAE servers are small and CPU constrained despite the relatively high cost. Their Go cluster runs in the Google Cloud on Google Container Engine.

Werner Against the Machine. Wait, aren't you the machine now?

Kwabena Boahen on Stanford Seminar: Neuromorphic Chips: Addressing the Nanotransistor Challenge. A dollar bought more and more transistors until 2014, when for the first time the price of transistors went up. Fundamental constraints at the physical level are the cause. The challenge is to continually shrink the footprint of the transistor so it occupies less space. A traffic metaphor is used to explain the difficulty of continually shrinking transistors. Shrinking gives you fewer lanes, and electrons can block a lane by being trapped in a pothole. When you get down to one lane and an electron is trapped, the current flows slowly. Our brains work with ultimately scaled devices called ion channels, which can pass a single layer of ions. At 7:52 there's a really cool simulation of the bilipid layer that is the membrane of a neuron. You can see sodium and potassium ions flow single file through a channel. The whole thing vibrates like someone experiencing meth withdrawal. At 14:20 is a cool visual of a spike traveling down an axon. In the brain computation is analog and communication is digital. It takes 20 femtojoules per synapse activated, which is on average activating 20 ion channels. How does the brain do this? That's what neuromorphic engineers are trying to figure out. Lots of technical stuff follows about building neuromorphic chips. Summary: we are on the cusp of realizing the full promise of combining analog computation with digital communication, as the brain does; coordinated-spiking networks offer linear scaling of precision with pool size, promising greater energy-efficiency than analog or digital computing. This technology is not going to be used to balance your checkbook. Digital computing will always be more precise. A lot of applications are at the intermediate range of precision while needing the most energy efficient approach. 64 bits of precision is not why we can't solve self driving cars or navigating robots.
Those kinds of problems are where we are dealing with a lot of data but none of it is particularly important, and we want to do some action that doesn't have to be super precise, it just has to be the right one. The domain is taking continuous real-time information that's changing over time. You are closing the loop so you can see what you did and account for it. It doesn't require a ton of precision, but it needs to be real-time. In video there's a lot of redundancy from one frame to another, so you shouldn't treat each frame as an individual image as we do now. You can get rid of all that redundancy if you have something that understands time and is keeping some state and feeding back. Everything you want to make autonomous has to deal with these kinds of problems, some sensory, some cognitive, some motor.
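As a rough sanity check on the energy numbers, here is a back-of-the-envelope sketch that multiplies the 20 femtojoules per synapse activation quoted in the talk by the 10^15 synapse activations per second listed in the numbers at the top of this post. Treating those two figures as directly multipliable is an assumption made here for illustration, not something the talk summary states.

```python
# Back-of-the-envelope estimate only; both inputs are figures quoted in this post.
SYNAPSE_ACTIVATIONS_PER_SECOND = 10**15  # from the numbers list at the top
FEMTOJOULES_PER_ACTIVATION = 20          # from the talk summary
FEMTOJOULES_PER_JOULE = 10**15


def synaptic_power_watts() -> int:
    """Return joules per second (watts) spent on synapse activations."""
    fj_per_second = SYNAPSE_ACTIVATIONS_PER_SECOND * FEMTOJOULES_PER_ACTIVATION
    return fj_per_second // FEMTOJOULES_PER_JOULE


print(synaptic_power_watts())  # 20
```

Under those assumptions the synaptic budget comes out to about 20 watts.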

In the US the tax filing deadline is soon approaching. If you need tax or accounting advice please try probot.us. Probot connects you to real live tax and accounting experts who can answer all your questions. Risk free! BTW, this isn't a paid advertisement because it's something I wrote.

Sometimes you actually do run into scaling problems, especially when you are the hot node in a new federated social network. Scaling Mastodon: providing a smooth and swift service to 43,000 users takes some doing...Mastodon includes a variety of optimizations that at least doubled the throughput of requests and background jobs compared to the first day of going viral...mastodon.social is servicing about 6,000 open connections, with about 3,000 RPM and an average response time of 200ms...2x baremetal C2M (8 cores, 16GB RAM) servers...6x baremetal C2S (4 cores, 8GB RAM) servers...MRI Ruby does not have native threads, so they cannot be run truly in parallel, no matter how many CPUs you have...split queues between different Sidekiq processes on different machines...more workers with fewer threads work faster than fewer workers with more threads.

Algorithms will determine lots of things in the future, like who gets kicked off a plane. It's a matter of priority. United passenger threatened with handcuffs to make room for 'higher-priority' traveler. I'm sure every programmer understands how unbiased any such system will be.

While Facebook puts a lot of work into making their properties work over slower mobile networks, Twitter has gone a different route by releasing a specialized app. Twitter Lite minimizes data usage, loads quickly on slower connections, and is resilient on unreliable mobile networks. Result: Twitter Lite is an order of magnitude less expensive to run than our server-rendered desktop website. How we built Twitter Lite is instructive and while not simple, it's a good example of how the web can compete with native apps. Twitter Lite is a Progressive Web App built using React, Redux, Normalizr, Globalize, Babel, Webpack, Jest, WebdriverIO, and Yarn. It's compiled with Babel and bundled with Webpack. API response data is first processed by Normalizr to de-duplicate items and transform data into more efficient forms. Speed and reliability are achieved through a series of incremental performance improvements known as the PRPL pattern and by using Service Worker, IndexedDB, Web App Install Banners, and Web Push Notifications. A small, simple Node.js server handles user authentication, constructs the initial state of the app, and renders the initial HTML application shell. Once loaded in the browser, the app requests data directly from the Twitter API.
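To make the "de-duplicate items" step concrete, here is a minimal Python sketch of the normalization idea. It is not Normalizr's API (Normalizr is a JavaScript library), and the response shape, with each tweet embedding a full `user` object, is a hypothetical chosen for illustration.

```python
from typing import Any, Dict, List


def normalize_tweets(tweets: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Flatten nested tweets so each user is stored once and referenced by id."""
    users: Dict[Any, Dict[str, Any]] = {}
    tweet_entities: Dict[Any, Dict[str, Any]] = {}
    order: List[Any] = []
    for tweet in tweets:
        user = tweet["user"]
        users[user["id"]] = user  # repeated users collapse into one entry
        tweet_entities[tweet["id"]] = {**tweet, "user": user["id"]}
        order.append(tweet["id"])
    return {
        "entities": {"users": users, "tweets": tweet_entities},
        "result": order,  # preserves the original ordering of the response
    }


timeline = [
    {"id": 1, "text": "hello", "user": {"id": 7, "name": "ada"}},
    {"id": 2, "text": "again", "user": {"id": 7, "name": "ada"}},
]
normalized = normalize_tweets(timeline)
```

Storing entities by id means a client-side store only has to update one record when a user changes, instead of every tweet that embeds that user.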

I've always felt coming up with a scientific way to develop software is like developing a scientific way to paint. Sure, an AI can create paintings in the style of Vincent van Gogh, but it won't create new styles, and in many ways each software project requires its own style. The Problem with Today’s Software Thought Leaders with a good discussion on reddit.

Spanner vs. Calvin: distributed consistency at scale: I’m biased in favor of Calvin, but in going through this exercise, I found it very difficult to find cases where an ideal implementation of Spanner theoretically outperforms an ideal implementation of Calvin. The only place where I could find that Spanner has a clear performance advantage over Calvin is for latency of read-only transactions submitted by clients that are physically close to the location of the leader servers for the partitions accessed by that transaction. Good discussion on HN.

tdammers with a great answer to How NOT to design Netflix in your 45-minute System Design Interview? So, TL;DR: when you are asked "How would you design X", then the question is explicitly not "How would you implement X". Scalable design isn't really about questions like "should we use MongoDB or MySQL", or "which programming language should we use". It is about breaking up a system into components, identifying scalability bottlenecks, SPOFs, etc., and figuring out strategies to disarm them...In my experience, another important question to ask is what degree of scalability you will actually need. Are you processing thousands, millions, or billions of requests per second?...And even in super high traffic systems, there is usually only a small part of the overall system that requires special treatment; for the rest of the system, a generic approach like 12-factor is enough to keep things scalable enough to support the critical stuff and then some.

Yes, there is a Google strategy tax. Google ruins the Assistant’s shopping list, turns it into a big Google Express ad. This never works. You have to make something good for people to adopt it, not make something they like worse.

“Young people are just smarter,” said a 22 year old Mark Zuckerberg. I wonder if the 32 year old Mark still agrees? How about his 94 year old self? To Be a Genius, Think Like a 94-Year-Old: A study of Nobel physics laureates found that, since the 1980s, they have made their discoveries, on average, at age 50. So suck it ageists.

In the same way we don't really know what's in an ASIC we don't really know what's in an AI. The Dark Secret at the Heart of AI: You can’t just look inside a deep neural network to see how it works. A network’s reasoning is embedded in the behavior of thousands of simulated neurons, arranged into dozens or even hundreds of intricately interconnected layers...at some stage we may have to simply trust AI’s judgment or do without using it.

Algolia is blisteringly fast, here's How Algolia Reduces Latency For 21B Searches Per Month. Design goal: aggressively reduce latency. Stats: Query volume: 1B/day peak, 750M/day average (13K/s during peak hours); 800+ API servers; 64TB of RAM; 15+ regions; 47+ datacenters. Stack: written in C++ and runs inside of nginx; clients connect directly to the nginx host where the search happens; runs on hand-picked bare metal; uses a hybrid-tenancy model; doesn’t use AWS or any cloud-based hosting for the API.

The nice thing about benchmark wars is that the only weapons fired are more benchmarks. Nvidia claims Pascal GPUs would challenge Google’s TensorFlow TPU in updated benchmarks: It would have been interesting to see how Google’s TPU matched up against Nvidia’s newest and most powerful Pascal architecture, but I strongly suspect that it wouldn’t tell us much about which kind of solutions vendors are likely to use.

To survive an attack you need better defenses, and StackOverflow is building them in a clever way. Introducing DnsControl – “DNS as Code” has Arrived (video): the DNS DSL and compiler that treats DNS as code, with all the DevOps-y benefits of CI/CD, unit testing, and more. The dnscontrol language specifies domains at a high level and leaves the actual manipulations and updates to automation. Massive changes, such as failovers between datacenters, are now a matter of changing a variable and recompiling. Dnscontrol is extendable and has plug-ins for BIND, CloudFlare, Route53/AWS, Azure, Google Cloud DNS, Name.Com, and more.

mattaugamer with a good answer on Feature toggle vs feature branches: These aren't competing approaches. They're solutions to different things. Feature branches are a way of managing the codebase. They allow a development team to do work for a specific feature in a specific branch. That means a complete feature can be merged in a single commit, etc. Feature toggles are about the surfacing of features. They let you set flags that enable or disable functionality. This may be because the feature isn't ready, but it can also be more dynamic, for example, enabling A-B testing of the effectiveness of a feature. They're both useful techniques, and work together well.
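Here is a minimal sketch of what the toggle side might look like, assuming a percentage rollout keyed on a hash of the user id. The class name, the feature names, and the hashing scheme are all hypothetical and don't come from the answer.

```python
import hashlib
from typing import Dict


class FeatureToggles:
    """Flags that surface a feature to a percentage of users (0 to 100)."""

    def __init__(self, rollout_percent: Dict[str, int]) -> None:
        self._rollout = rollout_percent

    def is_enabled(self, feature: str, user_id: str) -> bool:
        percent = self._rollout.get(feature, 0)  # unknown features stay off
        digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket 0..99
        return bucket < percent


toggles = FeatureToggles({"new_ui": 100, "unfinished": 0, "experiment": 50})
```

Because the bucket is derived from a hash rather than a random draw, a given user keeps seeing the same variant for the length of an A-B test, and the merged-but-unfinished code can ship dark at 0 percent.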

AWS vs Digital Ocean - A Performance Comparison: I'll be honest, I'm a little surprised at how one-sided the results are. I thought Digital Ocean's SSDs would give them a significant advantage, but it doesn't appear they have...In many of the metrics, AWS had only a slight edge over Digital Ocean, but when it wasn't close, it wasn't close at all. This is most apparent in the HTTPS tests...AWS's response times and transaction lengths were much more consistent than Digital Ocean's...AWS, hands-down, outperformed Digital Ocean when handling requests over HTTPS.

More great details on Running Online Services at Riot: Part III: Part Deux: With OpenContrail providing an API to configure our networking, we now have the opportunity to automate our application’s networking needs...To enable this workflow, we built a system to describe the network features of an application in a simple JSON data model that we call a network blueprint...Defense in depth means that we enforce our security policies at multiple points in our infrastructure...to meet the ever-increasing needs of our applications, we combine DNS, Equal Cost Multi-Pathing (ECMP), and a traditional TCP-based load balancer such as HAProxy or NGINX to provide a feature-rich and highly available load balancing solution...we built a scaled down version of our data centers in a staging environment...In addition to our basic checks, we also have more complicated and disruptive testing that breaks important components and forces the system to run in a degraded state.

Being both impressed and horrified by the human race seems like a good default. Prison inmates built working PCs out of ewaste, networked them, and hid them in a closet ceiling: prisoners used the PCs for a number of activities, including several criminal acts like identity theft and credit-card fraud. They were able to network their PC by using a guard's password; the use of this account on days when the guard wasn't on-shift tipped off the prison's systems administrators that something was awry.

The cloud is entering its scripting phase. Netflix Conductor: Inversion of Control for workflows: The idea is to be able to chain workflow actions such as starting a workflow, completing a task in a workflow, etc. based on state changes in other workflows.

Dropbox found moving to Brotli compression was not a major web performance win. Deploying Brotli for static content. Reasons: Client-side caching works; CDNs are fast; Users of our website are mostly coming from stable internet links; Not all browsers support Brotli; Modern websites are very Javascript-heavy.

Maybe the hidden rule against self-modifying code is just a quaint custom? Octopuses and squids can rewrite their RNA. Is that why they’re so smart?: Other organisms use all sorts of different methods to modify their RNA, but the possibility that coleoids use extensive RNA editing to flexibly manipulate their nervous system is “extraordinary.”

Here's how Figma dealt with a spam attack. An Alternative Approach to Rate Limiting. Excellent explanations of some options with great diagrams: Token bucket, Fixed window counters, Sliding window log. Their choice: fixed window counters and sliding window log inspired the algorithm that stopped the spammers. We count requests from each sender using multiple fixed time windows 1/60th the size of our rate limit’s time window...our rate limiter was accurate down to the second and significantly minimized memory usage...our attackers saw the response code change from 200 to 429 and simply created new accounts to circumvent the rate limiting on their blocked accounts. In response, we implemented a shadow ban: On the surface, the attackers continued to receive a 200 HTTP response code, but behind the scenes we simply stopped sending document invitations after they exceeded the rate limit...
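The counting scheme can be sketched roughly as follows. This is an in-memory illustration of fixed window counters summed over a sliding window plus the shadow-ban response, not Figma's implementation. The class and function names are hypothetical, and not counting rejected requests is an assumption.

```python
import time
from collections import defaultdict
from typing import Dict, Optional, Tuple


class WindowedRateLimiter:
    """Sum small fixed-window counters to approximate a sliding window."""

    def __init__(self, limit: int, window_seconds: float, buckets: int = 60) -> None:
        self.limit = limit
        self.buckets = buckets
        self.bucket_seconds = window_seconds / buckets  # 1/60th of the window
        self._counts: Dict[str, Dict[int, int]] = defaultdict(dict)

    def allow(self, sender: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        current = int(now // self.bucket_seconds)
        oldest = current - self.buckets + 1
        # Drop counters that have slid out of the window; memory stays bounded
        # at `buckets` counters per sender.
        counts = {b: c for b, c in self._counts[sender].items() if b >= oldest}
        self._counts[sender] = counts
        if sum(counts.values()) >= self.limit:
            return False
        counts[current] = counts.get(current, 0) + 1
        return True


def send_invite(limiter: WindowedRateLimiter, sender: str,
                now: Optional[float] = None) -> Tuple[int, bool]:
    """Shadow ban: always answer 200, but only deliver when under the limit."""
    delivered = limiter.allow(sender, now)
    return 200, delivered
```

With a 60 second window and 60 buckets each counter covers one second, which is where the "accurate down to the second" property comes from, while storing at most 60 integers per sender rather than a timestamp per request.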

150x Speedup in Real-Time Dashboards with Postgres 9. A reference implementation of how rollups can be implemented in a sane way: add triggers (upon INSERT of row in the events table) to queue up count data to an intermediate table (called rolledup_events_queue); at a specified frequency (with some jitter so as to avoid the thundering herd problem), we use Postgres 9.5's UPSERT feature to take all of the pending counts in the rolledup_events_queue table and roll them up (by hour) in the rolledup_events table; the Query Engine then queries from the rolledup_events table.
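The shape of that pipeline can be sketched in plain Python, with in-memory structures standing in for the two tables. This is only an illustration of the queue, roll up, and query steps: the real implementation uses Postgres triggers and UPSERT, and the class and function names here are hypothetical.

```python
import random
from collections import Counter
from datetime import datetime
from typing import List, Tuple


class EventRollup:
    def __init__(self) -> None:
        # Stands in for the rolledup_events_queue table.
        self.queue: List[Tuple[str, datetime, int]] = []
        # Stands in for rolledup_events, keyed by (event_type, hour).
        self.rolled_up: Counter = Counter()

    def on_insert(self, event_type: str, created_at: datetime) -> None:
        """What the INSERT trigger does: queue a count of 1 for the event's hour."""
        hour = created_at.replace(minute=0, second=0, microsecond=0)
        self.queue.append((event_type, hour, 1))

    def flush(self) -> int:
        """Roll up all pending counts; creating or adding to a row is the upsert."""
        pending, self.queue = self.queue, []
        for event_type, hour, count in pending:
            self.rolled_up[(event_type, hour)] += count
        return len(pending)

    def count(self, event_type: str, hour: datetime) -> int:
        """What the query engine reads: the pre-aggregated hourly count."""
        return self.rolled_up[(event_type, hour)]


def next_flush_delay(base_seconds: float, jitter_seconds: float) -> float:
    """Add random jitter so many workers don't all flush at the same instant."""
    return base_seconds + random.uniform(0, jitter_seconds)
```

Dashboards then read a handful of hourly rows instead of scanning raw events, which is where the speedup comes from.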

Looks like this was a good talk. DevOOPS: Attacks And Defenses For DevOps Toolchains. 151 slides with practical advice on securing Redis, Hudson, IAM, etc. Food for thought: Developers possibly have the keys to the whole kingdom on their laptop. Protect and monitor those assets.

Putting features in the database never ends well. The database always bottlenecks the system and then you are stuck. Databases have failed the web: Why don't modern databases simply provide these features? I don't know. I can guess, but I won't be charitable. Because it's hard.

Yubl’s road to Serverless — Part 3, Ops. Nice approach to logging: All of our Lambda functions are created with wrappers that wrap your handler code with additional goodness such as capturing the correlation IDs into a global.CONTEXT object. And metrics: use special log messages and process them after the fact. And config: As most of our Lambda functions need to talk to the config API we invested efforts into making our client library really robust and baked in caching support and periodic polling to refresh config values from the source.
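A handler wrapper of that kind might look roughly like the following. This is a Python sketch of the pattern rather than Yubl's code, and the `x-correlation-id` header name, the event shape, and the `log` helper are assumptions made for illustration.

```python
import functools
import uuid
from typing import Any, Callable, Dict

CONTEXT: Dict[str, str] = {}  # stand-in for the global.CONTEXT object


def with_correlation_id(handler: Callable[..., Any]) -> Callable[..., Any]:
    """Capture the incoming correlation ID, or mint one, before the handler runs."""

    @functools.wraps(handler)
    def wrapper(event: Dict[str, Any], context: Any = None) -> Any:
        headers = event.get("headers") or {}
        correlation_id = headers.get("x-correlation-id") or str(uuid.uuid4())
        CONTEXT.clear()  # containers are reused, so don't leak the previous ID
        CONTEXT["x-correlation-id"] = correlation_id
        return handler(event, context)

    return wrapper


def log(message: str) -> Dict[str, str]:
    """Attach the captured context to every log record."""
    return {"message": message, **CONTEXT}


@with_correlation_id
def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, str]:
    return log("handled request")
```

Keeping the capture in a wrapper means handler code never has to thread the ID through by hand, and every log line from one request can be joined up later.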

Lots of lovely Lessons Learned in Lambda: set up CloudWatch alerts to notify you when an exception occurs; Lambda limits the size of a deployment package to 50 MB; Lambda functions limit how much data you can pass into them, so we ended up moving calculated results into Redis, then we refactored our Lambda functions to accept Redis keys as arguments; We develop on macOS machines at Collective Idea, and this caused trouble when deploying any dependencies that are not written in pure Python; the limit of a running Lambda function is five minutes, so we grouped atomic calculations into batches of around 100 and ran each batch in its own thread, which extended the runtime of each Lambda function to around seven seconds, but greatly reduced the number of concurrently running Lambda functions by a factor of 100.
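The batching idea in the last lesson can be sketched like this. The post's wording leaves the exact thread layout open, so this assumes one reading: work is split into payloads of about 100 items, and a single invocation runs its payload's calculations across a thread pool. The function names and the `max_threads` value are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence


def make_batches(items: Sequence[Any], batch_size: int = 100) -> List[List[Any]]:
    """Split the work so each invocation receives about batch_size items."""
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def handle_batch(batch: Sequence[Any], calculate: Callable[[Any], Any],
                 max_threads: int = 16) -> List[Any]:
    """Run one batch inside a single invocation, spreading items over threads."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() preserves input order, so results line up with the batch.
        return list(pool.map(calculate, batch))


batches = make_batches(list(range(250)), batch_size=100)
squares = handle_batch([1, 2, 3], lambda x: x * x)
```

Each invocation runs longer, but 250 work items now need 3 invocations instead of 250, which is the trade the post describes.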

An epic (and scary) post on Over The Air: Exploiting Broadcom’s Wi-Fi Stack (Part 2): we’ll explore two distinct avenues for attacking the host operating system. In the first part, we’ll discover and exploit vulnerabilities in the communication protocols between the Wi-Fi firmware and the host, resulting in code execution within the kernel. Along the way, we’ll also observe a curious vulnerability which persisted until quite recently, using which attackers were able to directly attack the internal communication protocols without having to exploit the Wi-Fi SoC in the first place! In the second part, we’ll explore hardware design choices allowing the Wi-Fi SoC in its current configuration to fully control the host without requiring a vulnerability in the first place.

In the year 9999 serialization will still be slowing down systems. Saving Millions by Dumping Java Serialization.

Neural Engineering Framework: a method used for constructing neural simulations.

hybridgroup/gobot: Golang framework for robotics, drones, and the Internet of Things (IoT). It provides a simple, yet powerful way to create solutions that incorporate multiple, different hardware devices at the same time.

excamera/AWSLambdaFace: Build an elastic (i.e. horizontally scalable) system that uses deep convolutional neural networks to perform face detection and recognition in the cloud.

ReactXP: If you write your app to this abstraction, you can share your view definitions, styles and animations across multiple target platforms. Of course, you can still provide platform-specific UI variants, but this can be done selectively where desired.

A neuromorph's prospectus: As transistors shrink to nanoscale dimensions, trapped electrons - blocking "lanes" of electron traffic - are making it difficult for digital computers to work. In stark contrast, the brain works fine with single-lane nanoscale devices that are intermittently blocked (ion channels). Conjecturing that it achieves error-tolerance by combining analog dendritic computation with digital axonal communication, neuromorphic engineers (neuromorphs) began emulating dendrites with subthreshold analog circuits and axons with asynchronous digital circuits in the mid-1980s. Three decades in, researchers achieved a consequential scale with Neurogrid - the first neuromorphic system that has billions of synaptic connections.

More proof humans suck at satire. This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News: Fake news in most cases is more similar to satire than to real news, leading us to conclude that persuasion in fake news is achieved through heuristics rather than the strength of arguments. We show overall title structure and the use of proper nouns in titles are very significant in differentiating fake from real. This leads us to conclude that fake news is targeted for audiences who are not likely to read beyond titles and is aimed at creating mental associations between entities and claims.
