Stuff The Internet Says On Scalability For July 7th, 2017

High Scalability

07 Jul 2017 — 29 min read

Hey, it's HighScalability time:

What's real these days? I was at Lascaux II, an exact replica of Lascaux. I was deeply, deeply moved. Was this an authentic experience? A question we'll ask often in VR I think.
If you like this sort of Stuff then please support me on Patreon.

$400k: cost of yearly fake news campaign; $50,000: cost to discredit a journalist; 100 Gbps: SSDP DDoS amplification attack; $5.97BN: wild guess on cost of running Facebook on AWS; 2 billion: Facebook users; 80%: Spotify backend services in production run as containers; $60B: AR market by 2021; 10.4%: AMD market share taken from Intel; 5 days: MIT drone flight time; $1 trillion: Apple iOS revenues; 35%-144%: reduction in image sizes; 10 petabytes: Ancestry.com data stored; 1 trillion: photos taken on iPhone each year; $70B: Apple App Store payout to developers; 355: pages in Internet Trends 2017 report; 14: people needed to make 500,000 tons of steel; 25%: reduced server-rendering time with Node 8; 50-70%: of messages Gmail receives are spam; 8,000: bugs found in pacemaker code;

Quotable Quotes:
- Vladimir Putin: We must take into account the plans and directions of development of the armed forces of other countries… Our responses must be based on intellectual superiority, they will be asymmetric, and less expensive.
- @swardley: What most fail to realise is that the Chinese corporate corpus has devoured western business thinking and gone beyond it.
- @discostu105: I am a 10X developer. Everything I do takes ten times as long as I thought.
- DINKDINK: You grossly underestimate the hashing capacity of the bitcoin network. The hashing capacity, at time of posting, is approximately 5,000,000,000 Gigahashes/second[1]. Spot measurement of the hashing capacity of an EC2 instance is 0.4 Gigahashes/second[2]. You would need 12 BILLION EC2 instances to 51% attack the bitcoin network.[3] Using EC2 to attack the network is impractical and inefficient.
- danielsamuels && 19eightyfour~ Machiavelli's Guide to PaaS: Keep your friends close, and your competitors hosted.
- Paul Buchheit: I wrote the first version of Gmail in a day!
- @herminghaus: If you don’t care about latency, ship a 20ft intermodal container full of 32GB micro-SD cards across the globe. It’s a terabyte per second.
- @cstross: Okay, so now the Russian defense industry is advertising war-in-a-can (multimodal freight containerized missiles):
- Dennett~ you don't need comprehension to achieve competence.
- @michellebrush~ Schema are APIs. @gwenshap #qconnyc
- Stacy Mitchell: Amazon sells more clothing, electronics, toys, and books than any other company. Last year, Amazon captured nearly $1 of every $2 Americans spent online. As recently as 2015, most people looking to buy something online started at a search engine. Today, a majority go straight to Amazon.
- Xcelerate: I have noticed that Azure does have a few powerful features that AWS and GCP lack, most notably InfiniBand (fast interconnects), which I have needed on more than one occasion for HPC tasks. In fact, 4x16 core instances on Azure are currently faster at performing molecular dynamics simulations than 1x"64 core" instance on GCP. But the cost is extremely high, and I still haven't found a good cloud platform for short, high intensity HPC tasks.
- jjeaff: I took about 5 sites from a $50 a month shared cPanel plan that included a few WordPress blogs and some custom sites and put them on a $3 a month scaleway instance and haven't had a bit of trouble.
- @discordianfish: GCP's Pub/Sub is really priced by GB? And 10GB/free/month? What's the catch?
- Amazon: This moves beyond the current paradigm of typing search keywords in a box and navigating a website. Instead, discovery should be like talking with a friend who knows you, knows what you like, works with you at every step, and anticipates your needs. This is a vision where intelligence is everywhere. Every interaction should reflect who you are and what you like, and help you find what other people like you have already discovered.
- @CloudifySource: Lambda is always 100% busy - @adrianco #awasummit #telaviv #serverless
- @codinghorror: Funny how Android sites have internalized this "only multi core scores now matter" narrative with 1/2 the CPU speed of iOS hardware
- @sheeshee: deleted all home directories because no separation of "dev" & "production". almost ran a billion euro site into the ground with a bad loop.
- @sbellware: 95% of the cost of event sourcing projects is explaining event sourcing
- jboggan: Area 120 is awkwardly addressing the fact that 20% time is dying and isn't working as the R&D model that once drove so much innovation at Google. The name itself is an in-joke about the tendency for 20% time to only come after you've done the first 100%.
- @jboner: "Two-phase commit is the anti-availability protocol." —Pat Helland
- Geoffrey West~ One reason for constant urban growth is that the bigger the city, the more efficient it is, because of economies of scale.
- @xaprb: “we serve 55 million requests a day” —> about 600 per second. Not impressive.
- Stacy Mitchell: Jeff Bezos's big bet is that he can make buying from Amazon so effortless that we won't notice the company's creeping grip on commerce and its underlying infrastructure, and that we won't notice what that dominance costs us.
- @JoeEmison: I have found that 95% of the time when lead dev says, "we need to rebuild this back end/DB", it's just not justified. 6/
- jzelinskie: The real question isn't "can FB be hosted on AWS?", it's "why isn't FB competing with AWS?" because what they've already got is much better for the range of applications that they deploy.
- David Rudin: Apps encourage us not to trust ourselves, but to think of ourselves as a component of the machine. These tools simplify our lives on the condition that we simplify ourselves for them
- @ShortJared: People reacted to stats... here's some numbers on APIG/Lambda vs Kinesis, will share spreadsheet later. TLDR; Kinesis is cheaper.
- JessiTRON: Let's acknowledge that there really are developer+situations that are 10x more productive than others. Let's acknowledge that they don't scale. Make choices about when we can take advantage of the sweet spot of local or individual automation, and when the software we're building is too important for a bus factor of one.
- @DanielMorsing: I'm naming my cats-to-be kafka, cassandra and pagerduty so that I will be less surprised when they wake me up in the middle of the night
- @sebdeckers: HTTP/2 protocol overhead benchmark: Node.js 50% faster than Nginx
- rebootthesystem: I am always surprised to see how many software developers I come across who have no experience whatsoever applying FSM's in the course of their work. I can unequivocally say that FSM's have made me tons of money. By that I mean that they have simplified projects, made them far more robust, easily extensible, simple to modify and, in general terms, quicker to develop with less bugs and errors. I've used FSM's in FPGA's (for example, optimized DDR memory controllers), embedded projects, robotics and system software. In other words, everywhere.
- Terry Crowley: Execution matters. There is no innovation without execution.
- @swardley: somehow CloudFoundry got sucked into containers, seemed to focus on building private PaaS and took its eye of the ball.
- majke: I really think that DDoS is a threat to the internet as we know it. Think about centralization that it causes: can your server sustain trivial 100Gbps SSDP attack? I really think that doing netflow right will allow us to keep the decentralized internet.
- @chrismaddern: Fun fact: AirDrop is the preferred way for young people to share in Cuba - it doesn’t rely on internet which is expensive and sparse.
- rusanu: We've been doing [Exactly Once In Order messaging] in 2005 at +10k msgs/sec (1k payload), durable, transacted, fully encrypted, with no two phase commit, supporting long disconnects (I know for documented cases conversations that resumed and continued after +40 days of partner network disconnect).
- @CodeWisdom: "We build our computer (systems) the way we build our cities: over time, without a plan, on top of ruins." - Ellen Ullman
- StevePerkins: Back in the 1980's, the mailbox was my Internet. I subscribed to SO many magazines. Back in those days, a lot of the advertisements had notes like, "For more information, send a postcard to this address". As much as I dislike ads now, back then I would actually send those postcards, out of sheer boredom.
- gimpwiz: My favorite scaling strategy is: "By the time we start thinking we need to scale, we'll be making enough money to hire a small team of experts."
- matt4077: Speaking to your argument about accountability: Do you think the CEO of GE is really more afraid of the shareholders than the mayor of some city in Kansas? Politicians are plenty accountable. And for something like the federal government, projects are managed by employees, who are managed by agency heads. The latter get a budget and have to work within it.
- @ben11kehoe: Idempotent operations are critical in event-driven systems without exactly-once guarantees like #Lambda
- erikpukinskis: Americans won't realize the benefits of cryptocurrencies until they start to be able to do capital-B Banking with it. Not deposits, withdrawals, and transfers, but the creation of new financial instruments.
- hammerheadtiger: Lots of people online are calling [HomePod] overpriced because they think Apple just slapped a bunch of speakers in a circular configuration and added Siri, but the engineering behind it is extremely audiophile niche stuff. And it does this all automatically with no acoustical set up or technical know how.
- @GossiTheDog: I know a bunch of admins at web hosts. They're all least cost, stack high, disc based backups with little security. Prime targets.
- solatic: That's when you understand that what really needs to be talked about is a culture change in public work culture, and when you understand that, you understand just how high the mountain is that needs to be surmounted.
- @etherealmind: Amazon is not cheap. For metaphorically the 1120th time.
- @ctford: The CAP theorem says that a paper on distributed systems cannot be simultaneously Applicable, Comprehensible and free from Paywall.
- Murat: some single threaded implementations were found to be more than an order of magnitude faster than published results (at SOSP/OSDI!) for systems using hundreds of cores.
- Paul Buchheit: the biggest counter-intuitive lesson is that investing purely on the quality of the idea is inversely correlated with success. The really great companies are largely indistinguishable from the terrible ones when judged simply on the idea. To be good at it you have to meet the founders and assess how good of a founder they are.
- Nimblewill Nomad~ Shaving down one’s pack weight is a process of sloughing off one’s fears.
- @danielbryantuk: Sure I saw a recent tweet saying "every developer wants a PaaS, as long as they can build it themselves", but can't find it. Anyone help?
- TunaBoo: Perhaps the GOOD thing in your stack is kubernetes and not microservices?
- riskable: I can't count the number of times we (various IT teams) have selected products because they were the only one that had a feature we needed. I'm at the point in my IT career where if I'm watching a demo I politely let them do their little spiel for 5-10 minutes and then I'm like, "yeah yeah show us feature X" (because I want to see that it works and how it works).
- @ben11kehoe: I think it's always *possible* to get there with any stack. Question is, what approach best allows a team of average skill to accomplish it?
- Paul Buchheit: The biggest [turn off when an entrepreneur pitches you] is made-up ideas. The best startups come from personal experience. It was something you or someone you know needed.
- siliconc0w: I wonder how much they could enlist others to solve this by creating something like an 'Uber Auction House' to basically buy and sell the right to reap Uber's cut for a ride. They could clean up on exchange fees while everyone solves this problem for them.
- Daniel C. Dennett: it is possible, and more likely, I think, that a rather inelegantly complicated, expensive, slow, Rube-Goldberg conglomeration of objets trouvés was the first real replicator, and after it got the replication ball rolling, this ungainly replicator was repeatedly simplified in competition with its kin.
- taylodl: It's not the end of the app economy, it's just the app economy is in the long tail mode now. Every obvious app that's going to have wide appeal has already been created. Now it's about filling in the niches.
- @BenedictEvans: 7.6bn people on earth / 5.5bn over 14 / 5bn mobile phones / 3bn smartphones / 700m iPhones, 1.8bn Google Androids, 500m Chinese Androids
- @strlen: tl;dr Exactly means that if a message is sent, it is delivered exactly once. It does not imply every message will always be sent w/o delay.
- @GarettJones: Bellman, inventor of dynamic programming, had to hide the fact he was inventing it from the Secretary of Defense
- @codinghorror: This part of Slack's early mission statement always made me laugh. Slack is the literal opposite of all these things.
- @aphyr: kinda disappointed that I have to go from $25/mo to $250/mo to get a quiet EC2 instance though; my EC2+ELB+r53 costs were ~40/mo for QA&Prod
- Steve Wozniak: When you design with very few parts, everything is so clean and orderly you can understand it more deeply in your head, and that causes you to have fewer bugs. You live and sleep with every little detail of the product.
- Chris Dixon: Token networks remove this friction by aligning network participants to work together toward a common goal— the growth of the network and the appreciation of the token.
- @sublimecoder: “One bad programmer can easily create two new jobs a year.” – David Parnas
- James Gleick: Enjoy the present. Don’t waste your brain cells agonizing about lost opportunities or worrying about what the future will bring. As I was working on the book I suddenly realized that that’s terrible advice. A potted plant lives in the now. The idea of the ‘long now’ embraces the past and the future and asks us to think about the whole stretch of time. That’s what I think time travel is good for. That’s what makes us human — the ability to live in the past and live in the future at the same time.
- @rob_carlson: That is, per GW costs for nuclear and coal have increased with installed capacity, not decreased as for wind and solar. Wrong slope.
- @BenedictEvans: "By 1980 CB Radio was almost dead; it collapsed under the weight of its own popularity. Could the Internet go the same way?" Economist, 1995
- @CodeWisdom: "Give someone a program, you frustrate them for a day; teach them how to program, you frustrate them for a lifetime." - David Leinweber
- @CTOAdvisor: Ironically, my wife is worried about my IT career because my home lab hardware costs have shrunk to near zero.
- EverSQL: The most popular database is MySQL, and not by far comes SQL Server. Almost half of the developers who answered the survey (44.3% out of 36,935 responders) are using MySQL. It seems RDBMS databases and specifically MySQL are not going anywhere anytime soon.
- Steve McDowall: BTW -- from actual experience -- InnoDB starts to falter when a single table with indexes grows beyond 1TB .. @ 2TB it's fairly unusable - insertion speed drops like a rock due to re-indexing. However, the TokuDB engine works very well for VERY VERY LARGE DB's .. We have one that is 40TB uncompressed with 1T rows in a single table and it still performs pretty well.

The Not Hotdog app on Silicon Valley may be a bit silly, but the story of how they built the real app is one of the best how-tos on building a machine learning app you'll ever read. How HBO’s Silicon Valley built “Not Hotdog” with mobile TensorFlow, Keras & React Native. The initial app was built in a weekend using Google Cloud Platform’s Vision API and React Native. The final version took months of refinement. Google Cloud’s Vision API was dropped because its accuracy in recognizing hotdogs was only so-so; it was slow because of the network hit; and it cost too much. They ended up using Keras, a deep learning library that provides nicer, easier-to-use abstractions on top of TensorFlow. They settled on SqueezeNet due to its explicit positioning as a solution for embedded deep learning. SqueezeNet used only 1.25 million parameters which made training much faster and reduced resource usage on the device. What would they change? timanglade: Honestly I think the biggest gains would be to go back to a beefier, pre-trained architecture like Inception, and see if I can quantize it to a size that’s manageable, especially if paired with CoreML on device. You’d get the accuracy that comes from big models, but in a package that runs well on mobile. And this is really cool: The last production trick we used was to leverage CodePush and Apple’s relatively permissive terms of service, to live-inject new versions of our neural networks after submission to the app store.

And the winner is: all of us. Serverless Hosting Comparison: Lambda: Unicorn: $20,830.83. Heavy: $120.16. Medium: $4.55. Light: $0.00; Azure Functions: Unicorn: $19,993.60. Heavy: $115.40. Moderate: $3.60. Light: $0.00; Cloud Functions: Unicorn: $23,321.20. Heavy: $138.95. Moderate: $9.76. Light: $0.00; OpenWhisk: Unicorn: $21,243.20. Heavy: $120.70. Medium: $3.83. Light: $0.00; Fission.io: depends on the cost of running your managed Kubernetes cloud.

Minds are algorithms made physical. Seeds May Use Tiny “Brains” to Decide When to Germinate: The seed has two hormones: abscisic acid (ABA), which sends the signal to stay dormant, and gibberellin (GA), which initiates germination. The push and pull between those two hormones helps the seed determine just the right time to start growing...According to Ghose, some 3,000 to 4,000 cells make up the Arabidopsis seeds...It turned out that the hormones clustered in two sections of cells near the tip of the seed, a region the researchers propose makes up the “brain.” The two clumps of cells produce the hormones which they send as signals between each other. When ABA, produced by one clump, is the dominant hormone in this decision center, the seed stays dormant. But as GA increases, the “brain” begins telling the seed it’s time to sprout...This splitting of the command center helps the seed make more accurate decisions.

You Are Not Google or Amazon or LinkedIn. Your problems aren't that big so you don't need to solve your problems like they do. Not a new idea, but nicely explained. To control premature scalation try UNPHAT: Don’t even start considering solutions until you Understand the problem; eNumerate multiple candidate solutions; read the Paper if there is one; Determine the Historical context; Weigh Advantages against disadvantages; Think! But then again the out-of-the-box scalable solution is often just as easy and better. Sharing is Caring: Multi-tenancy in Distributed Data Systems. Good discussion on reddit.

Mitigates one big objection--cold starts--to using Lambda. How long does AWS Lambda keep your idle functions around before a cold start?: functions are no longer recycled after 5 minutes of inactivity...some of my functions didn’t experience a cold start until after 30 minutes of idle time...it’s clear that AWS Lambda shuts down idle functions around the hour mark. It’s interesting to note that the function with 1536 MB memory is terminated over 10 mins earlier...idle functions with higher memory allocation will be terminated earlier...

Here's how the Navy is handling moving old software to new platforms. Cyber Boost: New Operating System Will Improve Navy Computing Power: Popcorn Linux, which can be used with any computer or device, and serves as a translation tool—taking generic coding language and translating it into multiple specialized program languages. From there, Popcorn Linux automatically figures out what pieces of the programming code are needed to perform particular tasks—and transfers these instruction “kernels” (the “popcorn” part) to the appropriate function.

When even your SSD based database isn't fast enough you still need a cache. Amazon DynamoDB Accelerator (DAX): Speed Up DynamoDB Response Times from Milliseconds to Microseconds without Application Rewrite.

For the adventurous. The SMACK Stack is the New LAMP Stack. That would be: Spark, Mesos, Akka, Cassandra, Kafka.

This Week in Computer Hardware 420: Intel's Core i9 Benchmarked. Skylake-X moved from a ring bus, which is a bidirectional but linear communication method connecting all the cores, caches, memory controllers, and PCI Express, to a mesh interface, which lowers the average latency of memory but increases the maximum latency of memory. As a consequence the L3 cache is slower, which worsens the 1080p gaming performance. Still better than AMD Ryzen, but it's good to have competition again. Advice is to wait and see the performance of the AMD Ryzen Threadripper.

Amazon rips away its unlimited storage plan. Bait and switch? What happened to a deal is a deal? Isn't this just another version of WannaCry? Also, The Cost of Cloud Storage.

You've seen coal mines, maybe even a gold mine. What does a bitcoin mine look like? Mining miners are making a secret to the western region of the super-mine super-mine: The mine contains four large warehouses, each decorated with white and blue roofs. Each warehouse measured about 150 meters long and about 20 meters wide, covering an area of about 3,000 square meters (32,000 square feet)...each warehouse takes 15 days to build, and then another 10 days to deploy the miners.

All ECOOP 2017 papers are now available.

It's not a matter of picking a younger dog, it's a matter of picking the right breed for the job. You don't hire a Golden Retriever to do a Border Collie's job. Serverless and why I dislike RDBMS: one context where RDBMS does not have as much value is in a scenario where a lot of short-lived connections are made over and over again...FaaS (as has been put forward by most of the vendors) is exactly that type of scenario...developers of Serverless systems need to go back (forward?) and learn their data driven design...I still love RDBMS. But I love RDBMS like you love an old dog that can faithfully go on a walk with you, but no longer does the tricks that younger dogs can. Also, You Cannot Have Exactly-Once Delivery Redux. Also also, Akka Finite State Machine (FSM) and At Most Once Semantics. Also also also, Delivering Billions of Messages Exactly Once.
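
The exactly-once links above pair with the @ben11kehoe quote earlier about idempotent operations. The following is a minimal sketch, not taken from any of the linked articles, of how a consumer can tolerate at-least-once delivery by remembering which message ids it has already applied. The class name, the `balance` field, and the in-memory set are all hypothetical; a real system would need durable storage for the seen ids.

```python
class IdempotentConsumer:
    """Applies each message at most once, even if the broker redelivers it."""

    def __init__(self):
        # Assumption: an in-memory set stands in for a durable dedup store.
        self.seen_ids = set()
        self.balance = 0

    def handle(self, message_id, amount):
        # A redelivered message is recognised by its id and skipped.
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.balance += amount
        return True


consumer = IdempotentConsumer()
# At-least-once delivery: message "m1" arrives twice.
for message_id, amount in [("m1", 10), ("m2", 5), ("m1", 10)]:
    consumer.handle(message_id, amount)
```

The duplicate "m1" is ignored, so the effect on the balance looks exactly-once even though delivery was not.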

Lots of good experience reports. Ask HN: How was your experience with AWS Lambda in production? All sorts of opinions as you might expect for a newish technology. Too much immaturity in the development process, very powerful, huge potential.

For all young cultural archaeologists who would like to learn how a more primitive culture actually lived, OMNI Magazine has put all their back issues online (free for Kindle Unlimited). Some great art and excellent sci-fi stories. Omni was Playboy for nerds, it changed a lot of lives. As a strategy point, this is another example of Amazon, with Kindle Unlimited, executing a bundling strategy, like Prime, that gets more and more valuable as the long tail is filled out with content. That's how you get a subscription income from commodity content.

Security is another example of bundling economics. Security vulnerabilities just keep accumulating over time and they never seem to go away. This is a boon to hackers.

7 AWS Lambda Tips from the Trenches: Keeping your instances warm; Upgrade to Node.js 6.10; Finish What You Started; Set Timeouts Shorter; Avoid Global State; Log Your Own Errors in Cloudwatch; Give It Some Room.

Good denormalization example. necrodome: Why do you need a user_id in todo_items if you already associate lists to users in todo_lists table? mslot: adding a user ID column to the items table primarily helps with distributing the data, since all of a user's lists and items can be placed on the same node based on the value in the user ID column, which enables transactions and efficient joins. From Scaling out complex SQL transactions in multi-tenant apps on Postgres.
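
A rough sketch of mslot's point, assuming a hypothetical four-node cluster and a simple hash-modulo placement (the real system's distribution scheme is not described here): when both tables carry `user_id`, hashing that one column puts a user's lists and items on the same node.

```python
import zlib

NUM_NODES = 4  # hypothetical cluster size


def node_for(user_id):
    # Stable hash so the same user always maps to the same node.
    return zlib.crc32(str(user_id).encode()) % NUM_NODES


# Both tables carry user_id, so both can be distributed on it.
todo_lists = [{"list_id": 1, "user_id": 42}, {"list_id": 2, "user_id": 7}]
todo_items = [
    {"item_id": 10, "list_id": 1, "user_id": 42},
    {"item_id": 11, "list_id": 1, "user_id": 42},
    {"item_id": 12, "list_id": 2, "user_id": 7},
]


def place(rows):
    """Group rows by the node their user_id hashes to."""
    placement = {}
    for row in rows:
        placement.setdefault(node_for(row["user_id"]), []).append(row)
    return placement


lists_by_node = place(todo_lists)
items_by_node = place(todo_items)
```

Without `user_id` on the items table, items could only be placed by `list_id`, and a join between a list and its items might cross nodes.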

A lot of product failures are from companies trying to diversify their brand. Introducing Sweden’s Museum of Failure. Who can forget Harley-Davidson's eau de toilette?

Into Scala? underscore.io open sourced their books.

You can use a message broker to glue systems together, but never use one to cut systems apart. How do you cut a monolith in half?: You cut a monolith with a protocol...A protocol is the rules and expectations of participants in a system, and how they are beholden to each other. A protocol defines who takes responsibility for failure...The problem with message brokers, and queues, is that no-one does...The complexity of a system lies in its protocol not its topology, and a protocol is what you create when you cut your monolith into pieces...

Interesting thought experiment. Is it possible to host Facebook on AWS? The conclusion is yes, but it would be costly. Good discussion on HN and on reddit.

Gradualism for the win. Spotify on Improving Critical Infrastructure Rollout. Rolling out a new version of Docker always resulted in problems. They moved to an unattended gradual rollout model using Tsunami, a tool they created, that is essentially linear interpolation as a service. The win: stretching deployments out over long periods improves reliability as problems can be detected while only a small portion of the fleet is affected.
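
To make "linear interpolation as a service" concrete, here is a minimal sketch of the idea. It is not Tsunami's actual interface, which isn't described here; the function names, the abstract time units, and the sorted-prefix host selection are all assumptions for illustration.

```python
def target_fraction(now, start, end):
    """Fraction of the fleet that should run the new version at time `now`."""
    if now <= start:
        return 0.0
    if now >= end:
        return 1.0
    # Linear interpolation between 0% at `start` and 100% at `end`.
    return (now - start) / (end - start)


def hosts_to_upgrade(hosts, now, start, end):
    # Take a stable prefix of the sorted fleet so the upgraded set only grows.
    count = int(len(hosts) * target_fraction(now, start, end))
    return sorted(hosts)[:count]


fleet = [f"host-{i:02d}" for i in range(10)]
halfway = hosts_to_upgrade(fleet, now=5, start=0, end=10)
```

Halfway through the window, half the fleet is on the new version, so a bad release surfaces while most hosts are still untouched.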

Good look at Yahoo Mail’s New Tech Stack, Built for Performance and Reliability. High points: blazing-fast initial loading; proximity-based routing; server-side rendering; isomorphic, meaning that the same code runs on the server; efficient bundling strategies; significantly reduced the memory consumed; reduced JavaScript and CSS footprint by approximately 50%.

Nothing all that surprising, but a good roundup. We’ve studied the future of the internet since 2004. Here’s what we’ve learned: First, technological innovation will create a “datacosm” that infuses data into almost every nook and cranny of life. Second, algorithms will become more important in understanding and implementing insights from that plethora of data. Third, humans will continue to develop a new relationship with machines and complementary intelligence. And fourth, all this change will produce innovation in social norms, collective action, status credentialing and laws.

An adventure in data recovery. Can you recover precious source code off of 30 year old tapes? Yes, but you have to bake it in the oven first. Magnetic Scrolls Original Games Source Code Recovered!

Make your developers happy. Don't Settle for Eventual Consistency: in practice AP systems are not necessarily more highly available than CP systems, so don’t settle for eventual consistency in order to gain availability. The availability you think you will be getting (algorithmic) is not the availability you will actually get (effective), which will not be as useful as you might think. Also, The Limits of the CAP Theorem.

What Really Happened with Vista. Fascinating, in-depth, introspective, and completely unsurprising to any programmer.

How your brain recognizes what your eye sees: The team revealed that V2 neurons process visual information according to three principles: first, they combine edges that have similar orientations, increasing robustness of perception to small changes in the position of curves that form object boundaries. Second, if a neuron is activated by an edge of a particular orientation and position, then the orientation 90 degrees from that will be suppressive at the same location, a combination termed “cross-orientation suppression.” These cross-oriented edge combinations are assembled in various ways to allow us to detect various visual shapes.

Data structures don't save people, people save people. The blockchain paradox: Why distributed ledger technologies may do little to transform the economy: once you address the problem of governance, you no longer need blockchain; you can just as well use conventional technology that assumes a trusted central party to enforce the rules, because you’re already trusting somebody (or some organization/process) to make the rules. I call this blockchain’s ‘governance paradox’: once you master it, you no longer need it...Perhaps blockchain technologies can still deliver better technical performance, like better availability and data integrity. But it’s not clear to me what real changes to economic organization and power relations they could bring about.

Lots of different ways to version APIs. How to Version a Web API. Good discussion on HN. Moru with an interesting point: if you deprecate an API unnecessarily you force your clients to rewrite, a rewrite is always an opportunity to switch services.

Some things you should know before using Amazon’s Elasticsearch Service on AWS. Worth reading. Doesn't sound like a good experience.

Obviously Walmart doesn't understand Silicon Valley is all about removing friction. Walmart Reportedly Threatens To Cut Ties With Carriers That Also Haul For Amazon.

A super quick comparison between Kafka and Message Queues: With Kafka on the other hand, you publish messages/events to topics, and they get persisted. They don’t get removed when consumers receive them. This allows you to replay messages, but more importantly, it allows a multitude of consumers to process logic based on the same messages/events.
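
A toy, in-memory sketch of that difference, not real Kafka client code: the `Queue` class drops a message once it is received, while the `Log` class keeps every message and tracks a separate offset per consumer. All class and method names here are hypothetical.

```python
from collections import deque


class Queue:
    """Traditional queue: a message is gone once one consumer receives it."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def receive(self):
        return self._messages.popleft() if self._messages else None


class Log:
    """Kafka-style topic: messages persist; each consumer has its own offset."""

    def __init__(self):
        self._messages = []
        self._offsets = {}

    def publish(self, message):
        self._messages.append(message)

    def poll(self, consumer):
        offset = self._offsets.get(consumer, 0)
        if offset >= len(self._messages):
            return None
        self._offsets[consumer] = offset + 1
        return self._messages[offset]

    def rewind(self, consumer, offset=0):
        # Replay: move this consumer's position back without affecting others.
        self._offsets[consumer] = offset


topic = Log()
for event in ["signup", "purchase"]:
    topic.publish(event)

billing = [topic.poll("billing"), topic.poll("billing")]
analytics = [topic.poll("analytics"), topic.poll("analytics")]
```

Both consumers see the full stream, and either one can call `rewind` to replay it.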

Top speed for top-k queries: using a priority queue with pruning based on the peek value, is a net winner.
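
One way to read that result, sketched with Python's standard `heapq` module (the linked article's own implementation and language may differ): keep a min-heap of the best k values seen so far and compare each candidate against the heap's smallest element before doing any heap work.

```python
import heapq


def top_k(values, k):
    """Return the k largest values, largest first."""
    if k <= 0:
        return []
    heap = []  # min-heap holding the best k values seen so far
    for value in values:
        if len(heap) < k:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # heap[0] is the smallest of the current top k (the "peek" value).
            heapq.heapreplace(heap, value)
        # Otherwise the candidate cannot be in the top k: pruned with one compare.
    return sorted(heap, reverse=True)
```

Most candidates in a large stream fail the peek comparison, so they cost a single comparison rather than an O(log k) heap operation.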

League of Legends has a surprisingly interesting problem: how do you make a complex game deterministic, that is, how do you ensure the same inputs always lead to the same result? Determinism in League of Legends: Introduction. The desire is to take a chronobreak and re-play a recorded game and restore the server to the exact state it was in at an earlier time. They needed to find uncontrolled inputs that could affect games in randomizing ways. Here's an explanation of the problem. The trick is during a replay there are no clients. "Being able to play back the server, where the game's "brains" are, means that for us engineers, we can peek and poke at the game while it's running. It also lets us restore a server back to an exact state, ala Project Chronobreak." Time was one problem, they had 6 completely different clock and timing APIs. Another source was client network traffic driven by player actions. Random number generators were another problem. They came up with their own XOR-SHIFT based generator, made all the code use the same version, and used a globally unique game id as the seed. You also need to initialize all those uninitialized variables.
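
The seeded-generator idea can be sketched as follows. This uses the textbook 32-bit xorshift (shifts of 13, 17, and 5), which is an assumption: the article says only that Riot's generator is XOR-SHIFT based, and the integer game id below is a made-up value.

```python
class XorShift32:
    """Textbook 32-bit xorshift generator; deterministic for a given seed."""

    MASK = 0xFFFFFFFF

    def __init__(self, seed):
        state = seed & self.MASK
        # The xorshift state must be nonzero, or it stays zero forever.
        self.state = state if state != 0 else 1

    def next_u32(self):
        x = self.state
        x ^= (x << 13) & self.MASK
        x ^= x >> 17
        x ^= (x << 5) & self.MASK
        self.state = x
        return x


game_id = 123456789  # hypothetical globally unique game id used as the seed
live = XorShift32(game_id)
replay = XorShift32(game_id)
live_rolls = [live.next_u32() for _ in range(5)]
replay_rolls = [replay.next_u32() for _ in range(5)]
```

Because the only input is the seed, a replay that starts from the same game id draws the same sequence of numbers as the live game did.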

Awesome! A Brief History of the UUID: Ever since two or more machines found themselves exchanging information on a network, they’ve needed a way to uniquely identify things.
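
For a quick feel of what UUIDs look like in practice, here is a small example using Python's standard `uuid` module. The choice of Python and of the name "example.com" is purely illustrative.

```python
import uuid

# Version 4: generated from random bits, no coordination between machines needed.
random_id = uuid.uuid4()

# Version 5: derived from a namespace and a name, so it is reproducible.
name_id = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
same_name_id = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
```

Two machines that never talk to each other can each call `uuid4()` with a negligible chance of collision, while `uuid5` gives the same id for the same name everywhere.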

Dropbox has been building out their own infrastructure for several years. Evolution of Dropbox’s Edge Network: we’ve been storing and serving more than 90 percent of our users’ data on our own custom-built infrastructure...more than 500 million around the globe...we’ve built a network across 14 cities in seven countries on three continents. In doing so, we’ve added hundreds of gigabits of Internet connectivity with transit providers...hundreds of new peering partners ...designed a custom-built edge-proxy architecture into our network...Some users have seen and have increased sync speeds by as much as 300 percent...The edge proxy stack handles user facing SSL termination and maintains connectivity to our backend servers throughout the Dropbox network....In the new design, we introduced the concept of “metro”, which meant breaking regions into individual metros...By executing the SSL handshake via our PoPs instead of sending them to our data centers, we’ve been able to significantly improve connection times and accelerate transfer speeds... Today, the majority of our Internet traffic goes from a user’s best/closest PoP directly over peering.

Road warriors, you need this. Demystifying Charge Times. Due to numbers rounding up your phone may not be 100% charged when it says it is. Charge 30 more minutes to make sure. You cannot “overcharge” your battery. Your device contains all the proper circuitry and intelligence for that.

Uber goes the Facebook route and creates a targeted app that works for markets with slow networks. Building m.uber: Engineering a High-Performance Web App for the Global Market using Preact: m.uber is written in ES2015+, using Babel for ES5 transpilation...while our traditional architecture utilizes React (with Redux) and Browserify for module bundling, we swapped in Preact for its size benefits and Webpack for its dynamic bundle splitting and tree-shaking capabilities...To ensure we are only serving the JavaScript we need, we use Webpack for code splitting...Our core app (the essential part of the app that allows you to request a ride) comes in at just 50kB gzipped and minified, which means a three second time to interaction on typical 2G (50kB/s, 500ms latency) networks...To identify sources of dependency bloat, we made heavy use of tools like source-map-explorer...To fight dependency bloat, we were selective about npm packages used in the client, making use of libraries like Just whose modules are only responsible for one function and have no dependencies...Service workers intercept URL requests, enabling network and local disk fetches to be replaced by custom fetch logic, which typically leverages the browser’s Cache API...Service workers can also significantly decrease load times...Where we need to cache response data that is too volatile for service workers, we save it to the browser’s local storage...To save on space, we use the SVG format for icon-like images.

StorageMojo: With the work being done on PCIe fabrics, I/O stack routing, composable infrastructure, and resilience in distributed storage, we are reaching a critical mass of basic research that points to a paradigm-busting architecture for RSD. In 10 years today’s state-of-the-art hyperconverged systems will look like a Model T Ford sitting next to a LaFerrari Aperta...A key implication of RSD is that it will favor warehouse scale systems. That’s good news for cloud vendors.

AbsoluteZeroK: The best software Architect I've ever seen hasn't written a single line of code since the 90's. He fills his role perfectly as a bird's eye view of requirements and understands the architecture that will best solve a problem without actually having any clue how to write the solution at a low level. He doesn't need to, and he'd just be wasting his time if he did. The details are carried out by people under him while he worries about the bigger picture. He will say things like "Service A really should be two different services. One that does this and one that does some other thing. If we do this we should be able to save x$ per month and boost our response time. It will also allow us to split this team up into two smaller teams as well as improve separation of concerns and make our project more testable. Its priority level is 7/10, these are the pieces we will need to make this work. David, you pick what tech the pieces will be made with and come back to me with it so I can make sure we have the skills to get that done."

When It Comes To Cache Hit Ratio And CDNs, The Devil Is In The Details: Many CDNs focus on overall cache hit rate because they do not encourage their users to cache HTML. A 90% cache hit rate may sound high, but when you consider that the 10% of elements not cached are the most compute-heavy, a different picture emerges. By exposing the cache hit ratio by asset type, developers are able to see the full picture of their caching and optimize accordingly.
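
As a rough illustration of why the per-type breakdown matters, the sketch below computes both numbers from a list of request records. The record shape `(asset_type, was_hit)` and the function names are assumptions for this example, not any CDN's log format or API.

```python
from collections import defaultdict


def overall_hit_ratio(requests):
    """Fraction of all requests served from cache."""
    requests = list(requests)
    if not requests:
        return 0.0
    hits = sum(1 for _, was_hit in requests if was_hit)
    return hits / len(requests)


def hit_ratio_by_type(requests):
    """Cache hit ratio broken out per asset type (e.g. 'image', 'html')."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for asset_type, was_hit in requests:
        totals[asset_type] += 1
        if was_hit:
            hits[asset_type] += 1
    return {asset_type: hits[asset_type] / totals[asset_type] for asset_type in totals}


# Made-up traffic: every image is cached, no HTML is.
sample = [("image", True)] * 90 + [("html", False)] * 10

print(overall_hit_ratio(sample))   # the headline number looks healthy
print(hit_ratio_by_type(sample))   # the breakdown shows HTML never hits
```

With this made-up traffic the overall ratio is 90%, yet every HTML request, the compute-heavy kind, goes back to origin.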

eeks: I also worked in that space several years ago. RDMA-based paging is a very tempting (and sexy) concept. There are however several challenges that make it very impractical in commercial scenarios: 1. Availability: what happens when a node goes down? 2. Page sharing: how should / can multiple writers be supported? 3. Kernel overhead: how do you hook in the paging subsystem without significant overhead? 4. Cost/performance: rare are the companies that would blindly jump into an IB network investment; and RoCE is not as fast as IB (although it may be catching up with 25/40/100 Gb cards). In the end we ended up designing an RDMA-based DHT. Besides, this kind of approach becomes less relevant nowadays with solutions like Intel Optane.

What it's like to live in a simulation. Oats Studios - Volume 1 - God: Serengeti. Maybe we need a code of ethics for this sort of thing?

Reviewing Fastly’s New Approach To Load Balancing In The Cloud: It’s basically a SaaS service built on top of their 10+ Tbps platform, which already provides CDN, DDoS protection, and web application firewall (WAF). Fastly’s load balancer makes all of its load balancing decisions at the HTTP/HTTPS layer, so it can make application-specific decisions on every request, overcoming the two major flaws of the DNS-based solutions.

Log-structured storage with a good review of Designing data-intensive applications by Martin Kleppmann. Sounds like a great book.

Is exactly once delivery possible? Jay Kreps advances the debate. Exactly-once Support in Apache Kafka: Rather than giving up and punting all the hard problems onto the poor person implementing the application we should strive to understand how we redefine the problem space to build correct, fast, and most of all usable system primitives they can build on.

Videos from HashiDays are now available. Referential integrity is lacking, so it's hard to tell from where.

Nice report on Facebook's Dev Tools @Scale 2017 held in London. Talks include: One World: Resource Management at Scale; Cross-platform Dev Tools for Million-core Applications; Bazel: Google's Extensible, Multi-lingual, Scalable Build System; Scaling the Git Client with the Git Virtual File System.

Nice report on Facebook's Data @Scale 2017. Talks include: Next Generation of Globally-Distributed Databases in Azure; Yandex Clickhouse: A DBMS for Interactive Analytics at Scale; Evolution of Storage and Serving at Pinterest; Cadence: Micro-service Architecture Beyond Request/Reply; Spanner's SQL Evolution; Architectures for the New Era of Cloud Specialization; Accelerating Machine Learning for Computer Vision; Bulk Data Movement Serving Facebook's Global Data Storage and Processing.

Good to see CockroachDB 1.0 is Production-Ready. OpenCredo published CockroachDB: First Impressions. The good: our impressions with CockroachDB are very good overall. It’s very easy to get started with and you get a fully distributed ANSI SQL database. Even better, it was clearly designed to work well in a container and scheduler setup. The bad: Some attention needed for efficient queries, especially when doing joins; relatively immature tooling; if you have a need for more complex, ad-hoc queries, you should evaluate CockroachDB’s performance specifically for your use case. Also, Local and distributed query processing in CockroachDB.

Predicting AWS Price Reductions: if you believe that a price reduction will come within 10-12 months from now, the best option is the 1-year no upfront RI: you don’t have to front up all of the capital, and there are “only” 2 months wasted. You can also see that it makes absolutely no sense to purchase 3-year RIs, with or without upfront payments: in a 3-year time span you can expect at least 2 price reductions, making your investment a much worse idea than it looked at the beginning. There are no good reasons to buy 3-year reservations.
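
The "2 months wasted" figure is simple arithmetic, sketched below under the assumption that a reservation keeps paying the old rate until its term ends. The function name and the example cut months are made up for illustration; the linked article's own model may be more detailed.

```python
def months_at_stale_price(term_months: int, cut_month: int) -> int:
    """Months of a reservation still billed at the pre-cut rate.

    term_months: length of the reservation (12 for a 1-year RI, 36 for 3-year).
    cut_month:   month, counted from purchase, in which the price cut lands.
    """
    return max(0, term_months - cut_month)


# A cut arriving in month 10, the early end of the article's 10-12 month guess.
print(months_at_stale_price(12, 10))  # 1-year RI
print(months_at_stale_price(36, 10))  # 3-year RI
```

Under that assumption a 1-year RI overpays for 2 months, while a 3-year RI bought the same day overpays for 26, and that is before counting any second price cut.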

MySQL can handle billions of rows using its built-in table partitioning. Node.js + MySQL Example: Handling 100's of GigaBytes of Data. The data is kept on separate parts of the disk, but there are restrictions: Query cache is not supported; Foreign keys are not supported for partitioned InnoDB tables; Partitioned tables do not support FULLTEXT indexes or searches.

Should you run your Deep Neural Network modeling/inferencing on the server or on the device? How about choosing depending on the context? Paper Summary: Neurosurgeon, collaborative intelligence between the cloud and mobile edge.

Lessons I've Learned from Three Million Downloads: If your winning idea doesn’t succeed: get up and try again, and again… and again, because for all you know your next idea could be the one that makes it; Design everything around the first time user; Listen to your critics, but don’t do what they say; A great product is better than a viral gimmick; Be generous; Take a step back, often.

Enigma Public: the world’s broadest collection of public data. Examples: Tate Collection; US Sanctions; White House Visitor Logs; FAA Near Collisions; Bureau of Labor Statistics.

dgraph-io/badger (article): An embeddable, persistent, simple and fast key-value (KV) store, written natively in Go. The biggest win of using Badger is a performant Go native key-value store. The nice side-effects are ~4 times faster Get and a potential 86% reduction in AWS bills, due to less reliance on RAM and more reliance on ever faster and cheaper SSDs.

codahale/usl4j (article): A reasonably complete implementation of the Universal Scalability Law model.
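
For readers who have not met the model, the Universal Scalability Law predicts throughput X at concurrency N as X(N) = λN / (1 + σ(N-1) + κN(N-1)), where λ is single-worker throughput, σ is the contention penalty, and κ is the coherency (crosstalk) penalty. The sketch below just evaluates that formula; it is not usl4j's API, and the function and parameter names are chosen here for illustration.

```python
import math


def usl_throughput(n: float, lam: float, sigma: float, kappa: float) -> float:
    """Predicted throughput at concurrency n under the USL."""
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak_concurrency(sigma: float, kappa: float) -> float:
    """Concurrency at which throughput peaks: sqrt((1 - sigma) / kappa).

    Only meaningful when kappa > 0; with kappa == 0 throughput never declines.
    """
    if kappa <= 0:
        raise ValueError("kappa must be positive for a finite peak")
    return math.sqrt((1 - sigma) / kappa)


# Illustrative coefficients, not measurements from any real system.
lam, sigma, kappa = 100.0, 0.05, 0.001
n_peak = usl_peak_concurrency(sigma, kappa)
print(round(n_peak, 1), round(usl_throughput(n_peak, lam, sigma, kappa), 1))
```

In practice the coefficients come from fitting measured (concurrency, throughput) pairs, which is the part a library such as usl4j handles.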

Azure/draft: A tool for developers to create cloud-native applications on Kubernetes.

ray-project/ray: a Python-based distributed execution engine. The same code can be run on a single machine to achieve efficient multiprocessing, and it can be used on a cluster for large computations.

Scalable and Sustainable Deep Learning via Randomized Hashing. Article: "The savings increase with scale because we are exploiting the inherent sparsity in big data," he said. "For instance, let's say a deep net has a billion neurons. For any given input -- like a picture of a dog -- only a few of those will become excited. In data parlance, we refer to that as sparsity, and because of sparsity our method will save more as the network grows in size. So while we've shown a 95 percent savings with 1,000 neurons, the mathematics suggests we can save more than 99 percent with a billion neurons."

Deep Learning with Coherent Nanophotonic Circuits: Significant effort has been made to develop electronic architectures tuned to implement artificial neural networks that improve upon both computational speed and energy efficiency. Here, we propose a new architecture for a fully-optical neural network that, using unique advantages of optics, promises a computational speed enhancement of at least two orders of magnitude over the state-of-the-art and three orders of magnitude in power efficiency for conventional learning tasks. We experimentally demonstrate essential parts of our architecture using a programmable nanophotonic processor.

DUDETM: Building Durable Transactions with Decoupling for Persistent Memory (article): While persistent memory provides non-volatility, it is challenging for an application to ensure correct recovery from the persistent data on a system crash, namely, crash consistency. A solution...is using crash-consistent durable transaction[s]...Most implementations of durable transactions enforce crash consistency through logging. However, the...dilemma between undo and redo logging is essentially a trade-off between update redirection cost and persist ordering cost...[O]ur investigation demonstrates that it is possible to make the best of both worlds while supporting both dynamic and static transactions. The key insight of our solution is decoupling a durable transaction into three fully asynchronous steps.

Handbook of Russian Information Warfare: In the Russian construct, information warfare is not an activity limited to wartime. It is not even limited to the “initial phase of conflict” before hostilities begin, which includes information preparation of the battle space. Instead, it is an ongoing activity regardless of the state of relations with the opponent; “in contrast to other forms and methods of opposition, information confrontation is waged constantly in peacetime.”

Usage Patterns and the Economics of the Public Cloud: Detailed utilization analysis reveals the large swings in utilization at the hourly, daily or weekly level are very rare at the customer level and non-existent at the datacenter level. Furthermore, few customers show volatility patterns that are excessively correlated with the market. These results explain why fixed prices currently prevail despite the seeming need for time-varying dynamics. Examining the actual CPU utilization provides a lens into the future. Here utilization varies by order half the datacenter capacity, but most firms are not dynamically scaling their assigned resources at present to take advantage of these changes.
