Stuff The Internet Says On Scalability For July 28th, 2017

Hey, it's HighScalability time:


Jackson Pollock painting? Cortical column? Nope, it's a 2 trillion particle cosmological simulation using 4000+ GPUs. (paper, Joachim Stadel, UZH)

If you like this sort of Stuff then please support me on Patreon.


  • 1.8x: faster code on iPad MacBook Pro; 1 billion: WhatsApp daily active users; 100 milliamps: heart stopping current; $25m: surprisingly low take from ransomware; 2,700x: improvement in throughput with TCP BBR; 620: Uber locations; $35.5 billion: Facebook's cash hoard; 2 billion: Facebook monthly active users; #1: Apple is the world's most profitable [legal] company; 500,000x: return on destroying an arms depot with a drone; 

  • Quotable Quotes:
    • Alasdair Allan: Jeff Bezos’ statement that “there’s not that much interesting about CubeSats” may well turn out to be the twenty first century’s “nobody needs more than 640kb.”
    • @hardmaru: Decoding the Enigma with RNNs. They trained a LSTM with 3000 hidden units to decode ciphertext with 96%+ accuracy. 
    • @tj_waldorf: Morningstar achieved 97% cost reduction by moving to AWS. #AWSSummit Chicago
    • Ed Sperling: Moore’s Law is alive and well, but it is no longer the only approach. And depending on the market or slice of a market, it may no longer be the best approach.
    • @asymco: With the end of Shuffle and Nano iPods Apple now sells only Unix-enabled products. Amazing how far that Bell Labs invention has come.
    • @peteskomoroch: 2017: RAM is the new Hadoop
    • Carlo Pescio: What if focusing on the problem domain, while still understanding the machine that will execute your code, could improve maintainability and collaterally speed up execution by a factor of over 100x compared to popular hipster code?
    • @stevesi: Something ppl forget: moving products to cloud, margins go down due to costs to operate scale services—costs move from Customer to vendor.
    • @brianalvey: The most popular software for writing fiction isn't Word. It's Excel.
    • @pczarkowski: How to make a monolithic app cloud native: 1) run it in a docker 2) change the url from .com to .io
    • drinkzima: There is a huge general misunderstanding in the profitability of directing hotel bookings vs flight bookings or other types of travel consumables. Rate parity and high commission rates mean that directing hotel rooms is hugely profitable and Expedia (hotels.com, trivago, expedia) and Priceline (booking.com) operate as a duopoly in most markets. They are both marketing machines that turn brand + paid traffic into highly profitable room nights.
    • Animats: This is a classic problem with AI researchers. Somebody gets a good result, and then they start thinking strong human-level AI is right around the corner. AI went through this with search, planning, the General Problem Solver, perceptrons, the first generation of neural networks, and expert systems. Then came the "AI winter", late 1980s to early 2000s, when almost all the AI startups went bust. We're seeing some of it again in the machine learning / deep neural net era.
    • Charity Majors: So no, ops isn't going anywhere. It just doesn't look like it used to. Soon it might even look like a software engineer.
    • @mthenw: As long as I need to pay for idle it’s not “serverless”. Pricing is different because in Lambda you pay for invocation not for the runtime.
    • Kelly Shortridge: The goal is to make the attacker uncertain of your defensive environment and profile. So you really want to mess with their ability to profile where their target is
    • @CompSciFact: 'About 1,000 instructions is a reasonable upper limit for the complexity of problems now envisioned.' -- John von Neumann, 1946
    • hn_throwaway_99: Few barriers to entry, really?? Sorry, but this sounds a bit like an inexperienced developer saying "Hey, I could build most of Facebook's functionality in 2 weeks." Booking.com is THE largest spender of advertising on Google. They have giant teams that A/B test the living shite out of every pixel on their screens, and huge teams of data scientists squeezing out every last bit of optimization on their site. It's a huge barrier to entry. 
    • callahad: It's real [performance improvements]. We've [Firefox] landed enormous performance improvements this year, including migrating most Firefox users to a full multi-process architecture, as well as integrating parts of the Servo parallel browser engine project into Firefox. There are still many improvements yet-to-land, but in most cases we're on track for Firefox 57 in November.
    • Samer Buna: One important threat that GraphQL makes easier is resource exhaustion attacks (AKA Denial of Service attacks). A GraphQL server can be attacked with overly complex queries that will consume all the resources of the server.
    • wheaties: This is stupid. Really. Here we are in a world where the companies that own the assets (you know, the things that cost a lot of money) are worth less than the things that don't own anything. This doesn't seem "right" or "fair" in the sense that Priceline should be a middleman, unable to exercise any or all pricing power because it does not control the assets producing the revenue. I wonder how long this can last?
    • platz: Apparently deep-learning and algae are the same thing.
    • @CompSciFact: "If you don't run experiments before you start designing a new system, your entire system will be an experiment." -- Mike Williams
    • Scott Aaronson: our laws of physics are structured in such a way that even pure information often has “nowhere to hide”: if the bits are there at all in the abstract machinery of the world, then they’re forced to pipe up and have a measurable effect. 
    • @CompSciFact: 'The idea that people knew a thing or two in the '70s is strange to a lot of young programmers.' -- Donald Knuth
    • Pascal Dombis: Reality is a Scanning Pattern
    • scottLobster: We should also be wondering about the lack of incentive to be a producer/owner when being a middleman is so comparatively lucrative. If capital and talent starts shifting to middlemen in pursuit of higher returns it could cause a lot of markets to stagnate with only the big players having the resources to make a profit off of producing.
    • mtgx: Wasn't the Internet supposed to eliminate the middleman? edwinnathaniel: it did when it comes to distribution channel. A hotel can have a website where people can directly book a room. But people want to compare prices, shops around, hence a new form of ... middleman
    • walrus1066: I maintain it's better to not overstretch yourself and let the project fail. Better for your health, and better for the company, so they manage the next project better and have workers who aren't burnt out.
    • Torai: I'm tired of Bitcoin's useless drama.
    • Daniel C. Dennett: Even the simplest bacterial cells have a sort of nervous system composed of chemical networks of exquisite efficiency and elegance.
    • kalleboo: one of the issues is that it's very difficult to maintain that extra bandwidth through the Great Firewall of China, resulting in any miners on the other side of where the block was found being at a disadvantage as they will get the blocks too late to be competitive. This will result in mining finally being 100% concentrated in China.
    • antognini: the normal distribution is the distribution that maximizes entropy for a given mean and variance. But in astronomy, nothing is normally distributed. (At least, nothing comes to mind.) Instead, everything is a power law. The reason for this is that most astrophysical processes are scale free over many orders of magnitude, and if you want a scale free process, it must be distributed as a power law.
    • Jean-Louis Gassée: We know who/what killed Windows Phone, and it’s not Android. We could point fingers at one or more Microsoft execs as the culprits, but that misses the point: Microsoft culture did it. Culture is dangerous; under our field of consciousness, it sneakily filters and shapes perceptions, it’s a system of permissions to emote, think, speak, and do.
    • bacongobbler: Yes, the billing [for Azure Container Instances] is specifically targeted around per-second execution. The containers can start within a few seconds, they allow for customization of CPU cores and memory, and they allow you to focus entirely on your container without having to worry about any VM management. Traditional VM-based infrastructure is still the way to go for long-running applications. This just opens a new avenue into using containers in the cloud.
    • gnarmis: the thing that really strongly drew me to hyper.sh was that I could abstract away the whole cluster behind `hyper`, like `docker` on my own computer. That was an amazing selling point, coming at after the popcorn machine of container management solutions with their very own intricate towers of complexity. It's what I like about sandstorm.io in part -- abstracting away a lot of complexity about hosting apps.
    • Ed Sperling: Fabless chipmakers, in particular, are cautious about adopting expensive new tooling and methodologies because there are fewer high-volume market opportunities at leading-edge nodes. System vendors such as Apple and Samsung have begun building their own chips for mobile phones, and Google, Facebook, Amazon and Microsoft have begun designing their own chips for the cloud. The net effect is there are fewer high-volume markets available to recoup development costs for anyone else.
    • BoiledCabbage: We fundamentally believe that computation rests upon physics. What if that's backwards, what if physics rests upon computation - and computation is the most fundamental element of the universe. While it may sound absurd at first, it's no more or less absurd than "natural laws make everything go". Somewhere we have to assert there is a bottom and it is allowed to exist - and it's rules just work. We currently just set that to physics. But if it were computation instead there could be a law of computation that you can't compute and an "infinite" universe and infinite instant communication due to the multiplicative factor of communicating. And this results in a "speed of information"/light. 
    • Robert M. Sapolsky: it is impossible to conclude that a behavior is caused by a gene, a hormone, a childhood trauma, because the second you invoke one type of explanation, you are de facto invoking them all. No buckets. A “neurobiological” or “genetic” or “developmental” explanation for a behavior is just shorthand, an expository convenience for temporarily approaching the whole multifactorial arc from a particular perspective.
    • Herbert Hoover: No doubt as years go by people forget which engineer did it, even if they ever knew. Or some politician puts his name on it. Or they credit it to some promoter who used other peoples money with which to finance it. But the engineer himself looks back at the unending stream of goodness that flows from his successes with satisfactions that few professions may know. And the verdict of his fellow professionals is all the accolade he wants.

  • Cool interview with Margaret Hamilton--NASA's First Software Engineer--on Makers. Programmers, you'll love this. One of the stories she tells is how her daughter was playing around and selected the prelaunch program during flight. That crashed the simulator. So like a good programmer she wanted to prevent this from happening. She tried to get a protection put in because an astronaut could actually do this during flight. Management would certainly allow this, right? She was denied. They said astronauts are trained never to make a mistake so it could never happen. Eventually she won the argument and was able to add code to protect against human error. So little has changed :-)

  • StackOverflow analyses their data to find Trends in Cloud Computing: Who Uses AWS, Who Uses Azure?: it’s apparent that while the two platforms started at a similar level of traffic in 2012, AWS has grown faster...developers that use C# overwhelmingly choose Azure, while other developers use the platform to a much lesser extent...Node.js developers are by far the most likely to visit AWS questions...Developers who work with C and C++ were particularly unlikely to use either platform...Azure is the platform of choice in several industries, particularly consulting and energy...AWS is particularly popular in the technology industry...Most countries visited more AWS questions than Azure, though to different extents. One notable exception is the Netherlands, which visited about twice as many Azure questions as AWS questions.

  • Do you want a faster Internet? Then Google says "come to our cloud, the Internet is fine!" TCP BBR congestion control comes to GCP: Google Cloud Platform (GCP) now features a cutting-edge new congestion control algorithm, TCP BBR, which achieves higher bandwidths and lower latencies for internet traffic...improved YouTube network throughput by 4 percent on average globally...BBR's throughput can reach as much as 2,700x higher than today's best loss-based congestion control; queueing delays can be 25x lower...BBR also keeps network queues shorter, reducing round-trip time by 33 percent...BBR keeps queuing delay 25x lower than CUBIC ...BBR ("Bottleneck Bandwidth and Round-trip propagation time") is a new congestion control algorithm developed at Google...BBR considers how fast the network is delivering data. For a given network connection, it uses recent measurements of the network's delivery rate and round-trip time to build an explicit model that includes both the maximum recent bandwidth available to that connection, and its minimum recent round-trip delay. BBR then uses this model to control both how fast it sends data and the maximum amount of data it's willing to allow in the network at any time
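
To make the model described above a little more concrete, here is a minimal sketch, not Google's implementation. It only tracks the two quantities the excerpt names (maximum recent delivery rate and minimum recent round-trip time) and derives a pacing rate and an in-flight cap from them. The class name, the sample window, and the gain values are all assumptions chosen for illustration.

```python
from collections import deque


class BbrModelSketch:
    """Toy path model: windowed max of delivery rate, windowed min of RTT.

    Hypothetical illustration only; window size and gains are assumptions.
    """

    def __init__(self, window=10):
        # Keep only the most recent `window` samples of each measurement.
        self.rates = deque(maxlen=window)
        self.rtts = deque(maxlen=window)

    def on_ack(self, delivered_bytes, interval_s, rtt_s):
        # Delivery rate = bytes acknowledged over the sampling interval.
        self.rates.append(delivered_bytes / interval_s)
        self.rtts.append(rtt_s)

    @property
    def bottleneck_bw(self):
        # Maximum recent bandwidth available to the connection (bytes/s).
        return max(self.rates)

    @property
    def min_rtt(self):
        # Minimum recent round-trip delay (seconds).
        return min(self.rtts)

    def bdp_bytes(self):
        # Bandwidth-delay product: data needed to fill the pipe, no more.
        return self.bottleneck_bw * self.min_rtt

    def pacing_rate(self, gain=1.0):
        # How fast to send.
        return gain * self.bottleneck_bw

    def inflight_cap(self, gain=2.0):
        # Maximum amount of data allowed in the network at any time.
        return gain * self.bdp_bytes()
```

The point of the design is that neither output depends on packet loss: the sender paces at the measured bottleneck rate and bounds in-flight data by a multiple of the bandwidth-delay product, which is what keeps queues short.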

  • Docker operations slowing down on AWS. Insightful discussion on HackerNews, not so much about the article as about AWS vs VPS vs some other cloud vs bare metal vs whatever. Yes, it's a topic that's been done to death, but when you're making these decisions, discussions like this are invaluable. chx: And then people consider me a dinosaur when I say, no cloud, just rent a server or two (not colo! just dedicated servers). pizza234: I managed a metal to AWS transition, and 5x definitely doesn't match the costs I experienced (in this specific case, it was around the lines of 1.2x). _Codemonkeyism: I think people need to make a conscious tradeoff between ~5x AWS cost compared to rented server vs. OPS costs. vidarh: I love it when customers pick AWS (though I usually advice not to, unless they have very specific needs), as my billable hours are way higher for those clients, though it is annoying having to deal with the inevitable "why is my AWS bill so big?" after I'd told them exactly why it'd be expensive in the first place. This is particularly true with bandwidth heavy setups, where AWS charges tens of times more per TB transferred than e.g. Hetzner. Roritharr: The problem with renting boxes is the hidden costs if you want to do it right. laumars: I work with both bare metal servers matching your description and both self-hosted and private clouds. Frankly I think your rant misses one of the most important point of working with AWS and that's the convenience and redundancy that the tooling offers. AWS isn't just about single instances, it's about having redundant availability zones with redundant networking hardware and about being able to have disaster recovery zones in whole other data centres and having all of the above work automatically. r1ch: I run a 45M pv/mo network on a single dedicated box. The cost of our bandwidth alone at any cloud provider costs multiples of what we are paying for our current server. The cloud just doesn't make economical sense for a lot of workloads.

  • The power of portable code. React as a Platform: A path towards a truly cross-platform UI. Leland Richardson (Airbnb) tells the compelling history of React Native's development. Could there be one UI to rule them all? With platform extensions it might just work. With airbnb/react-sketchapp they were able to leverage components that initially targeted only React Native and now render a component picker and previewer in a webview and in Sketch. That's a single codebase rendering to two separate platforms to create designs for a third platform, and they know the designs will be accurate because they use the exact same code. They were also able to extend it to VR.

  • As California’s labor shortage grows, farmers race to replace workers with robots. I have strawberry plants. I regularly drive past miles of strawberry plants. That they can harvest 1 out of 3 strawberries is impressive. Sad to see asparagus moving to Mexico: grown on perennial beds that last a decade or so, asparagus must be selectively harvested every day during its 90-day season. Machines have utterly failed to duplicate human judgment and dexterity.

  • Node.js is used 85% for web apps. No problem. Believe it. But node.js is used 8% for embedded systems? I feel safer already. This is what Node.js is used for in 2017 — Survey Results

  • Claude Shannon sounds like an interesting guy. After 10,000 Hours With Claude Shannon: How A Genius Thinks, Works, and Lives, here are 12 things they learned: Cull your inputs; Big picture first. Details later; Don’t just find a mentor. Allow yourself to be mentored; You don’t have to ship everything you make; Chaos is okay; Time is the soil in which great ideas grow; Consider the content of your friendships; Put money in its place; Fancy is easy. Simple is hard; The less marketing you need, the better your idea or product probably is; Value freedom over status; Don’t look for inspiration. Look for irritation.

  • Reason number 573 you are just a data stream. Your Roomba already maps your home. Now the CEO plans to sell that map.

  • Who will control the swarm? Will the control plane be centralized or decentralized? Surprisingly, these guys think centralized: "A collection of faculty at Stanford have a different view. They believe that device swarms will be managed centrally, using applications running in large datacenters, much the way the cloud centralized big data." You might think latency would be a killer, but that was not mentioned. Methinks the cruel world will bring it up.

  • Clever, the same sort of thing people try to do to fool facial recognition programs. Instead of hacking self-driving cars, researchers are trying to hack the world they see: AI researchers are now debating whether their software could be susceptible to “hacks” of real-world objects like stop signs, invisible to the human eye but seen by machines. In this scenario, a slight pattern or sticker applied to the sign would trick self-driving cars—or any AI—into misidentifying the sign as something else entirely, meaning the car would not necessarily slow or stop.

  • People, such clever little things. How a fish tank helped hack a casino:  hackers attempted to acquire data from a North American casino by using an Internet-connected fish tank...The fish tank had sensors connected to a PC that regulated the temperature, food and cleanliness of the tank...Somebody got into the fish tank and used it to move around into other areas (of the network) and sent out data

  • gRPC in Production. Lots of good code examples. Good explanation of why REST APIs suck (Streaming is difficult, Operations are difficult to model, Inefficient, Your internal services aren’t RESTful anyways, hard to get many resources in a single request, No formal (machine-readable) API contract). gRPC has some downsides too: Load Balancing, Structured error handling is unfortunate, No support for browser JS, Breaking API changes, Poor documentation for some languages, No standardization across languages.

  • Ideas on how to talk to ET have changed with technology. We started with a clever, minimalist approach, now we just send lots of data and let them figure it out. Greetings, E.T. (Please Don’t Murder Us.). Drake in 1974: He chose to send exactly 1,679 pulses, because 1,679 is a semiprime number: a number that can be formed only by multiplying two prime numbers together, in this case 73 and 23. Drake used that mathematical quirk to turn his pulses of electromagnetic energy into a visual system...imagine I send you a message consisting of 10 X’s and 5 O’s: XOXOXXXXOXXOXOX. You notice that the number 15 is a semi-prime number, and so you organize the symbols in a 3-by-5 grid and leave the O’s as blank spaces. Seth Shostak now: I propose that we just feed the Google servers into the transmitter. Send the aliens the World Wide Web
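
The semiprime trick is easy to try out. The sketch below is an illustration only, with hypothetical function names: it factors the message length into its two primes and lays the symbols out in both possible grid orientations, since a receiver would not know in advance which one the sender intended.

```python
from math import isqrt


def is_prime(k):
    """Trial-division primality check; fine for small numbers."""
    return k >= 2 and all(k % d for d in range(2, isqrt(k) + 1))


def factor_semiprime(n):
    """Return (p, q) with p <= q if n is a product of two primes, else None."""
    for p in range(2, isqrt(n) + 1):
        if n % p == 0:
            q = n // p
            # p is the smallest divisor, so it is prime; q still needs checking.
            return (p, q) if is_prime(q) else None
    return None


def render_grid(message, width):
    """Split the message into rows of `width`, leaving the O's as blanks."""
    rows = [message[i:i + width] for i in range(0, len(message), width)]
    return [row.replace("O", " ") for row in rows]


def candidate_grids(message):
    """Return the grid layouts a receiver could try, keyed by row width."""
    factors = factor_semiprime(len(message))
    if factors is None:
        return {}
    p, q = factors
    return {width: render_grid(message, width) for width in {p, q}}


if __name__ == "__main__":
    print(factor_semiprime(1679))
    for width, rows in sorted(candidate_grids("XOXOXXXXOXXOXOX").items()):
        print(f"width {width}:")
        print("\n".join(rows))
```

Because a semiprime has exactly one factorization, the receiver has only two layouts to try, which is what makes the length itself part of the message.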

  • Uber wanted an internal chat system that could handle 50,000 users in a single chat environment. The Road to uChat: Building Uber's Internal Chat Solution. Why? Uber is in hypergrowth, and few off-the-shelf solutions could handle that load. After months of testing they settled on open-source Slack competitor Mattermost. They tested with 70,000 concurrent users with a send rate of 80 to 200 messages per second. They searched over 20,000 channels simultaneously. They ran into bottlenecks. What's cool is they contributed fixes and their test harness to the project. Puppet was used for deployment. They rewrote the Mattermost mobile apps in React Native and contributed those changes as well.

  • The lengths people will go to get a labeled data set. Calling the shots at Wimbledon. 3 people watch every stroke at Wimbledon, categorize it in all sorts of ways, all in just a few seconds. 

  • Is sales tax coming to cloud services? New York addresses the sales tax treatment of cloud collaboration services: A taxpayer’s sale of a cloud collaboration service product is subject to New York state and local sales taxes because it constitutes prewritten software. 

  • Five times the computing power: Until now, people have believed that once an FPGA is full it cannot accommodate any more. If you want new functionality in this case, you have to completely rebuild the hardware, which is expensive...A clever change in the signal routes gives the chip a capacity that is five times greater for each hardware unit...My wife and I are planning on starting a microbrewery, so that when the thesis is finally presented I will be able to offer beer I have brewed myself.

  • The king is dead. The reign of general-purpose microprocessors is over. New iPad Pro's A10X Chip Revealed as First Manufactured Using TSMC's 10nm Process: which is said to deliver 30 percent faster CPU performance than previous-generation iPad Pro models and 40 percent faster graphics performance. Intel, not so much. Long live custom chips. Bespoke Processors: A New Path to Cheap Chips

  • Oldy (the idea, not the article) but a goody. Preventing server overload: limit requests being processed: Limit the number of concurrent requests in all servers to avoid cascading failures. You need the limit to be well below the "catastrophic" failure point for your service, but setting it too low will limit the throughput of your system. If I'm forced to guess, I'd start with a limit set to some multiple of the number of CPUs. E.g. maybe 3× the number if you assume that your requests spend 33% of their time executing code, and 66% of their time waiting (e.g. for other network requests or on disk).
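
A rough sketch of the idea, assuming an in-process server where each request runs through one function call. The class and function names are made up for illustration, and the 3x multiplier simply follows the quoted guess: if a request spends about a third of its time on the CPU, roughly three requests per core keep the cores busy.

```python
import os
import threading


class ConcurrencyLimiter:
    """Reject work beyond a fixed number of in-flight requests.

    Hypothetical helper for illustration; not taken from the linked article.
    """

    def __init__(self, limit):
        self.limit = limit
        self._slots = threading.BoundedSemaphore(limit)

    def try_run(self, handler, *args):
        """Run handler if a slot is free; return (accepted, result)."""
        # Non-blocking acquire: shed load immediately instead of queueing,
        # so an overloaded server fails fast rather than falling over.
        if not self._slots.acquire(blocking=False):
            return False, None
        try:
            return True, handler(*args)
        finally:
            self._slots.release()


def default_limit(cpu_multiple=3):
    """Starting guess from the quote: some multiple of the CPU count."""
    return cpu_multiple * (os.cpu_count() or 1)


if __name__ == "__main__":
    limiter = ConcurrencyLimiter(default_limit())
    accepted, result = limiter.try_run(lambda name: f"hello {name}", "world")
    print(accepted, result)
```

Rejecting immediately rather than blocking is a design choice in this sketch; a real service might prefer a short bounded queue in front of the limit, and the right limit should come from load testing rather than from the multiplier alone.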

  • Good overview of Event Bus Implementations: There are couple of ways to implement event driven pattern to your stack...In broker topology, you send events to a central broker and all subscribers of these broker receive and process events asynchronously...event broker topology promises reactive, event driven, asynchronous systems...In this story, I shared four implementations of event bus...Notify the Event to All Subscribers...Notify the Event Shadow to All Subscribers...Notify the Event Shadow to Filtered Subscribers...Ordered Delivery to Subscribers...
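
For readers who have not seen the broker topology before, here is a minimal in-process sketch of two of the variants listed above: notifying all subscribers, and notifying only filtered subscribers. It is an assumption-laden toy, not code from the article. Each subscriber gets its own inbox that it drains later, which stands in for asynchronous delivery, and the `Broker` name and predicate-based filter are invented for the example.

```python
from collections import deque


class Broker:
    """Central broker: publishers send events here, subscribers read inboxes.

    Hypothetical illustration of the broker topology.
    """

    def __init__(self):
        self._subscribers = []  # list of (inbox, predicate) pairs

    def subscribe(self, predicate=None):
        """Register a subscriber; return the inbox it should drain."""
        inbox = deque()
        accepts = predicate if predicate is not None else (lambda event: True)
        self._subscribers.append((inbox, accepts))
        return inbox

    def publish(self, event):
        """Append the event to every matching inbox; return the fan-out count."""
        delivered = 0
        for inbox, accepts in self._subscribers:
            if accepts(event):
                inbox.append(event)
                delivered += 1
        return delivered


if __name__ == "__main__":
    broker = Broker()
    everything = broker.subscribe()
    orders_only = broker.subscribe(lambda e: e["type"] == "order")
    broker.publish({"type": "order", "id": 1})
    broker.publish({"type": "login", "id": 2})
    print(list(everything))
    print(list(orders_only))
```

Because each inbox is a FIFO filled by a single broker, every subscriber sees events in publish order here; preserving that property across multiple broker nodes is the harder problem the ordered-delivery variant has to address.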

  • Pull up a chair kids, Juho Snellman tells a story of an exciting debugging adventure. The mystery of the hanging S3 downloads: S3 is one of those rare services that disable timestamps. And that actually makes for a big difference in this case. With timestamps, each retransmitted copy of a packet would use a different timestamp value [2]. And when any part of the TCP header changes, odds are that the checksum changes as well...switching the cable modem from router mode to bridging mode. Bam, the problem was gone. In retrospect this makes sense: in router mode the cable modem needs to update the checksums for each packet that pass through the device. In bridging mode there's no NAT, so no checksum update is needed.
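
The checksum point can be seen with a few lines of code. This sketch implements the standard 16-bit ones'-complement Internet checksum over a made-up byte string standing in for a header. The real TCP checksum also covers a pseudo-header and the payload, so the "header" here is a simplification and the helper names are hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum used by IP, TCP and UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def fake_header(timestamp: int) -> bytes:
    """Simplified stand-in for a TCP header carrying a timestamp value."""
    return b"\x00\x01\xf2\x03" + timestamp.to_bytes(4, "big")


if __name__ == "__main__":
    first = internet_checksum(fake_header(1))
    retransmit = internet_checksum(fake_header(2))
    print(hex(first), hex(retransmit), first != retransmit)
```

With timestamps enabled, a retransmitted packet carries a new timestamp and therefore, most likely, a new checksum. Without them, every retransmission is byte-for-byte identical, so a device that corrupts the checksum for one particular packet will corrupt it the same way on every retry.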

  • Azure/fast_retraining (article): we compare two of the fastest boosted decision tree libraries: XGBoost and LightGBM. We will evaluate them across datasets of several domains and different sizes.

  • github.com/arc-repos (article): provision and deploy cloud architecture as text

  • the-laughing-monkey/cicada-platform: A Distributed Direct Democracy and Decentralized Application Platform

  • airbnb/binaryalert: an open-source serverless AWS pipeline where any file uploaded to an S3 bucket is immediately scanned with a configurable set of YARA rules. An alert will fire as soon as any match is found, giving an incident response team the ability to quickly contain the threat before it spreads.

  • Hey, just letting you know I've written a novella: The Strange Trial of Ciri: The First Sentient AI. It explores the idea of how a sentient AI might arise as ripped from the headlines deep learning techniques are applied to large social networks. I try to be realistic with the technology. There's some hand waving, but I stay true to the programmers perspective on things. One of the big philosophical questions is how do you even know when an AI is sentient? What does sentience mean? So there's a trial to settle the matter. Maybe. The big question: would an AI accept the verdict of a human trial? Or would it fight for its life? When an AI becomes sentient what would it want to do with its life? Those are the tensions in the story. I consider it hard scifi, but if you like LitRPG there's a dash of that thrown in as well. Anyway, I like the story. If you do too please consider giving it a review on Amazon. Thanks for your support!

Reader Comments (2)

> gRPC in Production. Lots of good code examples. Good explanation of why REST APIs suck (Streaming is
> difficult, Operations are difficult to model, Inefficient, Your internal services aren’t RESTful anyways, hard to
> get many resources in a single request, No formal (machine-readable) API contract). It has some downsides:
> Load Balancing, Structured error handling is unfortunate, No support for browser JS, Breaking API changes,
> Poor documentation for some languages. No standardization across languages.

Yeah, or we use Apache Thrift. Standardized across 20+ languages and more performant.

July 29, 2017 | Unregistered Commenter Jens Geyer

Here's a (third-party) benchmark: http://szelei.me/rpc-benchmark-part1/

July 29, 2017 | Unregistered Commenter JensG
