Stuff The Internet Says On Scalability For May 19th, 2017

Hey, it's HighScalability time:

Who wouldn't want to tour the Garden of Mathematical Sciences with Plato as their guide?
If you like this sort of Stuff then please support me on Patreon.

  • 2 billion: Android users; 1,000: cloud TPUs freely available to researchers; 11.5 petaflops: in Google's machine learning pod; 86 billion: neurons in the human brain, not 100 billion; 1,300: Amazon's new warehouses across Europe; $1 trillion: China self-investment; 1/7th: California's portion of US GDP; more: repetition in songs; 99.999%: Spanner availability, strong consistency, good latency; 6: successful SpaceX launches in 4 months; 160TB: RAM in HPE computer; 40,000+ workers: private offices > open offices

  • Quotable Quotes:
    • Tim Bray: without exception, I observed that they [Personal computers, Unix, C, the Internet and Web, Java, REST, mobile, public cloud] were initially loaded in the back door by geeks, without asking permission, because they got shit done and helped people with their jobs. That’s not happening with blockchain. Not in the slightest. Which is why I don’t believe in it.
    • @swardley: Amazon continues to take industry after industry not because those companies lack engineering talent but executive talent.
    • @RichRogersIoT: "I bought my boss two copies of The Mythical Man Month so that he could read it twice as fast." - @rkoutnik
    • @GossiTheDog: Seeing ATMs and banks go down here suggests fundamental issues which flashing boxes can't fix. Design, architect a security model.
    • @stevesi: Is Google's TPU investment the biggest advantage ever or laying groundwork for being disrupted? Can Google out-innovate sum of industry?
    • Ryan Mac: Last year, Craigslist took in upwards of $690 million in revenue, most of which is net profit
    • @dberkholz: Capex vs opex budget for tools is a bigger deal than I'd fully appreciated. Welcome to the enterprise!
    • Vint Cerf: AI stands for artificial idiot. 
    • Douglas Hofstadter: In the end, we are self-perceiving, self-inventing, locked-in mirages that are little miracles of self-reference. 
    • cocktailpeanuts: I feel like the term "Serverless" has been hijacked to a point that it will soon become meaningless just like "AI", "IoT", etc. Basically "Serverless" in 2017 has become just a hype-friendly, marketing-friendly way of saying "SaaS".
    • @skupor: Over the last 20 years, M&A exits for venture-backed companies have gone from 60% to 90% of exits (was 20% in 1990)
    • bpicolo: C# with Visual Studio is, I think, the most productive environment I've come across in programming. It's ergonomically sound, straightforward, and the IDE protects me from all sorts of relevant errors. Steve mentioned IntelliJ is sometimes a bit slower than he'd hope when typing. I totally agree with that. I think Visual Studio doesn't quite suffer from that.
    • @codepitbull: A good developer is like a werewolf: Afraid of silver bullets.
    • @sehnaoui: Coffee shop. People next to me are loud and rude. They just found the perfect name for their new business. I just bought the domain name.
    • David Robinson: Python and Javascript developers start and end the day a little later than C# users, and are a little less likely than C programmers to work in the evening.
    • Ben Thompson: The fatal flaw of software, beyond the various technical and strategic considerations I outlined above, is that for the first several decades of the industry software was sold for an up-front price, whether that be for a package or a license. The truth is that software — and thus security — is never finished; it makes no sense, then, that payment is a one-time event.
    • boulos: Spanner does things for you that MySQL et al. don't. Having an automagic Regional (and eventually Global if you'd like) database without dealing with sharding is worth $8k/year even to me. So even if it could fit on $10/month of hardware, I don't begrudge them for charging a service fee, rather than saying "This is how many cores, RAM, disk and flash this eats".
    • codedokode: One of the reasons why such attack was possible is poor security in Windows. Port 445 that was used in an attack is opened by a kernel driver (at least that is what netstat says on WinXP) that runs in ring 0. This driver is enabled by default even if the user doesn't need SMB server and it cannot be easily disabled.
    • @RichRogersIoT: Job interview:  Implement Large Hadron Collider on whiteboard / Actual job:  Jira bug-id #2342: Move login button 3 pixels to left
    • slackingoff2017: This is part of a worrying new trend. Increasingly you can't buy software anymore, only rent. Innovation is being kept from scrutiny hidden behind closed doors. The kind of thing patents were meant to prevent back when the system wasn't broken.
    • Scott Borg~ Engineers need to look at their products from the standpoint of the attacker, and consider how an attacker would benefit from a cyberattack and how to make undertaking that attack more expensive. It’s all about working to increase an attacker’s costs.
    • @tottinge: "A code base isn't a thing we build, it's a place we live. We don't seek to finish it and move on, but to make it liveable"  @sarahmei
    • Sam Kroonenburg: We Believe …Don’t do the things that someone else can do. Do the things that only we can do. [re: Serverless]
    • Anush Mohandass: What you’re starting to see are different architectures for different workloads. There will be chips for image recognition, SQL, machine learning acceleration. 
    • Craig McLuckie: Given the current state-of-the-art, most users will achieve best day-to-day top line availability by just picking a single public cloud provider and running their app on one infrastructure.
    • watmough: Chromebooks work, and I am a big fan of them in education. I have a pretty good idea how hard our teachers work, and I'd hate to think of the Windows bullshit being imposed on them, like it's imposed on me and my coworkers.
    • axilmar: It [React Native] is the future! But you need experience to make it work, and navigation/routing is still being worked out, and it is native, but it is Javascript, and it is crossplatform, but you need to be aware of the differences of the two platforms, and styling uses something that is like css but not entirely, you have to learn all the intricate details...
      Thank god software engineering "practices" are not used in other engineering disciplines...
    • Anton Howes: So without the British acceleration of innovation, the Industrial Revolution would likely have happened elsewhere within a few decades. France and the Low Countries and Switzerland and the United States were by the eighteenth century well on their way towards sustained modern economic growth. 
    • Dr. Suzana Herculano-Houzel~ evolution is not progress, all that evolution means is change over geological time, it's not for the better, it's not for the worse, it's just different. All it has to do with is generating diversity. We have ample evidence we are not descendants of reptiles, we are close cousins. We could not have a basic reptile brain to which something else was added. We know now that every reptile has a neocortex. There is no such thing as a triune brain. There is no such thing as a reptilian brain on top of which a new structure appeared only in mammals. We all have it. The brain is very much the same in its essence, the difference lies in the quantities. 
    • James Clear: The great mistake of Hurricane Katrina was that the levees and flood walls were not built with a proper “margin of safety.” The engineers miscalculated the strength of the soil the walls were built upon. As a result, the walls buckled and the surging waters poured over the top, eroding the soft soil and magnifying the problem. Within a few minutes, the entire system collapsed.
    • elvinyung: This "modern" Spanner feels very different from the one we saw in 2012 [1]. Some interesting takeaways:
      * There is a native SQL interface in Spanner, rather than relying on a separate upper-layer SQL layer, a la F1 [2]
      * Spanner is no longer on top of Bigtable! Instead, the storage engine seems to be a heavily modified Bigtable with a column-oriented file format
      * Data is resharded frequently and concurrently with other operations -- the shard layout is abstracted away from the query plan using the "distributed union" operator
      * Possible explanation for why Spanner doesn't support SQL DML writes: writes are required to be the last step of a transaction, and there is currently no support for reading uncommitted writes (this is in contrast to F1, which does support DML)
      * Spanner supports full-text search (!)

  • Cautionary tale number 1000 on depending on someone else's service. Firebase Costs Increased by 7,000%! Google changed something (billing for SSL overhead) and HomeAutomation's bill spiked. There was no warning. There were no tools to tell why. Support stopped replying. There's no one to call. The recommendation is to protect yourself from being trapped by a service from the very beginning. They've moved to Lambda/DynamoDB, which many point out is also a potential service trap. The Firebase founder responded with an explanation, saying he was "embarrassed by the level of communication on our side." Good discussion on Hacker News and on reddit. Lots of people with similar stories, complaints about lack of support from Google, complaints about lack of transparency, and the usual advice to never rely on anything, ever. 
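
One generic way to shrink the cost of leaving a service later, offered as a common pattern rather than anything the HomeAutomation post prescribes, is to keep vendor SDK calls behind an interface you own. A minimal Python sketch; `KeyValueStore`, `InMemoryStore`, and `record_reading` are hypothetical names, and a real Firebase or DynamoDB adapter would be another class implementing the same two methods:

```python
from abc import ABC, abstractmethod
from typing import Optional


class KeyValueStore(ABC):
    """Hypothetical port the app codes against instead of a vendor SDK."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...

    @abstractmethod
    def put(self, key: str, value: str) -> None: ...


class InMemoryStore(KeyValueStore):
    """Stand-in adapter; a vendor-backed adapter would wrap that vendor's client here."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


def record_reading(store: KeyValueStore, device_id: str, reading: str) -> None:
    # Application logic depends only on the interface, so switching vendors
    # means writing one new adapter class, not rewriting every call site.
    store.put(f"device:{device_id}", reading)


store = InMemoryStore()
record_reading(store, "thermostat-1", "21.5C")
print(store.get("device:thermostat-1"))  # 21.5C
```

This does nothing about surprise bills; it only bounds the migration work to a single adapter when you do decide to move.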

  • Serverlessconf Austin '17 videos are now available (most of them anyway). 

  • Is indigestion our only hope? Why Amazon is eating the world: Consensus is that we’ve hit a tipping point and the retail industry is finally seeing some major collateral damage from Amazon’s monster growth...I believe that Amazon is the most defensible company on earth...It’s the fact that each piece of Amazon is being built with a service-oriented architecture, and Amazon is using that architecture to successively turn every single piece of the company into a separate platform...The key advantage that Amazon has over any other enterprise service provider — from UPS and FedEx to Rackspace — is that they are forced to use their own services...they’re permanently dogfooding...Amazon has committed to this idea at a granular level. Even when it comes to services that can’t be sold, Amazon is still making a push to expose the services externally. The perfect example of this is Amazon’s Marketplace Web Service (MWS) API...Amazon is uncatchable. It took Amazon 10 years to perfect FBA [Fulfillment By Amazon]. Even if Walmart could do it in 5, where will Amazon be by the time they roll it out? And I haven’t even begun to touch the surface of Amazon’s lesser-known, industry-shattering programs like Seller Fulfilled Prime and Direct Fulfillment. I’m not sure we’ll see a mass-market retailer compete successfully against Amazon within my lifetime 

  • Talk about high availability design. @WhatTheFFacts: Female Kangaroos, Koalas, Wombats and Tasmanian devils all have 3 vaginas.

  • DigitalGlobe has a problem: they've empictured the whole earth using satellites and have 100 petabytes of imagery (54 million files) needing a push into the cloud. So they used Amazon's Snowmobile to move data like one moves furniture. Fortunately, they can handle uploading 100 terabytes a day from five satellites into S3. S3 is replacing their 12,000 tape bays feeding 60 LTO-5 tape drives. The driver is building higher-value applications on top of the data; you can't do that if the data is on tape. DigitalGlobe moves to the cloud with AWS Snowmobile
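
A quick back-of-the-envelope check shows why trucking the archive made sense. It uses decimal units and assumes, purely for illustration, that the same 100 TB/day upload rate could have been pointed at the 100 PB backlog instead of new imagery:

```python
# Figures from the DigitalGlobe item above, in decimal units.
archive_bytes = 100e15        # 100 PB archive
file_count = 54e6             # 54 million files
daily_upload_bytes = 100e12   # 100 TB per day into S3

avg_file_gb = archive_bytes / file_count / 1e9
days_to_upload_archive = archive_bytes / daily_upload_bytes
sustained_gbps = daily_upload_bytes * 8 / 86_400 / 1e9  # bits per second, in Gbit/s

print(f"average file size: {avg_file_gb:.2f} GB")                       # 1.85 GB
print(f"days to push the archive at 100 TB/day: {days_to_upload_archive:.0f}")  # 1000
print(f"bandwidth for 100 TB/day: {sustained_gbps:.2f} Gbit/s")          # 9.26 Gbit/s
```

Roughly 1,000 days over a pipe that already needs about 9 Gbit/s sustained is the kind of gap a truck closes.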

  • The Serverless Revolution for Everyone: FaaS (functions as a service) is a paradigm shift in cloud – we’ve moved from real servers (physical, data centers) to fake servers (virtualization, containers, etc.) to uploading code and allowing a provider to run it in response to events. Something is quite different there...The other revolutionary area in serverless computing is the pricing model. Billing based on usage, vs. monthly/quarterly/etc. billing for the existence of a server (real or virtual) is a shift from how the cloud has worked.
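
To make the pricing shift concrete, here is a sketch comparing pay-per-use billing (requests plus GB-seconds of compute) with a fixed monthly server. The prices are illustrative assumptions, roughly AWS Lambda's 2017 list prices, and ignore free tiers; the $50 server is hypothetical:

```python
# Illustrative prices only (assumed; check your provider's current pricing).
PRICE_PER_MILLION_REQUESTS = 0.20   # USD
PRICE_PER_GB_SECOND = 0.0000166667  # USD
SERVER_MONTHLY_COST = 50.0          # USD, hypothetical always-on VM


def usage_cost(requests: int, avg_duration_ms: float, memory_gb: float) -> float:
    """Pay-per-use: billed for invocations and compute actually consumed."""
    gb_seconds = requests * (avg_duration_ms / 1000.0) * memory_gb
    return requests / 1e6 * PRICE_PER_MILLION_REQUESTS + gb_seconds * PRICE_PER_GB_SECOND


def breakeven_requests(avg_duration_ms: float, memory_gb: float) -> float:
    """Monthly request volume at which pay-per-use costs the same as the fixed server."""
    return SERVER_MONTHLY_COST / usage_cost(1, avg_duration_ms, memory_gb)


monthly = usage_cost(1_000_000, avg_duration_ms=200, memory_gb=0.5)
print(f"1M requests/month: ${monthly:.2f} vs ${SERVER_MONTHLY_COST:.2f} for the server")
print(f"break-even: ~{breakeven_requests(200, 0.5) / 1e6:.1f}M requests/month")
```

Under these assumptions the function costs under $2 at a million requests a month and only overtakes the server somewhere near 27 million; the point is the shape of the curve, not the specific numbers.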

  • Papers from CHI 2017 are now available.

  • PostgreSQL 9.6.2 vs CockroachDB 1.0 vs ScyllaDB 1.6.4: "This is shocking for me that CockroachDB 2x-19x slower than PostgreSQL, so I file a bug report and one for scylla (slow query on larger datasets)." Also has a succinct overview of each database.

  • Do you think this is true? The Oracle at Delphi, or How I Learned to Stop Worrying and…accept…NoSQL: Yesterday's hardware demanded that we crunch data in the database tier and send the smallest result set possible over the wire. Today's hosting options make it possible – and affordable – to pull a set of records out of storage and iterate over them in the web tier, using the same language the rest of the app is written in and taking advantage of the app's own caching infrastructure.
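
The approach described there, pulling raw records and aggregating in the application tier with the app's own cache instead of a SQL `GROUP BY`, might look like this sketch. `fetch_orders` is a hypothetical stand-in for a storage call, and `functools.lru_cache` stands in for whatever caching layer the app already has:

```python
from functools import lru_cache

# Hypothetical raw records; in a real app fetch_orders() would hit storage.
ORDERS = [
    {"customer": "a", "total": 30.0},
    {"customer": "b", "total": 12.5},
    {"customer": "a", "total": 7.5},
]


def fetch_orders() -> tuple:
    return tuple(ORDERS)


@lru_cache(maxsize=1)
def revenue_by_customer() -> tuple[tuple[str, float], ...]:
    totals: dict[str, float] = {}
    for order in fetch_orders():  # iterate in the web tier, in the app's own language
        totals[order["customer"]] = totals.get(order["customer"], 0.0) + order["total"]
    return tuple(sorted(totals.items()))  # cache an immutable result


print(revenue_by_customer())  # (('a', 37.5), ('b', 12.5))
```

The trade-off the post glosses over is invalidation: when orders change, something must call `revenue_by_customer.cache_clear()`, a job the database would otherwise do for you on every query.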

  • GitHub explains How Four Native Developers Wrote An Electron App. They wrote their desktop app using Electron so they could use the web technologies HTML, CSS, JavaScript, and TypeScript. Why use the web? Building native apps for multiple platforms doesn’t scale. The wins: shared logic and UI across all platforms, and a tighter feedback cycle, since designs can be tweaked live.

  • Serverless Cost Calculator: Calculating cost for AWS Lambda, Azure Functions, Google Cloud Functions, and IBM OpenWhisk. 

  • On AWS, figuring out how much a new API call will cost is crazy difficult. Segment has a method they describe in Spotting a million dollars in your AWS account. They bucket their infrastructure into: integrations, API, warehouses, website and CDN, internal. They gathered data from: the AWS billing CSV, tagged AWS resources, untagged resources. Then it got complicated. Biggest win: making it easy to continuously estimate your spend rather than running the occasional ‘one-time-analysis’.
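
The core bucketing step, rolling billing line items up by a tag and giving untagged spend its own bucket, can be sketched in a few lines. The CSV and its column names (`resource_id`, `bucket_tag`, `cost`) are hypothetical simplifications, not the real AWS billing format or Segment's code:

```python
import csv
import io
from collections import defaultdict

# Hypothetical, heavily simplified billing export.
BILLING_CSV = """resource_id,bucket_tag,cost
i-001,api,120.00
i-002,integrations,340.50
vol-003,,55.25
i-004,api,80.00
"""


def cost_by_bucket(csv_text: str) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        bucket = row["bucket_tag"] or "untagged"  # untagged spend still needs a home
        totals[bucket] += float(row["cost"])
    return dict(totals)


print(cost_by_bucket(BILLING_CSV))
# {'api': 200.0, 'integrations': 340.5, 'untagged': 55.25}
```

Running something like this on every new export, rather than once, is what turns it into the continuous estimate the post calls its biggest win.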

  • It turns out you don't need to slap sensors on everything and have them communicate to figure out what's going on. You can have a centralized sensor that uses machine learning to figure out what's happening in a space. This is spooky in a really interesting way. Internet of Things Made Simple: One Sensor Package Does Work of Many: Machine-learning algorithms can combine these raw feeds into powerful synthetic sensors that can identify a wide range of events and objects — for instance, distinguishing between a blender, a coffee grinder and mixer based on sounds and vibrations. Even soft, more subtle sounds, such as writing or erasing on a whiteboard, can be detected. More than just recording whether a device is in use or not, synthetic sensors can track the state of a device — whether a microwave door is open or closed, if cooking is interrupted, and whether the microwave has completed its cooking cycle...It can tell you not only if a towel dispenser is working, but can keep track of how many towels have been dispensed and even order a replacement roll when necessary. A faucet left running when a room is unoccupied for a long time might prompt a warning message to the user's smartphone...Even more advanced sensing can infer human activity, such as when someone is sleeping, showering, watching streaming video or has left home for work. 

  • Stephen Wolfram’s bestseller, A New Kind of Science, is now free online.

  • "A real craftsperson makes his own tools," goes the old saying. This could be why we have so many frameworks. As craftspeople, programmers want to build their own tools. It's a mark of your skill level. It's a rite of passage. The difference is that in the digital world we can share tools. A hammer can't be shared. A JavaScript framework is infinitely shareable at zero marginal cost. 

  • Data Loss By Design. Not through corruption or anything, but by updating data in place instead of saving all changes in a log so you can apply analytics. The Future of Event Stream Processing. Sounds like a different implementation of a data warehouse. 
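
The difference between updating in place and keeping a log is easy to see in miniature. In this sketch (hypothetical `Event` type, not code from the linked article) current state is a fold over an append-only log, so questions about history remain answerable:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account: str
    delta: int


def apply_in_place(balances: dict[str, int], event: Event) -> None:
    # Update-in-place: the current value survives, the history does not.
    balances[event.account] = balances.get(event.account, 0) + event.delta


def replay(log: list[Event]) -> dict[str, int]:
    # Event log: current state is derived by folding over the full history.
    state: dict[str, int] = {}
    for event in log:
        apply_in_place(state, event)
    return state


log = [Event("alice", 100), Event("alice", -30), Event("bob", 50)]
print(replay(log))                         # {'alice': 70, 'bob': 50}
print(sum(1 for e in log if e.delta < 0))  # 1: a question only the log can answer
```

A table that only stores `{'alice': 70, 'bob': 50}` has already lost the withdrawal; that is the "data loss by design".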

  • A market you may not have thought of. James Geurts: Finding and Delivering Technologies for Special Operations Forces Worldwide. 

  • Benchmarking the throughput of Zookeeper in AWS (Amazon EC2): Zookeeper 3.4.8, right out of the box, with the default configuration on a default Ubuntu Server 16.04 image was able to achieve 12k create/update/delete req/sec, which is 50% of the performance in the original Zookeeper paper. 

  • Get your data here. The World’s Largest Street-level Imagery Dataset for Teaching Machines to See. And Coarse Discourse: A Dataset for Understanding Online Discussions. And The Collaborative Interaction Corpus (CIC@NCSU). And The Stanford Track Collection.

  • 70% cheaper Kubernetes cluster on AWS: tune the requests and limits of your services; instead of having an ELB for each service, we now have a single ELB for the Ingress. All requests pass through it, with nginx doing the work of sending them to the right pods; shut down idle pods; tune the scheduler; use the right instance size; use the right disk size; spot instances were 80% cheaper, with a longer spin-up time; consider reserved instances.
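
Why tuning requests saves money: the scheduler places pods by what they request, not what they use, so inflated requests force more nodes. This sketch uses first-fit decreasing bin packing as a rough stand-in; it is not how kube-scheduler actually scores nodes, and it ignores memory, system reservations, and affinity. The pod counts and node size are hypothetical:

```python
def nodes_needed(pod_requests_mcpu: list[int], node_capacity_mcpu: int) -> int:
    """Count nodes needed to place pods by CPU request (first-fit decreasing)."""
    free: list[int] = []  # remaining CPU (millicores) on each node opened so far
    for req in sorted(pod_requests_mcpu, reverse=True):
        if req > node_capacity_mcpu:
            raise ValueError(f"pod request {req}m exceeds node capacity")
        for i, remaining in enumerate(free):
            if req <= remaining:
                free[i] -= req
                break
        else:  # no existing node had room: open a new one
            free.append(node_capacity_mcpu - req)
    return len(free)


# Hypothetical service: 12 pods on 4-core (4000m) nodes.
print(nodes_needed([1000] * 12, 4000))  # 3 nodes with generous requests
print(nodes_needed([300] * 12, 4000))   # 1 node once requests match observed usage
```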

  • So, developers, ready to lose that laptop and move development to the cloud? Environmental Sympathy: Serverless, precisely because it’s so heavily reliant on pre-existing vendor services and billed like a utility, makes it possible for every developer to exclusively develop their “service” in the cloud. That service can have its own persistence engine, cache, queue, monitoring system, and all the other tools and namespaces needed to develop. Feature branches are the same as production branches & both are cloud ready.

  • The State of Go is persistent.

  • We've seen delivery-only restaurants; here's a Delivery-Based Grocery Store that looks like an Amazon warehouse for groceries. 

  • Dropbox: Introducing Cape, their asynchronous job runner, triggered by event streams from varied sources, to do things like indexing a file or generating previews. It processes several billion events a day with 95th-percentile latencies of less than 100 milliseconds and less than 400 milliseconds. Events are ingested into Kafka and processed by their own lambda framework.
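
The fan-out idea behind an event-triggered job runner, where one event reaches every job registered for its type, can be sketched as a toy in-memory router. This is a hypothetical API for illustration only; Cape itself is asynchronous and Kafka-backed, and none of its interfaces are shown here:

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], str]


class Dispatcher:
    """Toy event router: subscribers register handlers per event type."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def dispatch(self, event: dict) -> list[str]:
        # Every handler registered for this event type runs; unknown types are ignored.
        return [h(event) for h in self._handlers.get(event["type"], [])]


d = Dispatcher()
d.subscribe("file_uploaded", lambda e: f"index {e['path']}")
d.subscribe("file_uploaded", lambda e: f"preview {e['path']}")
print(d.dispatch({"type": "file_uploaded", "path": "/a.pdf"}))
# ['index /a.pdf', 'preview /a.pdf']
```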

  • The horse gives way to the bit. In the future cars will be compared based on their compute capacity, not their horsepower. The state of the car computer: Forget horsepower, we want megahertz!.

  • Nice tutorial on Building Smart Parking Meters with Realtime Availability Monitoring Using IBM Bluemix & PubNub services for Android. If parking trauma is a trigger for you then you may want to skip this article.

  • Moore’s Law: Toward SW-Defined Hardware: Rather than relying just on hardware or just on software, the industry is shifting toward software-defined hardware. This has several major implications: It moves the chip hardware much closer to the customer, allowing chipmakers to become more involved in end markets than at any time since the PC era; It makes hardware-software co-design a requirement rather than an option, and forces iterative design improvements on both sides; The emphasis is on more individualized designs, rather than megalithic one-size-fits-all chips.

  • Wonderfully explained: How Cloudflare analyzes 1M DNS queries per second, using Kafka and ClickHouse, Yandex's OLAP database.

  • Very detailed post on Rearchitecting Airbnb’s Frontend. Like Twitter, Airbnb eventually has to grow up and move past its Rails childhood. With what else? React. 

  • An edge case for cache busting: Now if we play through the same user flow, the last step becomes "User downloads only changed assets." This is far more optimized, especially for high-traffic websites. If we consider separating out jQuery (40KB minified) for a site with 1 million hits per month, that's 40GB of savings. Although that may not sound like much in the modern age of the internet, that could be the difference between plan tiers with your CDN.
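
One common way to get "user downloads only changed assets" is to embed a digest of each file's contents in its name, so rarely changing vendor code like jQuery keeps the same URL across deploys. A minimal sketch; `hashed_name` is a hypothetical helper, and build tools such as webpack do the equivalent with their own naming schemes:

```python
import hashlib


def hashed_name(filename: str, content: bytes, length: int = 8) -> str:
    """Embed a content digest in the filename so it changes only when the bytes change."""
    stem, dot, ext = filename.rpartition(".")
    digest = hashlib.sha256(content).hexdigest()[:length]
    return f"{stem}.{digest}.{ext}" if dot else f"{filename}.{digest}"


vendor_v1 = hashed_name("vendor.js", b"/* jquery bundle */")
app_v1 = hashed_name("app.js", b"console.log('v1');")
app_v2 = hashed_name("app.js", b"console.log('v2');")

print(vendor_v1 == hashed_name("vendor.js", b"/* jquery bundle */"))  # True: cached copy stays valid
print(app_v1 != app_v2)                                               # True: only app.js is re-fetched
```

The arithmetic in the quote follows directly: 40 KB not re-downloaded × 1,000,000 hits = 40,000,000 KB, or 40 GB.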

  • Google are not the only ones who can write "we did this great thing" type papers. Amazon wants in. Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases: In this paper, we describe the architecture of Aurora and the design considerations leading to that architecture. We believe the central constraint in high throughput data processing has moved from compute and storage to the network. Aurora brings a novel architecture to the relational database to address this constraint, most notably by pushing redo processing to a multi-tenant scale-out storage service, purpose-built for Aurora. We describe how doing so not only reduces network traffic, but also allows for fast crash recovery, failovers to replicas without loss of data, and fault-tolerant, self-healing storage. We then describe how Aurora achieves consensus on durable state across numerous storage nodes using an efficient asynchronous scheme, avoiding expensive and chatty recovery protocols. Finally, having operated Aurora as a production service for over 18 months, we share the lessons we have learnt from our customers on what modern cloud applications expect from databases.
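
The "consensus on durable state across numerous storage nodes" rests on quorums. The paper describes six copies of each data item (two in each of three AZs) with a write quorum of 4 and a read quorum of 3. This sketch checks the standard quorum conditions and what those numbers tolerate; the helper names are illustrative:

```python
V, V_READ, V_WRITE = 6, 3, 4  # copies, read quorum, write quorum (per the Aurora paper)


def quorum_ok(v: int, v_read: int, v_write: int) -> bool:
    # Reads must overlap the latest write, and two conflicting writes can't both succeed.
    return v_read + v_write > v and 2 * v_write > v


def can_write(alive: int) -> bool:
    return alive >= V_WRITE


def can_read(alive: int) -> bool:
    return alive >= V_READ


print(quorum_ok(V, V_READ, V_WRITE))  # True
print(can_write(V - 2))               # True: a whole AZ (2 copies) lost, writes continue
print(can_read(V - 3))                # True: an AZ plus one more node lost, reads still possible
print(can_write(V - 3))               # False
```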

  • How to Build a Non-Volatile Memory Database System. Just a slide deck so far, but you can probably get something out of it.

  • Andrei Alexandrescu Design by Introspection Talk at Google Campus TLV: andralex: This talk shares early experience with Design by Introspection, a proposed programming paradigm that has enough demonstrable results to be worth sharing. The tenets of Design by Introspection are: The rule of optionality: Component primitives are almost entirely opt-in. A given component is required to implement only a modicum of primitives, and all others are optional. The component is free to implement any subset of the optional primitives. The rule of introspection: A component user employs introspection on the component to implement its own functionality using the primitives offered by the component. The rule of elastic composition: a component obtained by composing several other components offers capabilities in proportion with the capabilities offered by its individual components.
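
The talk works in D with compile-time introspection; a rough runtime analog in Python shows the first two rules. The component must provide only the required primitive (iteration); the user checks for an optional one (`__len__`) and uses it when present. `count` and `Countdown` are hypothetical names for illustration:

```python
from typing import Any, Iterable


def count(items: Iterable[Any]) -> int:
    """Use an optional primitive when the component offers it, else fall back."""
    if hasattr(items, "__len__"):  # rule of introspection: ask what the component supports
        return len(items)          # optional primitive: usually O(1)
    return sum(1 for _ in items)   # only the required primitive (iteration) is assumed


class Countdown:
    """Implements only the required primitive: iteration."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __iter__(self):
        return iter(range(self.n, 0, -1))


print(count([1, 2, 3]))     # 3, via __len__
print(count(Countdown(4)))  # 4, via iteration
```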