Stuff The Internet Says On Scalability For October 13th, 2017

Hey, it's HighScalability time:

Tech is transforming how food is being grown. Lots of opportunity for local nerdy production. Greenhouses even look like datacenters! (This Tiny Country Feeds the World)

If you like this sort of Stuff then please support me on Patreon.

  • 320 trillion: ops/second in Nvidia driverless-car computer; .25%: Lambda invocations impacted by cold starts; $30,000: monthly take hijacking computers to mine cryptocurrency; 400 Gbps: Ethernet standard to be ratified this year; 2.1 million: MySQL 8.0 queries/second; 100,000: Kiva robots owned by Amazon; 50,000: greenhouses in Egypt's new farm city; 100 petabytes: new hard drives ordered by Backblaze; 20 million: max Bitcoin users per month; 662 million: unused vacation days in US; 92 billion: Pornhub views per year; 1,000: new Facebook hires to review ads; 12 million: Tinder matches per day; $1 billion: Google training grants

  • Quotable Quotes: 
    • @toddmotto: Space X sends a rocket up into space. Lands back on its feet back on earth 7minutes later. I can't even run an npm install in that time.
    • nappy-doo: Years ago, I started at Google, and was in Charlie's cafe, eating alone. I'm sitting there, and up walks Ken Thompson. He sits down, introduces himself as Ken, and asks me what I work on. We sat there for a good 40 minutes just chatting. One of my coolest memories of working at Google was that time. He was so down to earth, never bothered to talk up about who he was (even though I knew). I really appreciated that.
    • @asymco: The popularity of iPhone with US teens at all-time high. Android is at 13% and flat. Implies Apple taking share from non-consumption.
    • @brianleroux: “We have leapfrogged containers which are a disaster for security”—@marknca on serverless #ServerlessConf
    • batmansmk: In every way, [Tensorflow] reminds me of Angular.io project. A failed promise to be true multi-language, failing to use the expressiveness of python, with a super large API that tries to do things we didn't ask it to do and a lack of a general sounding architecture.
    • @bodil: Fun fact: the Erlang runtime is implemented in C, which is an untrue computer language.
    • @Werner: There is no compression algorithm for experience.
    • @jaksprats: Ramcloud: JS faster than C running untrusted code due to sand boxing overheads … ... Cloudflare workers nailed it :)
    • sig: It is amazing to think that only a few years ago you could take an old laptop, download a miner, stick it in a closet and it would spit out something now worth a quarter of a million dollars every few days. There are a lot of problems with Bitcoin. If you look at my comment history you will see I am pretty down on it for all sort of reasons I won't go into at the moment. However sometimes you just have to take a step back and admire how crazy impressive it is that Bitcoin has reached this point.
    • Linda Nichols~ How can you go serverless without vendor lock-in? Linda proposes two possibilities: containers and multi-provider frameworks.
    • @DivineOps: I don't think anyone can *afford* inventory. It's just that if everyone is moving slow, you can get away with moving slow
    • mrb: The bug behind BIP 50 caused a fork, however the bugfix wasn't a hard-fork. By definition a hard-fork is a fork that require all Bitcoin nodes to be updated. In the case of that bugfix only some nodes had to be updated (the ones run by miners making up a majority of the hash power) then the rest of the non-updated nodes automatically reorg'd to the right chain, the one with the most work.
    • @datawireio: 100+ Million members. 100s of #microservices. Hundreds of thousands of instances. <10 Core SREs.
    • @rbranson: ... but single-rack systems are an increasingly rare situation, won’t practically exist in 10 years.
    • @ryan_sb: Not that it's always right to follow Google/FB/Twitter, but note that *all* of them have kept monorepos through massive growth
    • Bob Frankston: I hate the word coding; it’s like calling writing, typing.
    • cletus: People also overestimate their needs. They rush to create Hadoop clusters and distributed NoSQL solutions because, you know, relational DBs can't keep up with their "Big Data" (which means, millions of rows) when in fact you can dump billions of rows into a single MySQL instance.
    • Bob Frankston: Algorithms are the new bureaucracy
    • Thomas Ryan: Coffeelake is a good chip and a clear improvement over both Skylake-X and Kabylake. It’s not a massive leap, but it’s a generation of products that appears to be solidly better than the last. It has extended Intel’s lead in the areas where they were beating AMD and largely closed the gaps in the areas that they weren’t.
    • @alexlovelltroy: Oooh. Describing serverless microservices as state machines simplifies defining microservice boundaries. #ServerlessConf
    • kevin42: I used a GCE to test some image processing software I wrote a while ago (it runs on a very large dataset). I configured a 64 core machine with 128gb of memory. It ran perfectly, although it cost about $200 to run the test for a day. Sure, it wasn't the highest performance per CPU, but I didn't have to buy the bare metal, I can scale up the number of cores if need be, and I can fire one up whenever I want one.
    • godzillabrennus: I’ve been a [Backblaze] customer for years and recently had a catastrophic failure of a computer and it’s direct attached backup drive. I have spent the last four days waiting for backblaze to create a restore a backup for a computer on and last I checked it was at 9%.
    • hashtagframework: I wonder if the author realizes that the Airbus A380 is itself an IT project with 120 million lines of code, and 330 miles worth of 100,000 individual wires that perform 1,150 different tasks. IT gets no credit... the wings are doing all the work.
    • @troyhunt: 23 hours and 42 minutes from initial private disclosure to @disqus to public notification and impacted accounts proactively protected
    • @bglick: 2 chained functions with 90% performance guarantees have an 81% performance guarantee. Chain 7 and you're < 50%. #Serverlessconf
    • @faunadb: We're moving from a product stack to more utility based architectures/practices (aka #serverless) whether you like it or not, so get on board. "It's not a question of if, but when." @swardley #serverlessconf 
    • @FrankPasquale: “Software problems accounted for nearly 15% of US car recalls in 2015, up from less than 5% in 2011"
    • @jessitron: Octopuses do distributed decisionmaking. a tentacle can see and decide what color to be, locally.
    • @faunadb: "When you're on the cutting edge of the cutting edge of a new technology, you have to realize there's a very long tail of adoption in a large organization" @marknca #serverlessconf
    • @Joab_Jackson: CQRS (Command/Query Responsibility Separation): Fancy name for separating reads & writes into separate channels @ben11kehoe #ServerlessConf
    • @EconCharlesRead: 40% of Europe’s domestic freight goes by sea, but just 2% does in America due to protectionist laws from 1920
    • @Joab_Jackson: In the U.S., the time to get to the #cloud (AWS) is about 35ms, or 50ms via 4G mobile (~1ms soon w 5G) —  @avnerbraverman #ServerlessConf
    • @GossiTheDog: Kaspersky alleged hacked by Israel passing info to NSA who noticed they’ve also allegedly hacked by Russian gov investigating GCHQ hacking
    • @Joab_Jackson: #FaaS latency:Azure has the least but varies most;Google slowest;IBM most predictable—you want predictable @AvnerBraverman #ServerlessConf
    • @DanielKrook: IBM Cloud Functions is the most predictable #serverless platform. STD dev of 6ms between function invocation times. #serverlessconf
    • @LawrenceHecht: approximate quote: we love kafka, but had to cross it off the list b/c requires a FTE to manage. @airtasker @_wub #serverlessconf
    • IBM: cryptocurrency mining attacks aimed at enterprise networks jumped sixfold between January and August
    • @shanselman: Company: "Well, only 0.03% of people hit that bug." Me: "100% of people who are me are hitting this bug.
    • Loren Brichter: Smartphones are useful tools. But they’re addictive. Pull-to-refresh is addictive. Twitter is addictive. These are not good things. When I was working on them, it was not something I was mature enough to think about. I’m not saying I’m mature now, but I’m a little bit more mature, and I regret the downsides.
    • DanielBMarkham: It's not that tools are bad. It's that our natural inclination to add in abstractions easily leads to code where it's more important than ever to thoroughly test exactly what the code does. If we focused on the testing part first, the tools part wouldn't be an issue. But instead we focus on tools and schedule pressure, and this leads to total crap. We buy the tools/framework because we believe that schedule pressure forces us to work "at a higher level" but instead that same pressure, combined with the cognitive difficulties of adding yet more layers to the problems leads to a worse state of affairs than if we had simply skipped the tools to begin with. I'll never forget the shocking wakeup I got as a developer when I realized I am a market for people selling me stuff, and these people do not have the interests of my clients in mind. They only have to sell me, not provide value.
    • Maethor_derien: IT could easily deliver large faultless projects as quickly as other industries. If you're willing to put the same amount of man hours in as you do in those other projects you easily would have large amazing products. The problem is they want to give IT a quarter of the staff and a quarter of the time as you would for any physical project. They want it done in a year with minimal staffing and wonder why the quality is low compared to something that would spend 4+ years being designed and prototyped before it was produced.
    • Iain McGilchrist: So the meaning of an utterance begins in the right hemisphere, is made explicit (literally folded out, or unfolded) in the left, and then the whole utterance needs to be ‘returned’ to the right hemisphere, where it is reintegrated with all that is implicit – tone, irony, metaphor, humour, and so on, as well as a feel of the context in which the utterance is to be understood.
    • londons_explore: As a former insider, I can tell you likely pre-story which might cast a different light on this: * Google employee comes up with an idea. * They go and research the idea to check if there are any already existing companies which do it. * If any are found, they meet and decide if they should buy the company, reinvent the idea, or that it isn't relevant. * If, after investigation it is determined the company's tech isn't good enough, they will re-invent. Google has fairly strict tech requirements (no php, no shady licenses/ownership, no pirated stuff, etc.), so many companies don't pass. * When they reinvent, they will do it "clean room" - ie. none of the people who reviewed the original company will be involved in the re-invention.
    • tfha: This hopefully throws some context on the scaling debate. The giant fight, a fight from 1MB to 2MB, would allow Bitcoin to scale from 20 million monthly users, to 40 million monthly users. It's not a big upgrade. Any exec at Facebook, Snapchat, Uber, etc. would laugh you out of the room if you suggested that we should have a company-splitting and devastating debate that sidetracks development for 2 years over scaling the platform from 20 million to 40 million users. It just doesn't make sense. Which is one of the big reasons the small blockers resist the 2mb hardfork. For all the pain that this debate has brought about, there's very little upside in the grand scheme of things. We scale Bitcoin from an insignificant number of users to still an insignificant number of users. If we find a scaling solution, it's going to come from somewhere else.
    • apatters: Well, it's not the smartphone that's the problem. The smartphone is just the delivery system. It's remarkable how well some of Richard Stallman's quotes have aged. "With software, either the users control the program, or the program controls the users..." The idea of a program controlling its users must have seemed very esoteric when that quote was first penned in 1985, at a time when home PCs (let alone ones with GUIs) were exotic: the first Mac had launched only a year ago. By the time I first heard of Stallman's ideas in the '90s, I was surrounded by PCs with GUIs, but still didn't get it. Now here we are, 30 years later. The first thing most of us do when we wake up is roll over, grab our phone, and look at some software. The existential costs of the non-free software are so high that we read new stories about them in the media every week, and the tech revolution's architects are banning the products they built from their own homes. For the less frequently cited ending to Stallman's quote is this: "...If the program controls the users, and the developer controls the program, then the program is an instrument of unjust power."
    • @lehtior2: As hackers breached major banks, they'd modify the overdraft (OD) limits of credit cards one by one, and within minutes of each modification, the card would be used in another country to withdraw money from an ATM. In total, at least $40m is suspected to have been stolen.
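
The arithmetic behind @bglick's quote above is easy to check. Here is a minimal sketch, assuming each function in the chain meets its guarantee independently of the others (the tweet doesn't say so explicitly). `chained_guarantee` is a made-up helper name.

```python
def chained_guarantee(per_function: float, chain_length: int) -> float:
    """Probability that every function in a serial chain meets its guarantee.

    Assumes the functions succeed or fail independently of each other
    (an assumption for this sketch, not something stated in the quote).
    """
    return per_function ** chain_length


for n in (1, 2, 7):
    print(f"{n} chained: {chained_guarantee(0.90, n):.1%}")
```

Two functions come out at 81% and seven at roughly 48%, which lines up with the "< 50%" in the tweet.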

  • This Tiny Country Feeds the World: The tiny Netherlands has become an agricultural powerhouse—the second largest global exporter of food by dollar value after the U.S...From his perch 10 feet above the ground, he’s monitoring two drones—a driverless tractor roaming the fields and a quadcopter in the air—that provide detailed readings on soil chemistry, water content, nutrients, and growth, measuring the progress of every plant down to the individual potato. Van den Borne’s production numbers testify to the power of this “precision farming,” as it’s known. The global average yield of potatoes per acre is about nine tons. Van den Borne’s fields reliably produce more than 20...Borne and many of his fellow farmers have reduced dependence on water for key crops by as much as 90 percent. They’ve almost completely eliminated the use of chemical pesticides on plants in greenhouses, and since 2009 Dutch poultry and livestock producers have cut their use of antibiotics by as much as 60 percent...Only that mix, “the science-driven in tandem with the market-driven,” he maintains, “can meet the challenge that lies ahead.”...the planet must produce “more food in the next four decades than all farmers in history have harvested over the past 8,000 years.”

  • Cool use of AI, saving the world from asteroids. Planetary Defense: Radar 3D Shape Modeling (SETI Talks 2017).

  • Judging by the quantity and quality of tweets coming from #serverlessconf, FaaS is vibrant and growing. On to the next thing. Nice ServerlessConf 2017 Recap. Notes are available for several talks: The State of Serverless Security; 10 tips for running a serverless business... number #6 will blow your mind!; Shipping Containers As Functions; Harmonizing Serverless and Traditional Applications; Break-up with Your Server, but Don’t Commit to a Cloud Platform; Serverless Design Patterns; Event-driven Architectures: are we ready for the paradigm shift?

  • Since Elon is going to Mars, are moonshots big enough anymore? The Secrets of Google’s Moonshot Factory: The United States’ worst deficit today is not of incremental innovation but of breakthrough invention. Research-and-development spending has declined by two-thirds as a share of the federal budget since the 1960s. The great corporate research labs of the mid-20th century, such as Bell Labs and Xerox Palo Alto Research Center (PARC), have shrunk and reined in their ambitions. America’s withdrawal from moonshots started with the decline in federal investment in basic science.

  • Google Home Mini review units had a bug that recorded everything, all the time. If they can ship a fix to turn this off then they can ship a "fix" to turn it on too. You'll never know.

  • Hearing some heartbreaking stories of people losing their homes in the Santa Rosa fires. Everything is gone. Homes. Schools. Stores. Businesses. One couple was woken up and given 5 minutes notice to evacuate. They didn't have time to get their laptops or backup drives. Everything was on their laptops and they didn't have an offsite backup. So back up! Get those old photos, baby pictures, and vacation movies online. Time is precious.

  • Glad to see the porn industry is finally catching up. Pornhub Launches AI-powered Model That Detects Over 10,000 Pornstars in Videos Using Computer Vision: Pornhub’s new AI model is fed several thousand videos, in addition to official photos of pornstars, to learn from using computer vision. Then, based on what the model has learned, it scans videos and returns the matching pornstars with a confidence level. Pornhub users then validate what the AI model returns, either upvoting or downvoting the pornstar tags on the video, depending on its validity. Pornhub’s AI model learns based on what the community has validated and continues to get smarter. 

  • Spotify’s Discover Weekly: How machine learning finds your new music: This Monday over 100 million Spotify users found a fresh new playlist waiting for them. It’s a custom mixtape of 30 songs they’ve never listened to before but will probably love...there are three main types of recommendation models that Spotify employs: Collaborative Filtering models (i.e. the ones that Last.fm originally used), which work by analyzing your behavior and others’ behavior. Natural Language Processing (NLP) models, which work by analyzing text. Audio models, which work by analyzing the raw audio tracks themselves...Spotify doesn’t have those stars with which users rate their music. Instead, Spotify’s data is implicit feedback...Now we’ve got 140 million user vectors — one for each user — and 30 million song vectors...To find which users have taste most similar to mine, collaborative filtering compares my vector with all of the other users’ vectors, ultimately revealing the most similar users to me...Natural Language Processing (NLP) models’ source data, as the name suggests, are regular ol’ words — track metadata, news articles, blogs, and other text around the internet...Unlike the first two model types, raw audio models take into account new songs...Convolutional neural networks are the same technology behind facial recognition. In Spotify’s case, they’ve been modified for use on audio data instead of pixels...After processing, the neural network spits out an understanding of the song, including characteristics like estimated time signature, key, mode, tempo, and loudness...Ultimately, this understanding of the song’s key characteristics allows Spotify to understand fundamental similarities between songs and therefore which users might enjoy them based on their own listening history.
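
The "compare my vector with all of the other users' vectors" step can be sketched in a few lines. The excerpt doesn't say which similarity measure Spotify uses, so cosine similarity here is an assumption, and the user names and three-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar_users(me, user_vectors, top_n=2):
    """Rank the other users by how close their taste vector is to mine."""
    scores = {
        user: cosine_similarity(user_vectors[me], vec)
        for user, vec in user_vectors.items()
        if user != me
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy, made-up user vectors; real ones would be learned from implicit feedback.
user_vectors = {
    "me": [0.9, 0.1, 0.3],
    "alice": [0.8, 0.2, 0.4],
    "bob": [0.1, 0.9, 0.2],
    "carol": [0.7, 0.0, 0.2],
}

print(most_similar_users("me", user_vectors))
```

At 140 million users a brute-force comparison like this would not scale, so treat it purely as a picture of the idea, not of how it is done in production.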

  • A handy dandy TCP Throughput Calculator.

  • Seems like Firebase might be the gateway drug to mainlining the Google cloud. Also interesting that they don't use separate databases for microservices, Spanner can handle it all. And they don't delete data, Spanner can handle it all. How [Shine] built a brand new bank on GCP and Cloud Spanner: French startup whose platform helps freelancers manage their finances...our first six months, we iterated and validated a prototype app using Firebase, and secured our seed funding round...chose App Engine flexible environment with Google Cloud Endpoints for an auto-scaling microservices API. These helped us reduce the time, effort, and cost in terms of DevOps engineers, so we could invest more in developing features, while maintaining our agility...use Cloud Identity and Access Management (Cloud IAM) to help control developer access to critical parts of the application...we wanted to focus on the app and user experience, not on database administration or scalability issues...Cloud Spanner combines a globally distributed relational database service with ACID transactions, industry-standard SQL semantics, horizontal scaling, and high availability...Cloud Spanner is fast...The first connection to Cloud Spanner takes a long time to initialize, which makes it difficult to expose an API through Cloud Functions...Cloud Spanner allows us to change a schema in production without downtime...We have an instance on which there are three databases -- one for production, one for staging and one for testing our continuous integration (CI) pipeline. Each service has one or more interleaved tables that are isolated from other services’ tables...created an internal query service that performs read-only queries to Cloud Spanner to generate a dashboard or do complex queries for analytics...We take advantage of Cloud Spanner’s scalability, and thus don’t delete any data that could one day be useful and/or profitable...We store all of our business logs on Cloud Spanner.

  • Like big beautiful hunks of engineered metal works of art? Where Turbines Are Born: An Inside Look at GE’s Big Iron Maternity Ward.

  • ORMs, am I right? How I Reduced my DB Server Load by 80%: This post is about me finding and fixing that issue which resulted in a net 80% decrease in my database load...what kind of average load my DB was under. I’m using a standard-0 DB and Heroku lists it as being able to sustain a load of 0.2...my app was spiking up to 2.15...It turns out it was coming from this line in my model. This innocuous little line was responsible for 80% of my total database load. This validates call is Rails attempting to ensure that no two Repo records get created with the same username and name
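
A uniqueness validation like this generally has to run a lookup query before each save. The sketch below uses Python's built-in sqlite3 (not Rails or Postgres, which the post is about) to show how the planner treats that lookup with and without an index on the two columns. The table and index names are hypothetical, and the excerpt doesn't say an index is the fix the author applied. It is simply one common remedy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repos (id INTEGER PRIMARY KEY, username TEXT, name TEXT)")

# The kind of lookup a uniqueness check has to run before each save.
LOOKUP = "SELECT 1 FROM repos WHERE username = ? AND name = ? LIMIT 1"


def query_plan(connection):
    """Return SQLite's plan description for the uniqueness lookup."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN " + LOOKUP, ("rails", "rails")
    ).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


plan_without_index = query_plan(conn)

# A unique index lets the database enforce the rule itself and answer the
# lookup without scanning the whole table.
conn.execute("CREATE UNIQUE INDEX idx_repos_username_name ON repos (username, name)")
plan_with_index = query_plan(conn)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The exact plan wording varies between SQLite versions, but the first plan is a table scan and the second names the index.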

  • Hazelcast and the Mythical PA/EC System: Over the past few weeks, I have been looking more deeply at the In-Memory Data Grid (“IMDG”) market, and took an especially deep dive into Hazelcast, a ubiquitous open source implementation of an IMDG, with hundreds of thousands of in production deployments. It turns out that Hazelcast (and, indeed, most of the in-memory data grid industry) is a real implementation of the mythical PA/EC system...The vast majority of Hazelcast use cases are within a single computing cluster. Both the client programs and the Hazelcast data structures are located in the same physical region...The bottom line here is that both if the master fails and also in the (rare) case of a network partition, a new master is selected that may not have all the updates from the original master. The system always remains available, but the second master is allowed to temporarily diverge from the original master. Thus, Hazelcast is PA/EC in PACELC...Hazelcast also supports replication of clusters over a WAN. For example, in a disaster recovery use case, all writes go to the primary cluster, and they are asynchronously replicated to a backup cluster. Alternatively, both clusters can accept writes, and they are asynchronously replicated to the other cluster...Hazelcast serves these reads from the closest location, even though it may not have the most up to date copy of the data. Thus, Hazelcast is EL by default for WAN replication. Also, Jepsen Analysis on Hazelcast 3.8.3.

  • BlockChain is the blackhole of weird. When I saw this: BlockChain vs Event Driven Architecture, I thought something like this, but not as creative, Miky: Next in this series: - Deep Learning vs. RESTful APIs - MRI scans vs. the sport of soccer - Listening to music vs. Earth, the planet

  • Forget writing songs, there's much more money in writing books. I Wrote a Hit Song With Justin Bieber. Want to See My Royalties?: Sirius XM, not bad.  1,509 spins, earned him $765 dollars...Pandora: 38,225,700 spins earned him $278...YouTube: 34,220,900 spins earned him $218.17... I want to be the greatest songwriter of all time!  I have the skill set! But the problem is that if streaming is taking over, and the numbers don’t add up…

  • Good comparison. Google Cloud vs AWS in 2017 (Comparing the Giants). They like Google Cloud, but they admit bias.

  • Sharding is just the start, you have to actively manage all those shards. Scalable SQL Made Easy: How CockroachDB Automates Operations: These techniques discussed above allow Cockroach to continuously rebalance and spread ranges to fully utilize the whole cluster. By automating rebalancing, CockroachDB is able to eliminate the painful re-sharding procedure that is even still present in most modern NoSQL databases. Furthermore, by automating repairing and self-healing, CockroachDB is able to greatly minimize the number of emergencies due to failing machines. Since this repairing happens without having to schedule any downtime or run specific repair jobs during slow times it is a huge benefit to anyone trying to keep an important database up and serving load. For another take, here's TiDB: Scale the Relational Database with NewSQL.

  • Nice curated list of 7 New Java Talks You Need to See: Using Java 9 Modules; Java 9 First Impressions; The Hitchhiker’s Guide to Java Class Reloading; 10 Tips For Failing Badly at Microservices; 12 Stories Every Architect Should Know; Java Performance Engineer’s Survival Guide; Controlling Technical Debt With Continuous Delivery

  • In the future not meeting your mate online will seem as primitive as working in an office or having sex to combine chromosomes. First Evidence That Online Dating Is Changing the Nature of Society: People who meet online tend to be complete strangers. And when people meet in this way, it sets up social links that were previously nonexistent...It is intriguing that shortly after the introduction of the first dating websites in 1995, like Match.com, the percentage of new marriages created by interracial couples increased rapidly...research into the strength of marriage has found some evidence that married couples who meet online have lower rates of marital breakup than those who meet traditionally

  • Good overview with code. Concurrent Servers: Part 3 - Event-driven: This is part 3 of a series of posts on writing concurrent network servers. Part 1 introduced the series with some building blocks, and part 2 - Threads discussed multiple threads as one viable approach for concurrency in the server. Another common approach to achieve concurrency is called event-driven programming, or alternatively asynchronous programming. The range of variations on this approach is very large, so we're going to start by covering the basics - using some of the fundamental APIs that form the base of most higher-level approaches. Future posts in the series will cover higher-level abstractions, as well as various hybrid approaches.
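
For a feel of what "event-driven" means here, below is a minimal sketch using Python's standard selectors module. It is not the code from the linked post. A connected socket pair stands in for a client and an accepted connection, and the callback names are made up.

```python
import selectors
import socket

sel = selectors.DefaultSelector()


def on_readable(conn):
    """Callback run by the event loop when `conn` has data to read."""
    data = conn.recv(1024)
    if data:
        conn.sendall(data.upper())  # "serve" the request: echo it upper-cased
    else:
        # An empty read means the peer closed the connection.
        sel.unregister(conn)
        conn.close()


def run_once(timeout=1.0):
    """One turn of the event loop: wait for ready sockets, dispatch callbacks."""
    for key, _events in sel.select(timeout):
        callback = key.data
        callback(key.fileobj)


# A connected socket pair stands in for a client and an accepted connection.
client, server_side = socket.socketpair()
server_side.setblocking(False)
sel.register(server_side, selectors.EVENT_READ, on_readable)

client.sendall(b"hello")
run_once()
reply = client.recv(1024)
print(reply)
```

A real server would also register the listening socket and call `run_once` in a loop. The point is only that one thread waits on many sockets and reacts to whichever is ready.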

  • Videos from Papers We Love Conf September 28, 2017 in St. Louis.

  • Different I/O Access Methods for Linux, What We Chose for Scylla, and Why: With Scylla, we have chosen the highest performing option, AIO/DIO. To isolate some of the complexity involved, we wrote Seastar, a high-performance framework for I/O intensive applications. Seastar abstracts away the details of performing AIO and provides common APIs for network, disk, and multi-core communications. It also provides both callback and coroutine styles of state management suitable for different use cases...Compaction uses application-level read-ahead and write-behind to ensure high throughput but bypass application level caches due to expected low hit rates...Queries (reads) use application-controlled read-ahead and application-level caching...Small reads are aligned to a 512-byte boundary to reduce bus data transfers and latency...The Seastar I/O scheduler allows us to dynamically control I/O rates for compaction and queries...A separate I/O scheduling class ensures that commitlog writes get the required bandwidth and are not dominated by reads or dominate reads
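
The alignment step is just integer arithmetic. Here is a sketch of widening a requested byte range to 512-byte boundaries. This is not Scylla or Seastar code, and `align_read` is a hypothetical helper name.

```python
SECTOR = 512  # alignment boundary mentioned in the excerpt


def align_read(offset, length, alignment=SECTOR):
    """Widen a byte range so it starts and ends on `alignment` boundaries.

    Returns (aligned_offset, aligned_length). The bytes originally asked
    for start at index `offset - aligned_offset` of the aligned read.
    """
    start = (offset // alignment) * alignment             # round down
    end = -(-(offset + length) // alignment) * alignment  # round up
    return start, end - start


print(align_read(1000, 100))
```

A 100-byte read at offset 1000 becomes a 1024-byte read at offset 512, and the caller slices out the part it wanted.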

  • Fascinating history by Stuart Oberman. NVIDIA GPU Computing: A Journey from PC Gaming to Deep Learning. From humble beginnings to tech powerhouse.

  • "The fact that we can do this without having to build so much of the foundation is one of the things that gets me so excited about Kubernetes." Scaling Dedicated Game Servers with Kubernetes: Part 3 – Scaling Up Nodes: Scaling up and down the nodes in a Kubernetes cluster probably makes more sense for a cloud environment, since we only want to pay for the resources that we need/use. If we were running in our own premises, it may make less sense to change the size of our Kubernetes cluster, and we could just run a large cluster(s) across all the machines we own and leave them at a static size...The node scaler essentially runs an event loop to carry out the strategy outlined above. Using Go in combination with the native Kubernetes Go client library makes this relatively straightforward to implement, as you can see below in the Start() function of my node scaler...Once we have deployed our node scaler, let’s tail the logs and see it in action. In the video below, we see via the logs that when we have one node in the cluster assigned to game servers, we have capacity to potentially start forty dedicated game servers, and have configured a requirement of a buffer of 30 dedicated game servers.

  • How Stitch Consolidates A Billion Records Per Day: Our language of choice is Clojure...we use Python for the open source code in the Singer project...we recently introduced React components into our front end, which many of our developers find easier to work with...today we opt to instead write new code in ES6...Postgres for transactional data persistence...we expect our scale requirements for these transactional services to remain quite small, and AWS RDS makes it easy enough for us to operate either...a Clojure web service that accepts JSON-formatted data either point-at-a-time or in large batches. It does a quick validation check on the JSON and an authentication check on the request’s API token before writing the data to a central Kafka queue...a multithreaded Clojure application that writes the data to files on S3 in batches separated by the database table it is destined for...Batches that have been written to S3 enter "the spool" - a queue of work waiting to be processed by one of our “loaders” - Clojure applications that read data from S3 and do whatever processing is necessary before finally loading the data into the customer’s data warehouse...development environment is based on a VirtualBox VM that is configured with the same Chef code we use in production...use GitHub to host our code repositories, and CircleCI automatically runs our test suites before code is merged...Stitch is run entirely on AWS...we also use Redshift as an internal data warehouse...Terraform code for configuring the virtual hardware each runs on...majority of our services run on stateless EC2 instances that are managed by AWS OpsWorks...we use Jenkins along with a custom script to provision instances in those layers with a specific code release...We recently introduced Kubernetes into our infrastructure to run the scheduled jobs that execute Singer code to extract data from various sources...We use Datadog to monitor our application...AWS Elasticsearch service managing the search cluster...PagerDuty to wake us up in the event of a problem.
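
The "validate, then batch by destination table" part of that pipeline can be pictured with a small sketch. Stitch does this in Clojure with Kafka and S3 in between. The version below is plain in-memory Python, and the record shape (`table` and `row` fields) is an assumption made up for the example.

```python
import json
from collections import defaultdict


def validate(raw):
    """Quick validation: must be a JSON object with a table name and a row."""
    record = json.loads(raw)
    if not isinstance(record, dict) or "table" not in record or "row" not in record:
        raise ValueError("record needs 'table' and 'row' fields")
    return record


def batch_by_table(raw_records):
    """Group validated records into one batch per destination table."""
    batches = defaultdict(list)
    for raw in raw_records:
        record = validate(raw)
        batches[record["table"]].append(record["row"])
    return dict(batches)


# Hypothetical incoming records, one JSON document each.
incoming = [
    '{"table": "users", "row": {"id": 1}}',
    '{"table": "orders", "row": {"id": 10}}',
    '{"table": "users", "row": {"id": 2}}',
]
batches = batch_by_table(incoming)
print(batches)
```

Each batch would then become one file for a loader to pick up. Everything about storage and queuing is left out here.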

  • Amazon declares scoreboard on Oracle. Amazon replaced 150 Oracle databases running on 300 hosts with DynamoDB. Oracle hard to install. Oracle hard to scale. Reduced workflow processing latency by 90%. Reduced time needed to scale the system for large events by 90 percent. Now much less time spent maintaining previous solution, which translates into more time spent creating new features that add value for Amazon engineering teams.

  • Netflix with an in-depth experience report of deploying serverless at scale on their own infrastructure. They like it, but you have to do things differently. Developer Experience Lessons Operating a Serverless-like Platform At Netflix - Part II: When a script is deployed to our platform, the rollout is completed within a fixed time interval. This predictability is useful, especially for automated workflows that involve a dependency on the script being fully rolled out...we implemented a realtime notification system that allows developers to progressively monitor the state of their deployment...application instances are more vulnerable to cold start delays caused by JIT-ing or connection priming...In the end, we evolved our platform to support both global and regional deployments. Users have the option to choose their deployment schedules by region...As a final gate before production, canary deployments and multi-variate testing are key techniques to gain confidence at scale and reduce the risk associated with a new deployment. These capabilities are built into our deployment and routing layers...we believe that in order to reliably operate applications composed of smaller units, the core concept of increased abstraction should be extended to operational insight and workflows well beyond today’s levels...the key is to allow developers to outsource more of the operations to tools with confidence...We soon realized that manual clean up was not only tedious, but also error prone - versions that were still taking meaningful traffic were sometimes accidentally removed...Developers are asked to specify upfront when a particular version can be safely sunset, based on traffic falling below a threshold for a minimum number of days. Versions that fall below the threshold are automatically cleaned up using an off-band system that evaluates eligibility...Our goal is to apply the updates to an application unit, run it through the canary process and based on the canary score, provide a push button way for the update to be rolled out.
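
The sunset rule Netflix describes (traffic below a threshold for a minimum number of days) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Netflix's implementation: the names `SunsetPolicy` and `eligible_for_cleanup`, the requests-per-day unit, and the requirement that the quiet days be the most recent consecutive ones are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SunsetPolicy:
    """Hypothetical policy a developer declares up front for a script version."""
    max_daily_requests: int  # traffic threshold (assumed unit: requests per day)
    min_quiet_days: int      # days in a row traffic must stay below the threshold


def eligible_for_cleanup(daily_requests: Sequence[int], policy: SunsetPolicy) -> bool:
    """Return True if the most recent `min_quiet_days` days were all below the threshold.

    `daily_requests` is ordered oldest to newest.
    """
    if policy.min_quiet_days <= 0 or len(daily_requests) < policy.min_quiet_days:
        return False  # not enough history to make a safe decision
    recent = daily_requests[-policy.min_quiet_days:]
    return all(count < policy.max_daily_requests for count in recent)
```

An out-of-band job could call `eligible_for_cleanup` once a day per version and only remove versions for which it returns True, which avoids the manual mistake of deleting a version that is still taking meaningful traffic.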

  • Infrastructure as a database asks: So where is the infrastructure as a database movement? Response: A database is just state. So how do you turn database operations into smartly executed operations in the real world? Where is the correct configuration stored? Is it in the database or in the hardware? Modeling complex things with complex behaviours as databases is hard to do.

  • Chaos is a ladder. Memristor-Driven Analog Compute Engine Would Use Chaos to Compute Efficiently: For Williams, there’s a bigger lesson in the development of these memristors. “Everyone’s trying to reinvent the transistor using a new material,” he notes. “Even if you made a perfect transistor—whatever that is—you’d still not beat scaled CMOS.” Instead scientists and engineers should be looking for new types of computing from these new materials. “It’s important to ask what the material system is doing that’s different than what a transistor does… Rather than make a bad transistor, see if it makes something that would take 100 or 1,000 transistors to replicate.” Williams and his team are hoping their memristor system does just that.

  • Fun story. Behind the Magic: How we built the ARKit Sudoku Solver: My 2016 Macbook Pro running tensorflow-cpu was outperforming the AWS p2.xlarge GPU instance. My suspicion is that the training runs were being bottlenecked by disk not compute. The cloud instances are also expensive. By my calculations, the payoff period of building my own box would be less than 2 months of cloud run-time. So I built a machine with relatively modest specs for about $1200 and parked it in my parents’ basement. It’s over 3x faster on my dataset than the AWS GPU instance I was experimenting with and should pay for itself soon.
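
The payoff calculation in that story can be reproduced with a few lines of arithmetic. This is a rough sketch: the $1200 build cost comes from the post, while the $0.90 hourly rate, the 730 hours per month of continuous use, and the function name `payoff_months` are assumptions made here for illustration. Electricity and the speed difference between the machines are ignored.

```python
def payoff_months(box_cost_usd: float,
                  cloud_rate_usd_per_hour: float,
                  hours_per_month: float = 730.0) -> float:
    """Months of cloud run-time whose cost equals the one-off cost of the box.

    Assumes the cloud instance would otherwise run `hours_per_month` hours
    each month; power and hardware depreciation are not modelled.
    """
    if cloud_rate_usd_per_hour <= 0 or hours_per_month <= 0:
        raise ValueError("rate and hours must be positive")
    monthly_cloud_cost = cloud_rate_usd_per_hour * hours_per_month
    return box_cost_usd / monthly_cloud_cost


# $1200 box versus an assumed $0.90/hour GPU instance running around the clock.
months = payoff_months(1200.0, 0.90)
```

Under those assumptions the result lands a little under two months, in line with the author's estimate. With lighter usage, say 200 hours a month, the payoff period stretches to several months.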

  • Good tutorial with code examples. How we use gRPC to build a client/server system in Go: This post is a technical presentation on how we use gRPC (and Protobuf) to build a robust client/server system.

  • ibm-functions/composer: a new programming model from IBM Research for composing IBM Cloud Functions, built on Apache OpenWhisk. Composer extends Functions and sequences with more powerful control flow and automatic state management. With it, developers can build even more serverless applications including using it for IoT, with workflow orchestration, conversation services, and devops automation, to name a few examples.

  • azuqua/clusterluck: A library for writing distributed systems that use a gossip protocol to communicate state management, consistent hash rings for sharding, and vector clocks for history.
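
For readers unfamiliar with two of the building blocks named there, here is a minimal sketch of a consistent hash ring and a vector clock. It is a generic illustration and not clusterluck's API: every name (`HashRing`, `tick`, `merge`, `happened_before`), the choice of SHA-256, and the 64 virtual nodes per member are assumptions.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List, Tuple


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (hypothetical, for illustration)."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        # Each node is placed at `vnodes` positions to even out the key split.
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def owner(self, key: str) -> str:
        """Return the node at the first ring position clockwise from the key."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


VectorClock = Dict[str, int]


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter incremented."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum, used when two histories are reconciled."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` is causally before `b`; False if equal, after, or concurrent."""
    nodes = a.keys() | b.keys()
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))
```

The ring keeps key movement small when membership changes: removing a node only reassigns the keys that node owned. The clocks let two replicas tell whether one update supersedes another or whether the two are concurrent and need reconciling.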

  • google/netstack: a network stack written in Go.

  • gluon-api/gluon-api (Introducing Gluon: a new library for machine learning from AWS and Microsoft): A clear, concise, simple yet powerful and efficient API for deep learning.

  • JavaScript for Extending Low-latency In-memory Key-value Stores: Large scale in-memory key-value stores like RAMCloud can perform millions of operations per second per server with a few microseconds of access latency. However, these systems often only provide simple feature sets, and the lack of extensibility is an obstacle for building higher-level services. We evaluate the possibility of using JavaScript for shipping computation to data and for extending database functionality by comparing against other possible approaches. Microbenchmarks are promising; the V8 JavaScript runtime provides near native performance with reduced isolation costs when compared with native code and hardware-based protections. We conclude with initial thoughts on how this technology can be deployed for fast procedures that operate on in-memory data, that maximize gains from JIT, and that exploit the kernel-bypass DMA capabilities of modern network cards.

Hey, just letting you know I've written a new book: Explain the Cloud Like I'm 10. It's pretty much exactly what the title says it is. If you've ever tried to explain the cloud to someone, but had no idea what to say, send them this book.

I've also written a novella: The Strange Trial of Ciri: The First Sentient AI. It explores how a sentient AI might arise as ripped-from-the-headlines deep learning techniques are applied to large social networks. Anyway, I like the story. If you do too, please consider giving it a review on Amazon.

Thanks for your support!