Stuff The Internet Says On Scalability For July 21st, 2017

Hey, it's HighScalability time:

Afraid of AI? Fire ants have sticky pads so they can form rafts, build towers, cross streams, & order takeout. We can CRISPR these guys to fight Skynet. (video, video, paper)
If you like this sort of Stuff then please support me on Patreon.

  • 222x: Bitcoin less efficient than a physical system of metal coins and paper/fabric/plastic; #1: Python use amongst Spectrum readers; 3x: time spent in apps that don't make us happy; 1 million: DigitalOcean users; 11.6 million: barrels of oil a day saved via tech and BigData; 200,000: cores on Cray supercomputer; $200B: games software/hardware revenue by 2021; $3K: for 50 Teraflops AMD Vega Deep Learning Box; 24.4 Gigawatts: China New Solar In First Half Of 2017

  • Quotable Quotes:
    • sidlls: I think instead there is a category error being made: that CS is an appropriate degree (on its own) to become a software engineer. It's like suggesting a BS in Physics qualifies somebody to work as an engineer building a satellite.
    • Elon Musk: AI is a fundamental existential risk for human civilization, and I don’t think people fully appreciate that
    • Mike Elgan: Thanks to machine learning, it's now possible to create a million different sensors in software using only one actual sensor -- the camera.
    • Amin Vahdat (Google): The Internet is no longer about just finding a path, any path, between a pair of servers, but actually taking advantage of the rich connectivity to deliver the highest levels of availability, the best performance, the lowest latency. Knowing this, how you would design protocols is now qualitatively shifted away from pairwise decisions to more global views.
    • naasking: You overestimate AI. Incompleteness is everywhere in CS. Overcoming these limitations is not trivial at all.
    • 451 Research: serverless is poised to undergo a round of price cutting this year.
    • Nicholas Bloom: We found massive, massive improvement in performance—a 13% improvement in performance from people working at home
    • @CoolSWEng: "A Java new operation almost guarantees a cache miss. Get rid of them and you'll get C-like performance." - @cliff_click #jcrete
    • DarkNetMarkets: We're literally funding our own investigation. 
    • Tristan Harris: By shaping the menus we pick from, technology hijacks the way we perceive our choices and replaces them with new ones. But the closer we pay attention to the options we’re given, the more we’ll notice when they don’t actually align with our true needs.
    • xvaier: If I have one thing to tell anyone who is looking for business ideas to try out their new programming skills on, I strongly suggest taking the time to learn as much as possible about the people to whom you want to provide a solution, then recruiting one of them to help you build it, lest you become another project that solves a non-issue beautifully.
    • @sebgoa: Folks, there were schedulers before kubernetes. Let's get back down to earth quickly
    • Mark Shead: A finite state machine is a mathematical abstraction used to design algorithms. In simple terms, a state machine will read a series of inputs. When it reads an input it will switch to a different state. Each state specifies which state to switch for a given input. This sounds complicated but it is really quite simple.
    • xantrel: I started a small business that started to grow, I thought I had to migrate to AWS and increase my cost by 5xs eventually, but so far Digital Ocean with their hosted products and block storage has handled the load amazingly well.
    • danluu: when I’m asked to look at a cache related performance bug, it’s usually due to the kind of thing we just talked about: conflict misses that prevent us from using our full cache effectively. This isn’t the only way for that to happen – bank conflicts and false dependencies are also common problems
    • Charles Hoskinson: People say ICOs (Initial Coin Offering) are great for Ethereum because, look at the price, but it’s a ticking time-bomb. There’s an over-tokenization of things as companies are issuing tokens when the same tasks can be achieved with existing blockchains. People are blinded by fast and easy money.
    • Charles Schwab: There don't seem to be any classic bubbles near bursting at the moment—at least not among the ones most commonly referenced as potential candidates.
    • Sertac Karaman: We are finding that this new approach to programming robots, which involves thinking about hardware and algorithms jointly, is key to scaling them down.
    • Michael Elling: When do people wake up and say that we’ve moved full circle back to something that looks like the hierarchy of the old PSTN? Just like the circularity of processing, no?
    • Benedict Evans: Content and access to content was a strategic lever for technology. I’m not sure how much this is true anymore.  Music and books don’t matter much to tech anymore, and TV probably won’t matter much either. 
    • SeaChangeViaExascaleOnDown: Currently systems are still based around mostly separately packaged processor elements(CPUs, GPUs, and other) processors but there will be an evolution towards putting all these separate processors on MCMs or Silicon Interposers, with silicon interposers able to have the maximum amount of parallel traces(And added active circuitry) over any other technology.
    • BoiledCabbage: Call me naive, but am I the only one who looks at mining as one of the worst inventions for consuming energy possible?
    • Amin Vahdat (Google):  Putting it differently, a lot of software has been written to assume slow networks. That means if you make the network a lot faster, in many cases the software can’t take advantage of it because the software becomes the bottleneck.
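
Mark Shead's description of a finite state machine in the quotes above can be made concrete with a minimal Python sketch. The turnstile example, its state names, and the `run` helper are illustrative assumptions, not taken from the quoted article.

```python
# Hypothetical example: a coin-operated turnstile as a finite state machine.
# Each state specifies which state to switch to for a given input.
TRANSITIONS = {
    "locked": {"coin": "unlocked", "push": "locked"},
    "unlocked": {"coin": "unlocked", "push": "locked"},
}


def run(inputs, state="locked"):
    """Read inputs one at a time and return the state the machine ends in."""
    for symbol in inputs:
        state = TRANSITIONS[state][symbol]
    return state


print(run(["coin", "push", "push", "coin"]))
```

The whole machine is the transition table; adding a state or an input means adding a row or a column, with no change to the loop.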

  • Dropbox has 1.3 million lines of Go code, 500 million users, 500 petabytes of user data, 200,000 business customers, and a multi-exabyte Go storage system. Go Reliability and Durability at Dropbox. They use it for: RAT: rate limiting and throttling; HAT: memcached replacement; AFS: file system to replace global Zookeeper; Edgestore: distributed database; Bolt: for messaging; DBmanager: for automation and monitoring of Dropbox’s 6,000+ databases; “Jetstream”, “Telescope”, block routing, and many more. The good: Go is productive, easy to write and consume services, good standard library, good debugging tools. The less good: dealing with race conditions.

  • Professor Jordi Puig-Suari talks about the invention of CubeSat on embedded.fm. 195: A BUNCH OF SPUTNIKS. Fascinating story of how thinking different created a new satellite industry. The project wasn't on anyone's technology roadmap, nobody knew they needed it, it just happened. A bunch of really bright students, in a highly constrained environment, didn't have enough resources to do anything interesting, so they couldn't build spacecraft conventionally. Not knowing what you're doing is an advantage in highly innovative environments. The students took more risk and eliminated redundancies. One battery. One radio. Taking a risk that things can go wrong. They looked for the highest performance components they could find; these were commercial off-the-shelf components that, when launched into space, actually worked. The mainline space industry couldn't take these sorts of risks. Industry started paying attention because the higher performing, lower cost components, even with the higher risk, changed the value proposition completely. You can make it up with numbers. You can launch 50 satellites for the cost of one traditional satellite. Sound familiar? Cloud computing is based on this same insight. Modern datacenters are built on commodity parts, and low-cost miniaturized parts driven by smartphones have created whole new industries. CubeSats had a standard size, so launch vehicles could standardize too; it didn't matter where the satellites came from, they could be launched. Sound familiar? This is the modularization of satellite launching, the same force that drives all mass commercialization. Now the same ideas are being applied to bigger and bigger spacecraft. It's now a vibrant industry. Learning happens more quickly because they get to fly more. Sound familiar? Agile, iterative software development is the dominant methodology today.

  • Disturbing to think of software development as yet another winner-take-all market. The Hard Thing About Software Development: Why are some developers making fantastic salaries, in some cases working directly for VPs at Fortune 500 companies, while others are giving away their services for as little as $5, fighting over scraps?...This is why the price for remote programming keeps dropping to zero. You cannot compete with on-premise talent when it comes to deep specialization in a business domain. The higher you move up the value chain in terms of your business offering, the more that the variations inherent in the business problems and technology constraints become wickedly complex. Good discussion on Hacker News; sentiment seems to generally agree.

  • Gives a whole new meaning to the phrase "living document." Scientists Store Video Data in the DNA of Living Organisms: "Harvard researchers demonstrate that it is possible to archive images and movies in the DNA of living E. coli cells." But that's not the goal; the idea is to use DNA to store logs, yes logs: He aims to use it to record the biological activity of cells. “Right now we give DNA information we do know. We want to record information that we don’t know.” Also, Building Nanoscale Structures with DNA.

  • I noticed Paul Cézanne painted the same scenes over and over again, each an experiment trying something new. I guess that makes all those JavaScript frameworks a form of art?

  • Can PostgreSQL scale to support IoT data rates? Choose PostgreSQL for IoT: it turns out that for time-series data, if your database is architected the right way, you can scale PostgreSQL to hundreds of thousands of inserts per second, at billions of rows, even on a single node with a modest amount of RAM...TimescaleDB is more than 20x faster than vanilla PostgreSQL when inserting data at scale...most importantly, TimescaleDB enables you to scale to 1 billion rows with no real impact to insert performance...Instead of two databases (NoSQL for sensor data, relational for sensor metadata), with all kinds of glue code in between, not to mention the operational headaches of having two databases… you only need one database...TimescaleDB augments SQL by adding new functions necessary for time-series analysis...TimescaleDB is packaged as a PostgreSQL extension, which means you can run many other PostgreSQL extensions in conjunction with it.

  • Microsoft really is changing. SQL Server 2017 containers for DevOps scenarios: SQL Server 2017 will bring with it support for the Linux OS and containers running on Windows, Linux, and macOS. Our goal is to enable SQL Server to run in modern IT infrastructure in any public or private cloud.

  • If you like computers and music you almost have to love Ian Chang - "ASMR". It's made by playing drums to trigger guitar samples. Mesmerizing. You'd never know it was played with drums, but I can't help but feel that's why it sounds so otherworldly. How sticks hit the drum is actually a code to signal different effects. More on All Songs Considered. In another song he uses the drums to trigger lights. Kind of intense. The technology is called Sensory Percussion from Sunhouse. Son House is my favorite blues singer, so I was confused there for a moment. Here's how it works: Sensory Percussion uses a combination of sensors to directly capture the vibrations of your entire drum...Sensory Percussion’s software analyzes the signal from the sensor and can tell where and how you are hitting the drum using our proprietary algorithms, allowing you to map your playing to electronic control. The software supports up to four sensors at a time...It uses a combination of software and hardware to create an overlay on acoustic drums that turns your kit into an expressive controller for digital sounds...Assign sounds to different parts of the drum or even different drum strokes. Use the entire surface of the drum (rim shots, cross-sticks, different parts of the head and rim) for a flexible, fully responsive playing experience.

  • Now this is how you do a filesystem upgrade on over a billion devices. The Talk Show Live From WWDC 2017. To upgrade to APFS (Apple File System), Apple did trial migrations on several earlier iOS releases, consistency checked the results, reported back the results, and then rolled back the trial upgrade. When they did the APFS upgrade for real they were six nines sure the upgrade would succeed.

  • Supply and demand is one reason the dollar can fall, but cryptocurrencies have their own interesting relationship of value to supply and demand. Used GPUs flood the market as Ethereum's price crashes below $150

  • Living off the digital land. How I host my hobby app for FREE with CDN, SSL, SQL, CI and Tracking: For developing the app I chose VS Code...backend is written in TypeScript using koajs and running on nodejs...For storing my code I use BitBucket...As a CI build system I chose to use CodeShip which is free to use until 100 builds per month...For hosting the backend I chose the free version of Heroku... UptimeRobot that pings your site in every 5 mins...Postgres SQL which has a Free version that supports 10k rows ...custom domain for my app. It cost me $1 on GoDaddy... Let’s Encrypt CA that offers free certs...Netlify a CDN hosting provider which automatically creates/renews HTTPS certs with Let’s Encrypt...to track the traffic so I set Google Analytics up...If for whatever reason people start using my app it will take 1 minute for me to scale the server and the database instances up. And some money.

  • Updating 8.2 million km² of high-resolution satellite imagery. Interesting how MapBox decided the most important images to replace by analyzing their current images to find those that have haze, poor lighting, low resolution, and other potential problems.

  • No, this is not normal, it's not how you build a good team, and software is a team sport. How Uber's Hard-Charging Corporate Culture Left Employees Drained: “The on-call engineer received / acknowledged three alerts about master database being low on disk space, but ignored it. This is not acceptable,” Pham wrote in the email, sent to more than 3,500 employees and obtained by BuzzFeed News. “We are looking to determine whether this is negligence or whether a different on-call engineer could have reasonably missed the alerts amidst a flood of other alerts from the systems at that time.” Note: their system is issuing "a flood of other alerts." Perhaps that's the real problem? Blame the tools, not the person. Perhaps your system needs improvement? Here are some ideas: My Philosophy on Alerting and Site Reliability Engineering

  • Underneath it all there's always a scheduler. Go's work-stealing scheduler: Go scheduler’s job is to distribute runnable goroutines over multiple worker OS threads that runs on one or more processors...Go scheduler does a lot to avoid excessive preemption of OS threads by scheduling them to the right and underutilized processors by stealing, as well as implementing “spinning” threads to avoid high occurrence of blocked/unblocked transitions. Here's the proposal for the NUMA-aware scheduler for Go.
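
Work stealing in general can be illustrated with a simplified, single-threaded Python simulation. This is a sketch of the idea only, not Go's runtime implementation: the function name, the round-robin loop, and the "take half from the busiest queue" policy are all assumptions made for illustration.

```python
from collections import deque


def run_work_stealing(queues):
    """Simulate workers draining local queues, stealing when they run dry.

    queues: one list of task names per worker.
    Returns (worker_index, task) pairs in execution order.
    """
    local = [deque(q) for q in queues]
    executed = []
    while any(local):
        for worker, queue in enumerate(local):
            if not queue:
                # Idle worker: pick the busiest queue and take half its tasks
                # from the tail (illustrative policy, see lead-in).
                victim = max(local, key=len)
                for _ in range(len(victim) // 2):
                    queue.append(victim.pop())
            if queue:
                executed.append((worker, queue.popleft()))
    return executed


log = run_work_stealing([["a", "b", "c", "d", "e", "f"], [], []])
for worker, task in log:
    print(f"worker {worker} ran {task}")
```

All six tasks start on worker 0, yet the idle workers pick some of them up without any central dispatcher deciding who gets what, which is the property that makes the approach attractive for schedulers.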

  • Not a problem for your desktop, but at the high end the bus is becoming the bottleneck between system components: CPU, non-volatile memory, fast networking, GPU and FPGA accelerators. The System Bottleneck Shifts to PCI-Express: it would be nice if PCI-Express 5.0 was here next year instead of two years from now. Switching is getting back in synch with compute, but the PCI bus is lagging, and that is not a good thing considering how many things that are moving very fast now hang off of it.

  • Machine Learning Crash Course: Part 4 - The Bias-Variance Dilemma. Ignoring black swans and overfitting data, both lead to making really bad design decisions. The Fukushima power plant disaster: The difference between these two models? The overfitted model predicted one earthquake of at least magnitude 9 about every 13000 years while the correct model predicted one earthquake of at least magnitude 9 just about every 300 years. And because of this, the Fukushima Nuclear Power Plant was built only to withstand an earthquake of magnitude 8.6. The 2011 earthquake that devastated the plant was of magnitude 9 (about 2.5 times stronger than a magnitude 8.6 earthquake).
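
The "about 2.5 times" figure can be checked with a short calculation. The sketch below assumes the usual logarithmic magnitude scaling, where each whole magnitude step is a 10x increase in measured amplitude and a 10^1.5 (roughly 32x) increase in radiated energy; the function names are illustrative.

```python
def amplitude_ratio(m_big, m_small):
    """Ratio of measured wave amplitudes between two magnitudes."""
    return 10 ** (m_big - m_small)


def energy_ratio(m_big, m_small):
    """Ratio of radiated energy between two magnitudes (assumed 10^1.5 per step)."""
    return 10 ** (1.5 * (m_big - m_small))


print(f"amplitude: {amplitude_ratio(9.0, 8.6):.2f}x")
print(f"energy:    {energy_ratio(9.0, 8.6):.2f}x")
# How much more often the second model expects a magnitude 9+ quake.
print(f"frequency: {13000 / 300:.0f}x")
```

Under these assumptions the amplitude ratio comes out near 2.5, matching the article's number, while the energy released is closer to 4 times greater. The two models quoted above also differ by a factor of about 43 in how often they expect such an earthquake.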

  • Your system is only as strong as the weakest link. ‘Game of Thrones’ premiere breaks HBO streaming records: Many of the issues streaming users experienced with the HBO platform weren’t due to the platform itself buckling under the load – as is common with streaming services – but were instead problems with local distributors’ ability to authorize subscriber accounts due to the overwhelming number of requests. The network adds that those issues subsided quickly and service resumed as normal.

  • Why is IT ignoring Google’s server-less cloud infrastructure? asks Google not Amazon. Make fantastic savings in a server-less world. After all, projects cost 50% less and operating costs drop 80% on Google. Well, Amazon released serverless way earlier than Google. Way earlier. So it's not that hard to understand. Software is a space where, when you're dealing with developers, first movers have an advantage, especially when you have the ecosystem leverage that AWS already has. Price doesn't always win.

  • A tell-tale heart. Every device you hire can be hired as a witness against you. Man's pacemaker data leads to arson and insurance fraud charges. If still you think me mad, you will think so no longer when I describe the wise precautions I took for the concealment of my data.

  • Brief and good. A Brief History of Quantum Computing: In the classical model of a computer, the most fundamental building block, the bit, can only exist in one of two distinct states, a 0 or a 1. In a quantum computer the rules are changed [9],[10],[23]. Not only can a 'quantum bit', usually referred to as a 'qubit', exist in the classical 0 and 1 states, it can also be in a coherent superposition of both. 
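
To make "a coherent superposition of both" a little more concrete, here is a minimal Python sketch that treats a qubit as a pair of amplitudes and simulates measuring it. The `measure` function and the seeded random generator are assumptions for illustration; this is a classical simulation of measurement statistics, not a quantum computation.

```python
import math
import random


def measure(alpha, beta, rng=random):
    """Measure a qubit alpha|0> + beta|1>; return 0 or 1."""
    p0 = abs(alpha) ** 2
    p1 = abs(beta) ** 2
    if not math.isclose(p0 + p1, 1.0):
        raise ValueError("amplitudes must be normalized")
    # Outcome 0 occurs with probability |alpha|^2, outcome 1 otherwise.
    return 0 if rng.random() < p0 else 1


# Equal superposition: both outcomes are equally likely.
amp = 1 / math.sqrt(2)
rng = random.Random(0)
counts = [0, 0]
for _ in range(10_000):
    counts[measure(amp, amp, rng)] += 1
print(counts)
```

A classical bit corresponds to the two special cases `measure(1, 0)` and `measure(0, 1)`, which always return 0 and 1 respectively.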

  • HTTP/2 lets you create, using HTTP, the same sort of binary protocol you would have written over raw sockets, so it's not surprising developers are kicking the tires on HTTP/2. Our journey from WebSockets to HTTP/2: In particular, we were interested in multiplexing; HTTP/2 Push works at a browser level, not application level; The idea of Server Sent Events is to provide a standard way for you to open a connection and push data unilaterally from the server to the client. They are also moving to GraphQL + Apollo. albertorestifo: GraphQL will force us to re-think the API structure, which is not a bad thing...The typed nature of the GraphQL schema nicely create a self-documenting API...Only the data needed is returned by the query, leading to smaller payloads...Apollo provides functions and helpers that we currently already need...We have many components on the same page that fetch the same data. Apollo would allow us to cache the data and only fetch it once easily.
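
For readers who haven't seen Server Sent Events on the wire, the following Python sketch formats messages in the `text/event-stream` format that browsers consume through `EventSource`. The helper names, the `tick` event, and the price payload are hypothetical; a real server would stream these frames as the body of a long-lived HTTP response.

```python
import json


def format_sse(data, event=None, event_id=None):
    """Serialize one message in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # Multi-line payloads are sent as one "data:" field per line.
    for chunk in str(data).split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


def price_updates(prices):
    """Hypothetical generator a server could stream as the response body."""
    for i, price in enumerate(prices, start=1):
        yield format_sse(json.dumps({"price": price}), event="tick", event_id=i)


for frame in price_updates([101.5, 101.7]):
    print(frame, end="")
```

Because the format is plain text over a single response, it rides on HTTP/2 multiplexing without any extra protocol work, which fits the motivation described in the article.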

  • Really fun tour. So many amazing things in the wizard's lab. Adam Savage's Maker Tour: MIT's Center for Bits and Atoms (Part 2). TheExplodingChipmunk: This is breaking edge technology. What they basically develop is a completely modular and extremely versatile way to build electronics and structures. The first one is basically a modular system of very small electronic parts, that can be assembled to any device you need. The second is a way to build big but extremely stable structures with robots. Both is especially needed for space travel. You send some robots and a ton of the modules up, and the robots assemble it in orbit.

  • What is “modern” programming? The point is that programming hasn't changed much. My take, based on work at Google, is that modern programming will be AI driven. All us smart monkeys are just doing the same things over and over again using different color bananas.

  • Nice free local tunnel option for testing your callbacks. localtunnel/localtunnel. Not sure how they're paying for the servers. Keep in mind your content is transiting someone else's servers so be careful what you send. 

  • Using SEDA-like algorithms to improve IPC. Scylla’s Approach to Improve Performance for CPU-bound workloads. They doubled their IPC.

  • Good review of High Performance MySQL by Baron Schwartz. Baron is a good writer and knows his stuff. 

  • Awesome description of the cellular network while describing how easy it is to create an effective, reliable, mass-tracking system. What Does It Really Take To Track A Million Cell Phones? As long as your cell phone is turned on, you can be located in real-time, 24/7, with a precision better than 1 square kilometer. Unfortunately, jamming is more difficult than you think: jamming devices against mobiles. Phones are intended to operate in a hostile environment with thousands of phones competing for the air. A jamming device is like a garden hose in a hurricane. It’s physically impossible for any cheap pocket-size device powered by 2 AA batteries to outcompete the hurricane.

  • Three Database Architectures for a Multi-Tenant Rails-Based SaaS App. Single Database for Single Tenant - ensures the highest level of data safety; Separate Schema for Each Tenant - lowers the operating costs of the database layer; Shared Schema for Tenants - easy-to-implement.

  • A load/store performance corner case: I have recently seen a number of “is X faster than Y?” discussions where micro benchmarks are used to determine the truth. But performance measuring is hard and may depend on seemingly irrelevant details...running on random data is 3.5x faster compared to running on all-zero data!...This is usually not much of a problem for normal programs as they have more instructions that can be executed out of order, but it is easy to trigger this kind of CPU corner cases when trying to measure the performance of small code fragments, which results in the benchmark measuring something else than intended. Do not trust benchmark results unless you can explain the performance and know how it applies to your use case

  • IonicaBizau/node.cobol: bridge for COBOL which allows you to run Node.js code from COBOL.

  • timescale/timescaledb: An open-source time-series database optimized for fast ingest and complex queries. Engineered up from PostgreSQL, packaged as an extension. 

  • PAIR-code/facets: two visualizations for understanding and analyzing machine learning datasets: Facets Overview and Facets Dive. The visualizations are implemented as Polymer web components, backed by Typescript code and can be easily embedded into Jupyter notebooks or webpages.

  • catboost/catboost: a machine learning method based on gradient boosting over decision trees.

  • gpujs/gpu.js: a single-file JavaScript library for GPGPU (General purpose computing on GPUs) in the browser. gpu.js will automatically compile specially written JavaScript functions into shader language and run them on the GPU using the WebGL API.

  • CMU-Perceptual-Computing-Lab/openpose: the first real-time system to jointly detect human body, hand and facial keypoints (in total 130 keypoints) on single images. In addition, the system computational performance on body keypoint estimation is invariant to the number of detected people in the image.

  • Filecoin: A Decentralized Storage Network: a decentralized storage network that turns cloud storage into an algorithmic market. The market runs on a blockchain with a native protocol token (also called “Filecoin”), which miners earn by providing storage to clients. Conversely, clients spend Filecoin hiring miners to store or distribute data. As with Bitcoin, Filecoin miners compete to mine blocks with sizable rewards, but Filecoin mining power is proportional to active storage, which directly provides a useful service to clients (unlike Bitcoin mining, whose usefulness is limited to maintaining blockchain consensus).

  • Hey, just letting you know I've written a novella: The Strange Trial of Ciri: The First Sentient AI. It explores the idea of how a sentient AI might arise as ripped-from-the-headlines deep learning techniques are applied to large social networks. I try to be realistic with the technology. There's some hand waving, but I stay true to the programmer's perspective on things. One of the big philosophical questions is how do you even know when an AI is sentient? What does sentience mean? So there's a trial to settle the matter. Maybe. The big question: would an AI accept the verdict of a human trial? Or would it fight for its life? When an AI becomes sentient what would it want to do with its life? Those are the tensions in the story. I consider it hard scifi, but if you like LitRPG there's a dash of that thrown in as well. Anyway, I like the story. If you do too please consider giving it a review on Amazon. Thanks for your support!