Stuff The Internet Says On Scalability For May 6th, 2016

Hey, it's HighScalability time:


Who wants in on the over? We are not alone if the probability a habitable zone planet develops a technological species is larger than 10^-24. If you like this sort of Stuff then please support me on Patreon.

  • 100,000+: bare metal servers run by Twitter; 10 billion: Snapchat videos delivered daily; $2.57 billion: AWS fourth quarter revenues; 40 light years: potentially habitable planets; 1700: seed banks around the world; 560x: throughput after SSD optimization; 12: data science algorithms; $2.8 billion: new value of Pivotal’s Cloud Foundry;  

  • Quotable Quotes:
    • @skap5: Pied Piper's product is its stock and anything that makes its price go up! #SiliconValley
    • Seth Godin: It pays to have big dreams but low overhead. 
    • Craig Venter~ Our knowledge of the genome hasn't changed a lot since 2003, but it's about to start changing rapidly. One of the key things for understanding the genome is to get very large numbers of genomes so we can understand out of the 6.2 billion or so letters of genetic code the less than 3% that we have different amongst the entire human population. We need very large data sets to understand the differences and significances. That's where the cost and speed of sequencing has had such an immediate impact. 
    • @elonmusk: Rocket reentry is a lot faster and hotter than last time, so odds of making it are maybe even, but we should learn a lot either way
    • @EconBizFin: The space race was once between capitalism and communism. Now it's individual capitalists
    • Grit: substitute nuance for novelty. Rather than constantly moving on to a new thrill try to find another dimension of the thing you are already doing to make it more thrilling. 
    • David Rosenthal: Overall the message is: Storage Will Be Much Less Free Than It Used To Be
    • StorageMojo: The losers are the systems that make customers pay for features they no longer need. Winners will successfully blend ease of use with performance and availability – at a competitive price.
    • Tim Harford: These distractions were actually grists to their creative mill. They were able to think outside the box because their box was full of holes.
    • Juan Enriquez~ Plastics was the wrong advice to Dustin Hoffman in The Graduate. The word should have been silicon. In 2015 the word is lifecode, the various means we have to program life.
    • Benjamin Treynor Sloss: If you've ever wondered about how to run reliable services, this beautifully written intro from the SRE book is the best 5-minute guide on the topic.
    • cs702: Without AWS, Amazon would have reported losses! 
    • Kode Vicious: A single cache miss is more expensive than many instructions.
    • @aminggs: “your database… unlikely to provide serializability, your multi-core processor… unlikely to provide linearizability” 
    • @mrogati: A decade in academia taught me a bunch of sophisticated algorithms; a decade in industry taught me when not to use them.
    • @mjpt777: Hardware tries so hard to make software fast; software tries so hard to make hardware slow.
    • @jyarow: Echo sold 3 million units. Gets stories that it’s next great business for Amazon. Apple Watch sold 12 million units, gets panned as a flop.
    • @balajis: 5/ At that time, the highest truth comes not from faith in god or trust in the state, but from the ability to check the math of the network.
    • Benedict Evans: The smartphone install base does have a lot of room to grow, but that's a function of replacement at close to existing volumes, and even that will be largely done in a few more years. Hence: smartphone sales growth is slowing down. 
    • @giupan: Colocated teams where Devs are sitting together with Product and UX outperform distributed teams. Don't split up skills @cagan #craftconf
    • Mathias Bynens: To me, this stuff is extremely interesting on a technical level. It’s also a little scary, however, to realize that malicious actors can use these techniques to invade your privacy while you’re browsing the web, without you ever knowing.
    • Le Corbusier: yes, the Parthenon is perhaps the most beautiful instance, the perfect example of a particular standard of architecture. The Parthenon may have achieved the platonic ideal of the standard of architecture we’ve previously established. But there are many possible standards to acknowledge, each dependent on need and use, and standards are established by experiment.
    • PaulHoule: Atom chips have always been crippled to keep them from cannibalizing more expensive chips. Skylake is a fine tablet chip, in fact, that's really what Skylake is good for. They are probably producing them in high enough numbers now that they can give up on Atom
    • Chau Tu: CyArk wants to preserve our world’s important cultural heritage sites before they turn to dust...with new imaging technologies to steadily build a digital archive of the past, for the future. 
    • Neill Turner: Over time I think OpenStack will be a niche product for large corporates that don’t want to use public clouds. Everyone else will be doing hybrid IT, that is, extending their existing IT infrastructure into the Public Cloud. When they see what is left to run outside public clouds then they can see where to take that portion of the workloads.

  • What if going to Mars is how we fix our economy? Trump: Before going to Mars, America needs to fix its economy. A problem can't be solved at the same level it was created. Someone smart said that once. Isn't expanding the economy into space the only way we'll be able to generate the constant growth a modern economy so desperately devours? Walls don't lift boats.

  • Is it dystopian to hire real meat people to train your AI? Interesting question posed by John Robb in a tweet: "they were there not to work, but to serve as training modules for Facebook’s algorithm" Journos at Fbook

  • Peter Bailis offers in a heartfelt visionary article four pieces of advice to get the database community out of its identity crisis: 1) Kill the reference architecture and rethink our conception of “database.” 2) Solve new, emerging, real problems outside traditional relational database systems. 3) Use data-intensive tools, both the tools that you’re building and the tools that others have built. 4) Do bold, weird, and hard projects and actually follow through. Examples in action: Peter Alvaro’s work on Molly and Lineage-Driven Fault Injection; Chris Ré’s work on DeepDive; A recent project I wish the database community had done is TensorFlow at Google. 

  • Jon Shiring with an awesome talk on design decisions behind Titanfall, an online only multiplayer first person shooter. 60 developers worked on Titanfall. 13 programmers.  Uses a client server architecture. 12 players per game. Can't work in player hosted so moved to dedicated hardware in the cloud. Assume 69kbps per player network capacity up. Lots of good stuff. 

  • Now this is cold storage. Inside the Svalbard Seed Vault. Seeds are stored at -18 degrees Celsius. Goal is to keep a genetic backup of seeds, preserving the diversity of food crops. True disaster planning. Curiously the seed bank does not open the packages to see what's really in there. They run it through airport security measures to make sure it's not a bomb. Syrians were the first to withdraw seeds, all the bombings made it necessary.

  • Sean Hull asks: Does Amazon eat its own dog food (ahem…) or drink its own champagne?: I was surprised to find that Amazon Retail's migration was similar to many of the customers I’ve worked with in New York. Often they take a hybrid approach where Direct Connect is key, allowing them to move over in a measured way. What’s more she talks about how EC2 instances have different performance characteristics & applications typically need to be tuned for that world.

  • Interesting idea of recording failed reactions and applying machine-learning to gather insights. Machine-learning-assisted materials discovery using failed experiments. In software we have these huge bug databases yet we don't seem to be able to use those to improve code. Most likely because there are no physical laws governing programming.

  • Shocking. Devices designed to be as cheap as possible have security holes? Hacking into homes: 'Smart home' security flaws found in popular system.

  • Sounds fun. Madcap Ultrasound Engineers Send FPGA to 103,000 Feet: Last week a team of Ultrasound Engineers (including myself) sent an FPGA to 103,000 feet (31km) in a High Altitude Balloon and just barely recovered it after three days in the wild...it turns out that Black Mesa Labs (BML) is actually an open source hardware + open source software development studio Kevin funds himself for $100 a month out of his garage...The HAB1 electronics are built from a high-altitude, flight-ready uBlox GPS unit, a two-way Iridium RockBLOCK satellite communications modem, half-a-dozen custom circuits boards, a fully-custom FPGA-based "Lizard Brain," and a Raspberry Pi Linux-based computer running a custom Flight and Communication program written in Python.

  • It's a deep puzzle why deep sea creatures should be so beautiful to our human eyes and brains. Deep-sea Holothurian

  • Consider some of the dog food eaten. DeepMind moves to TensorFlow

  • AI driven search algorithms will converge on this same strategy. NeuroLogic: In psychology, there’s a theory called the mnemic neglect model which contends that people tend to easily recall things that are consistent with their self-perception and more likely to neglect feelings and memories that conflict with the way they perceive themselves.

  • Stack Overflow: How We Do Deployment. With class, of course. Good reddit comment thread and on Hacker News.

  • A great deep dive into the land of fairy. The illumos SYSCALL Handler: At its core the handler transforms a user thread into a kernel thread while it’s running. It changes CPU state by using instructions that themselves change CPU state. This is like Optimus Prime transforming into an autobot while hauling ass down the road after a Decepticon. If the state is not transitioned in just the right sequence then the system crashes.

  • Anycast TCP is one of those things that sounds so simple yet is stubbornly hard to grasp. Here's an excellent podcast on the real world practicality of using Anycast TCP: Show 286: Busting Anycast TCP Myths. Also, TCP over IP Anycast - Pipe dream or Reality? from LinkedIn. 

  • Since programming is about 110% frustration this might explain why programmers are so creative. How frustration can make us more creative: We've actually known for a while that certain kinds of difficulty, certain kinds of obstacle, can actually improve our performance...They had weak filters, they had porous filters -- let a lot of external information in. And so what that meant is they were constantly being interrupted by the sights and the sounds of the world around them....Now, you would think that that was a disadvantage...When Carson looked at what these students had achieved, the ones with the weak filters were vastly more likely to have some real creative milestone in their lives, to have published their first novel, to have released their first album. These distractions were actually grists to their creative mill. They were able to think outside the box because their box was full of holes.

  • What ISPs Can See? Everything. 

  • Designing SSD-Friendly Applications: The maximum application throughput when working with a HDD is 142 queries per second (qps). This is the best result we can get, irrespective of various changes or tunings to the application design. When moving to SSD with the same application, the throughput is increased to 20,000 qps, which is 140x faster. After optimizing on the application design and making it SSD-friendly, the throughput is increased to 100,000 qps, a 4x further improvement when compared with the naive SSD adoption. The secret for this particular example is using multiple concurrent threads to perform I/O. This takes advantage of the SSD’s internal parallelism (described later), as shown in the figure below. Note that multiple I/O threads do not work well with HDDs.
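
The "multiple concurrent threads to perform I/O" idea can be sketched roughly like this. It is a minimal illustration, not the article's code: `BLOCK_SIZE`, the worker count, and the helper names are assumptions, and `os.pread` is only available on Unix-like systems.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096  # hypothetical read size; tune per device


def read_block(fd, index):
    """Read one block at a fixed offset; pread does not use a shared file position."""
    return os.pread(fd, BLOCK_SIZE, index * BLOCK_SIZE)


def read_blocks_concurrently(path, block_indexes, workers=8):
    """Issue many reads at once so the device sees several outstanding requests."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() returns results in the same order as block_indexes
            return list(pool.map(lambda i: read_block(fd, i), block_indexes))
    finally:
        os.close(fd)


def make_demo_file(num_blocks):
    """Create a throwaway file whose i-th block is filled with byte value i % 256."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for i in range(num_blocks):
            f.write(bytes([i % 256]) * BLOCK_SIZE)
        return f.name
```

Whether more workers actually help depends on the device. As the quote notes, the same pattern does not work well with HDDs.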

  • It kind of sucks. AFTER A YEAR OF USING NODEJS IN PRODUCTION: I spent a year trying to make Javascript and more specifically Node work for our team. Unfortunately during that time we spent more hours chasing docs, coming up with standards, arguing about libraries and debugging trivial code more than anything. Would I recommend it for large-scale products? Absolutely not.

  • Truly excellent description of FollowFeed: LinkedIn's Feed Made Faster and Smarter. The Lucene based approach wasn't working. They went with a Query on demand or Fan-out-on-read model. The feed index is composed of hundreds of millions of timelines, and the index is partitioned across a cluster of machines. RocksDB is used as an embeddable persistent key-value store. A timeline is stored in RocksDB as a linked list of blobs (byte arrays). Each blob is a serialized representation of content records. To optimize latency, each node maintains a read/write-through cache of deserialized content records. Avro is the serialization protocol used for data storage. The scoring solution was to auto-generate optimized Java code for the data transformation and scoring steps. FollowFeed’s p99 latency for the mobile news feed is around 140ms which is five times faster than Sensei. 
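
A toy version of the fan-out-on-read idea might look like the following. Everything here is a stand-in: a Python dict plays the role of RocksDB, JSON plays the role of Avro, and the function names and `blob_capacity` are made up for illustration. It also assumes each actor's activities arrive in increasing timestamp order.

```python
import heapq
import json
from itertools import islice

# Hypothetical in-memory stand-in for the key-value store: actor -> list of blobs,
# newest blob first. Each blob is a serialized batch of [timestamp, item] records.
store = {}


def append_activity(actor, timestamp, item, blob_capacity=2):
    """Write path: append to the actor's own timeline only (no fan-out on write)."""
    blobs = store.setdefault(actor, [])
    head = json.loads(blobs[0]) if blobs else []
    if blobs and len(head) < blob_capacity:
        head.insert(0, [timestamp, item])  # newest record first within a blob
        blobs[0] = json.dumps(head)
    else:
        blobs.insert(0, json.dumps([[timestamp, item]]))  # start a new head blob


def read_timeline(actor):
    """Yield an actor's records newest first, deserializing one blob at a time."""
    for blob in store.get(actor, []):
        for timestamp, item in json.loads(blob):
            yield timestamp, item


def build_feed(followed, limit):
    """Read path: merge the followed actors' timelines at query time."""
    merged = heapq.merge(*(read_timeline(a) for a in followed), reverse=True)
    return [item for _, item in islice(merged, limit)]
```

The write path touches one timeline per activity. The cost moves to the read path, which is why the real system leans on caching and generated scoring code.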

  • Best Practices for Configuring Optimal MySQL Memory Usage: you never want MySQL to cause the operating system to swap; don’t allow the mysqld process VSZ to exceed 90% of the system memory; another thing you need to account for is memory fragmentation; you want to have the swap file enabled; reduce the operating system's tendency to swap; care about NUMA when it comes to MySQL memory allocation.

  • Optimizing Latency and Bandwidth for AWS Traffic: The Datapath.io system is built as distributed microservices with a messaging model. We use RabbitMQ as the moderator of the microservices environment. We export and store our results in the Hadoop file system as an HDFS. We then use Apache Spark to run our big data calculations. This setup has created a horizontally scalable system into which we can easily integrate more storage and computing capacity at very little extra cost.

  • Microsoft joins the 1c/GB/month cloud storage caper: Now the bright blue cloud that Bill built has added a “cool” tier to the service that reaches the US$0.01/GB/month price once you store 100 terabytes in certain Azure regions. The service is notable because it gets Microsoft into competition with Amazon's Glacier cold storage and Google's Nearline, both of which already offer $0.01/GB/month plans.

  • Control algorithm for teams of robots factors in moving obstacles: found a way to reduce both the computational and communication burdens imposed by consensual planning. The essential idea is that each robot, on the basis of its own observations, maps out an obstacle-free region in its immediate environment and passes that map only to its nearest neighbors. When a robot receives a map from a neighbor, it calculates the intersection of that map with its own and passes that on. Also, How Do You Solve a Problem Like 100,000 Uncoordinated Driverless Cars?
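
The "intersect your map with your neighbor's and pass it on" step can be sketched as below. This is a simplification under loud assumptions: regions are axis-aligned boxes rather than whatever shapes the researchers use, and the synchronous rounds and function names are invented for the example.

```python
def intersect(a, b):
    """Intersect two axis-aligned boxes (xmin, ymin, xmax, ymax); None if empty."""
    if a is None or b is None:
        return None
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    return (xmin, ymin, xmax, ymax) if xmin < xmax and ymin < ymax else None


def agree_on_region(regions, neighbors, rounds):
    """Each round, every robot intersects its box with its neighbors' current boxes."""
    current = dict(regions)
    for _ in range(rounds):
        # Build the next state from a snapshot so all robots update simultaneously
        nxt = {}
        for robot, box in current.items():
            for other in neighbors[robot]:
                box = intersect(box, current[other])
            nxt[robot] = box
        current = nxt
    return current
```

With robots that only talk to their nearest neighbors, the shared region spreads one hop per round, so a chain of three robots needs two rounds to agree.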

  • Never Trust the Client: This indicates that The Division is most likely using a trusted client network model. I sincerely hope this is not the case, because if it is true, my opinion of can this be fixed is basically no...For a competitive first person shooter there is a pretty standard approach to networking pioneered by Quake and later on perfected by Counterstrike. This is the same network model used today by top tier FPS games like Call of Duty, Overwatch and Titanfall. This networking model has two main features you’ve probably heard of: Client side prediction so players don’t feel lag on their own actions (movement, shooting etc…); Lag compensation so when you shoot another player and bullets hit on your machine, you generally get credit for that hit as you saw it (so you don’t have to lead the target according to lag). But, critically, this decision of bullets hitting other players is decided on the server, not on the client...Behind all of this, the key idea behind this network model is that the server is THE REAL GAME. What happens on the server is all that counts and the server never trusts what the client says they’re doing.
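
Here is a rough sketch of "the server decides" combined with lag compensation. It is not how any of the named games implement it. The constants, the 2D positions, and the names `PlayerHistory` and `server_resolves_shot` are all hypothetical.

```python
import bisect
from dataclasses import dataclass, field

HIT_RADIUS = 1.0           # hypothetical hit tolerance in world units
MAX_REWIND_SECONDS = 0.25  # hypothetical cap on how far the server will rewind


@dataclass
class PlayerHistory:
    """Server-side record of where a player was at each server tick."""
    times: list = field(default_factory=list)
    positions: list = field(default_factory=list)

    def record(self, t, pos):
        self.times.append(t)
        self.positions.append(pos)

    def position_at(self, t):
        """Latest recorded position at or before time t."""
        i = bisect.bisect_right(self.times, t) - 1
        return self.positions[max(i, 0)]


def server_resolves_shot(target_history, server_now, client_view_time, aim_point):
    """The client sends only its input (when and where it aimed); the server decides."""
    # Clamp the client-supplied time: it is a hint, never trusted outright
    rewind_to = max(client_view_time, server_now - MAX_REWIND_SECONDS)
    rewind_to = min(rewind_to, server_now)
    tx, ty = target_history.position_at(rewind_to)
    ax, ay = aim_point
    return (tx - ax) ** 2 + (ty - ay) ** 2 <= HIT_RADIUS ** 2
```

The point of the clamp is that a client claiming "I shot you where you stood two seconds ago" gets evaluated against the server's own recent history, not its claim.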

  • On the (Small) Number of Atoms in the Universe: People often underestimate the number of combinations of things. I think there are two main reasons: Combinations of things are multiplicative, while collections of things are additive: If you see a line of 6 people, it is easy to visualize a line of 60 people: it is ten times longer. But even if you know that there are 720 different orderings (permutations) in which those 6 people can line up, there is no way you can visualize the number of orderings for 60 people, because it is, you guessed it, larger than the number of atoms in the universe. Big numbers are hard. Even with simple collections of things, it takes practice to get a real intuition for the difference between 6 million and 6 billion people. The number of combinations of things grows much faster and therefore intuition fails earlier.
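
The 6 versus 60 people claim is easy to check. The only outside number here is the atom count, taken as the commonly quoted order-of-magnitude estimate of 10^80, which is an assumption and not a figure from the article.

```python
import math

ATOMS_IN_UNIVERSE = 10**80  # assumed rough estimate, not from the article


def orderings(n):
    """Number of ways to line up n distinct people (n factorial)."""
    return math.factorial(n)


print(orderings(6))                       # 720
print(orderings(60) > ATOMS_IN_UNIVERSE)  # True
```

The line gets ten times longer while the number of orderings grows by dozens of orders of magnitude.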

  • Almost any post that references Gödel Escher Bach is worth reading. The Independent Discovery of TCP/IP, By Ants: But when Gordon showed her data to Prabhakar to model computationally, he had a revelation. "The algorithm the ants were using to discover how much food there is available is essentially the same as that used in the Transmission Control Protocol"...If we consider that the ant colony’s goal is to collect more food and expend fewer ants, and a server’s goal is to send a file and avoid congestion or overload, then the similarities are clear. 

  • Introducing Complex Event Processing (CEP) with Apache Flink: Apache Flink with its true streaming nature and its capabilities for low latency as well as high throughput stream processing is a natural fit for CEP workloads.

  • How Hyperconnected Cities Are Taking Over the World: We’re now moving toward a new era where insular, political boundaries are no longer as relevant. More and more people are identifying as “global citizens,” and that’s because we’re all more connected than we’ve ever been before. As a result, a “systems change” is taking place in the world today in which cities—not nations—are the key global players, argues 

  • Dialog-based Language Learning: We evaluate a set of baseline learning strategies on these tasks, and show that a novel model incorporating predictive lookahead is a promising approach for learning from a teacher's response. In particular, a surprising result is that it can learn to answer questions correctly without any reward-based supervision at all.

  • I tried to make an Actor framework for Go and utterly failed. This looks pretty good: rogeralsing/gam

  • BTrDB: Optimizing Storage System Design for Timeseries Processing: It turns out you can accomplish quite a lot with 4,709 lines of Go code! How about a full time-series database implementation, robust enough to be run in production for a year where it stored 2.1 trillion data points, and supporting 119M queries per second (53M inserts per second) in a four-node cluster? 

  • Wren: a small, fast, class-based concurrent scripting language 

  • “What Went Right and What Went Wrong”: An Analysis of 155 Postmortems from Game Development: based on our analysis of the data we collected, we make a few recommendations to game developers. First, be sure to practice good risk management techniques. This will help avoid some of the adverse effects of obstacles that you may encounter during development. Second, prescribe to an iterative development process, and utilize prototypes as a method of proving features and concepts before committing them to your design. Third, don’t be overly ambitious in your design. Be reasonable, and take into account your schedule and budget before adding something to your design. Building off of that, don’t be overly optimistic with your scheduling. If you make an estimate that initially feels optimistic to you, don’t give that estimate to your stakeholders. Revisit and reassess your design to form a better estimation.

  • Gorilla: A fast, scalable, in-memory time series database: As of Spring 2015, Facebook’s monitoring systems generated more than 2 billion unique time series of counters, with about 12 million data points added per second – over 1 trillion data points per day.
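
The "over 1 trillion data points per day" figure follows from the per-second rate. A quick sanity check, with a helper name made up for the purpose:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def points_per_day(points_per_second):
    """Scale a steady per-second ingest rate up to a full day."""
    return points_per_second * SECONDS_PER_DAY


print(points_per_day(12_000_000))  # 1036800000000
```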