
Stuff The Internet Says On Scalability For May 29th, 2015

High Scalability

29 May 2015 — 8 min read

Hey, it's HighScalability time:

Just imagine. 0-100 mph in 1.2 seconds. Astronaut's view from the Dragon spacecraft.

$850B: mobile web market in 2018; 107: unicorns; 3.2 billion: # of people on the Internet; 10^82: atoms in the observable universe
Quotable Quotes:
- @cloud_opinion: appropriate term for people that resist Docker is "VM Huggers"
- @mikeloukides: Scale systems, not teams. Adding scale shouldn’t mean adding people. Teams should scale sublinearly. @shinynew_oz @ #velocityconf
- Marc Levinson: If the market repeatedly misjudged the container, so did the state. Governments in New York City and San Francisco ignored the consequences of containerization as they wasted hundreds of millions of dollars reconstructing ports that were outmoded before the concrete was dry
- @corbett: doesn't describe ultimate origin but "Inflation describes how the universe emerges from a patch of 10^-28cm & mass of only a few grams" -AG
- @Gizmodo: Since last year, over 600 million more people have smartphones. It’s the age of mobile, says Sundar Pichai. #io15
- @stshank: Android in a nutshell: >1 billion users, 4000 devices, 500 carriers, 400 device makers says @sundarpichai at #io15
- Carlos C: Congratulations, FP hackers. You won the battle of simplicity to express...and here is where Go wins the battle of simplicity to achieve.
- @markimbriaco: @joestump In my day, we emitted HTML from our apps. Pushed the packets uphill to the browsers. Through driving DDoS. And we liked it.
- aikah: Yep, hail "Isomorphic micro-service oriented management."
- @bitfield: "We haven't got time to automate this stuff, because we're too busy dealing with the problems caused by our lack of automation." —Everyone
- @raju: India reported 851 Million active mobile connections in February 2015
- @ValaAfshar: The average smartphone user checks their mobile device 214 times per day... and 86% of the time is apps (vs 14% browser). #codecon
- @BradStone: Meeker: 87 percent millennials say smartphones never leave their side night or day. 44 percent use camera at least once a day. #CodeCon
- @sequoia: "We're close to 1M people everyday staying at an @Airbnb home. We're here to stay" @bchesky #codecon
- @pmarca: Moore's Law used to be about faster, now it's more about cheaper. Huge change with the biggest possible consequences.
- Nicolas Liochon: CAP: if all you have is a timeout, everything looks like a partition
- @aweissman: "The basic unit of communication has become, not video or even voice calls, but text messages. Who saw that coming?"
- @huntchr: I'm once again reminded how unconventional it is for DBs to be clustered. This is one reason for the lack of resiliency in apps today.
- @codinghorror: 2 GB/sec is roughly within an order of magnitude of RAM speed (Around 22GB/sec)
- @julianhyde: Watching @eric_brewer's talk on Kubernetes at #XLDB I realized service-oriented architecture finally became real, 10 years after the hype.
- Philip Zimmermann: A certain amount of elbow grease has to be expended when the police do their work. If it becomes too frictionless, you can slide more easily into a police state. I think we should restore a little bit of that friction.
- Cordell: And to see these threads back into the 19th century, for me this removes the idea of virality just from the platform of the Internet. You know, we often talk about the Internet enabling virality, but it seems clear to me that there's something much deeper about the kinds of things that we share and why we share them that extends prior to the technology we're working with today.

This would change things. What Memory Will Intel’s Purley Platform Use?: One slide, titled: “Purley: Biggest Platform Advancement Since Nehalem” includes this post’s graphic, which tells of a memory with: “Up to 4x the capacity & lower cost than DRAM, and 500x faster than NAND.” Also, What High-Bandwidth Memory Is and Why You Should Care

The question seldom asked with these kinds of efforts: Does your idea of merit have merit? Startup Aims to Make Silicon Valley an Actual Meritocracy.

The reason for us to save everything is that our collective data is the training ground for future AIs. We should train them to understand all of humanity. Hopefully they'll learn pity. Oh, wait... The Internet With A Human Face: I've come to believe that a lot of what's wrong with the Internet has to do with memory. The Internet somehow contrives to remember too much and too little at the same time.

If you would like a rich exploration of the ethical implications of post-humanism then Apex: Nexus Arc Book 3 by Ramez Naam is the book for you. The framework is a game of iterated tit-for-tat. Ultimately if we don't want post-humans to destroy us lowly humans then we humans need to treat them well, from the start. If we harm them then the correct move on their part is to tat us. That won't be good. So open with a trust move and be nice. This radical notion might even work with normal humans.
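For anyone who hasn't met tit-for-tat before, here is a minimal sketch of the strategy in an iterated prisoner's dilemma. The payoff numbers are the conventional textbook values, not anything from the book, and the function names are made up for illustration.

```python
# Payoffs as (player A, player B); "C" = cooperate, "D" = defect.
# These are the conventional textbook values, assumed for illustration.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """Open with trust, then copy whatever the opponent did last round."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """A player who never cooperates."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Run the iterated game and return the total score for each player."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)  # each strategy only sees the other's past moves
        b = strategy_b(moves_a)
        gain_a, gain_b = PAYOFF[(a, b)]
        score_a += gain_a
        score_b += gain_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b


print("nice vs nice:", play(tit_for_tat, tit_for_tat))
print("nice vs hostile:", play(tit_for_tat, always_defect))
```

Two players who both open with trust do far better together than a pair locked in mutual retaliation, which is the point of being nice from the start.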

Be honest, you wouldn't have taken Apple seriously either. The Inside Story of How the iPhone Crippled BlackBerry: There was a point where the carrier, by changing the rules, forced all the other carriers to change the rules eventually. It allowed Apple to reset what the expectations were. Conservation didn’t matter. Battery life didn’t matter. Cost didn’t matter. That’s their genius.

Are we in a bubble? Guy who profits from bubbles says no. Unicorns: Why This Bubble Is Different: the future economic value of the unicorns is...the market for private placement of equity securities with institutional investors.

That Google Compute Engine, it may be pretty fast. Couchbase Server Hits One Million Writes Per Second with Just 50 Nodes of Google Compute Engine: Now, the results are in and we were able to sustain 1.1 million writes per second using only 50 n1-standard-16 VMs, each with a 500GB SSD Persistent Disk!
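To put the headline number in per-node terms, here is some back-of-the-envelope arithmetic using only the figures quoted above. The variable names are made up for illustration, and the total assumes 1 TB = 1000 GB.

```python
# Figures quoted in the Couchbase post.
total_writes_per_sec = 1_100_000
node_count = 50
ssd_gb_per_node = 500

# Average sustained write rate each VM has to absorb.
writes_per_node = total_writes_per_sec / node_count

# Total SSD Persistent Disk provisioned across the cluster (1 TB = 1000 GB assumed).
total_ssd_tb = node_count * ssd_gb_per_node / 1000

print(f"{writes_per_node:,.0f} writes/sec per node")
print(f"{total_ssd_tb:.0f} TB of SSD Persistent Disk in total")
```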

It's refreshing to see dumb persistence is still a viable strategy. Damage Recovery Algorithm Could Make All Robots Unstoppable: If you’re a jerk and step on one of the robot’s legs, snapping it off...instead of having to figure out which leg is broken and how, or doing any sort of self-analysis at all, the robot simply starts trying a whole bunch of different gait behaviors through “intelligent trial and error,” converging on something that works by exploring an enormous pregenerated set of potentially effective motions in about two minutes.

5 Steps To Re-create Xerox PARC's Design Magic: 1. Invent, don't innovate. Problem-finding beats problem-solving; 2. Think artist colony, not R&D lab; 3. Get the best people. Don't manage them; 4. Stay small, avoid hype, and pick a boring name; 5. Hire Alan Kay (or someone like him).

Papers from the XLDB-2015 Conference Program are available.

Alán Aspuru-Guzik: "Billions and Billions of Molecules". The problem is how do you create better materials? There are 10^82 atoms in the universe. The chemical space is 10^60-10^180 medium-sized molecules. How do you search the chemical space to find better materials? Why, you use quantum mechanics and machine learning, of course!

The bastards. Thoughts Can Fuel Some Deadly Brain Cancers. "This tumor is utilizing the core function of the brain, thinking, to promote its own growth," says Michelle Monje, a researcher and neurologist at Stanford.

Martin Kleppmann with an amazing talk on Using logs to build a solid data infrastructure. He clearly explains the benefits of log-based architectures. It's all refreshingly straightforward. A question at the end of the talk revealed some of the difficulty: how do you implement read-your-writes consistency? The answer sounds a lot like the database you might have hoped to replace with logs. Good discussion on HN.
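One common way to get read-your-writes on top of a log is to have the writer remember the offset of its write and make reads wait until the view has applied at least that offset. The sketch below is a single-process toy under that assumption, not what the talk prescribes; `Log`, `MaterializedView`, and `min_offset` are hypothetical names, and a real system would block or retry where this one simply catches up synchronously.

```python
class Log:
    """Append-only log; an entry's offset is its position in the list."""

    def __init__(self):
        self.entries = []

    def append(self, key, value):
        self.entries.append((key, value))
        return len(self.entries) - 1  # offset of the entry just written


class MaterializedView:
    """Key-value view that is built by replaying the log in order."""

    def __init__(self, log):
        self.log = log
        self.state = {}
        self.next_offset = 0  # first offset that has NOT been applied yet

    def catch_up(self, upto=None):
        """Apply entries up to and including `upto` (default: the whole log)."""
        end = len(self.log.entries) if upto is None else upto + 1
        while self.next_offset < end:
            key, value = self.log.entries[self.next_offset]
            self.state[key] = value
            self.next_offset += 1

    def read(self, key, min_offset=None):
        """Read a key; if `min_offset` is given, first make sure it is applied."""
        if min_offset is not None and self.next_offset <= min_offset:
            self.catch_up(upto=min_offset)
        return self.state.get(key)


log = Log()
view = MaterializedView(log)

offset = log.append("user:1:name", "Ada")
stale = view.read("user:1:name")                      # view has not consumed the log yet
fresh = view.read("user:1:name", min_offset=offset)   # waits for the writer's own offset
print(stale, fresh)
```

Tracking per-client offsets and making readers wait is exactly the kind of bookkeeping a database does for you.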

Mary Meeker with a 197-slide 2015 Internet Trends report. Internet growth slowing. Smartphone growth slowing. Data growth strong. Advertising and monetization high but slowing. Who am I kidding, it's nearly 200 dense slides.

Do Foreign Keys Matter for Insert Speed? Maybe not as much as you think. 2 microseconds per insert. So keep your integrity. Use FKs.
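If you want to measure this on your own setup, a rough harness looks something like the sketch below. It uses an in-memory SQLite database purely because it ships with Python; that is a stand-in, not the database from the linked post, so the numbers will differ. The table names, `ROWS`, and `time_inserts` are made up for illustration.

```python
import sqlite3
import time

ROWS = 50_000  # arbitrary sample size for the comparison


def time_inserts(with_fk, rows=ROWS):
    """Time bulk inserts into a child table, with or without a foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    fk_clause = " REFERENCES parent(id)" if with_fk else ""
    conn.execute(
        f"CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER{fk_clause})"
    )
    conn.executemany("INSERT INTO parent (id) VALUES (?)", [(i,) for i in range(100)])

    data = [(i, i % 100) for i in range(rows)]  # every child points at a real parent
    start = time.perf_counter()
    with conn:  # commit once at the end of the batch
        conn.executemany("INSERT INTO child (id, parent_id) VALUES (?, ?)", data)
    elapsed = time.perf_counter() - start

    count = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
    conn.close()
    return elapsed, count


plain_seconds, _ = time_inserts(with_fk=False)
checked_seconds, _ = time_inserts(with_fk=True)
extra_us = (checked_seconds - plain_seconds) / ROWS * 1e6
print(f"extra cost per insert with the FK: {extra_us:.2f} microseconds")
```

A single run like this is noisy, so repeat it a few times before drawing conclusions.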

Confused by cluster schedulers? Then here's a good read: mesos, omega, borg: a survey. Excellent look at each in turn with lots of details. Also, see the Heracles paper at the end of this post.

Great discussion of the implications of data stored on disk not being available after a server "recovers." Dude, where’s my metadata? Definitely read the comments.

10 quick tips for Redis: STOP USING KEYS *; Find Out What’s Slowing Down Redis; Use Redis-Benchmark as a Baseline, Not the Gospel Truth; Hashes Are Your Best Friend; Set That TTL!; Choosing the Proper Eviction Policy; If Your Data is Important, Try/Except; Don’t Flood One Instance; More Cores = More Better, Right?!; HA All the Things!
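The first tip usually translates to "iterate with SCAN instead". Here is a minimal sketch of the cursor loop. It assumes a client whose `scan(cursor, match, count)` method returns a `(next_cursor, keys)` pair, which is the shape redis-py's `Redis.scan` uses; `scan_keys` and the `FakeRedis` stand-in are hypothetical and exist only so the example runs without a server.

```python
import fnmatch


def scan_keys(client, pattern, batch_hint=100):
    """Yield matching keys in small batches instead of one blocking KEYS call."""
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=batch_hint)
        yield from keys
        if cursor == 0:  # the server signals the end of the iteration with cursor 0
            break


class FakeRedis:
    """Tiny in-memory stand-in for a Redis client (hypothetical, for the demo only)."""

    def __init__(self, keys):
        self._keys = sorted(keys)

    def scan(self, cursor=0, match=None, count=10):
        window = self._keys[cursor:cursor + count]
        next_cursor = cursor + count
        if next_cursor >= len(self._keys):
            next_cursor = 0  # iteration complete
        matched = [k for k in window if match is None or fnmatch.fnmatchcase(k, match)]
        return next_cursor, matched


client = FakeRedis([f"session:{i}" for i in range(25)] + ["user:1"])
sessions = list(scan_keys(client, "session:*", batch_hint=10))
print(len(sessions), "session keys found")
```

Against a real server SCAN can return the same key more than once, so collect results into a set if duplicates matter.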

Cask shows how they built a linearly and horizontally scalable queue using HBase. Scalable Distributed Transactional Queues on HBase. It is also strongly consistent with an exactly-once delivery guarantee. Result: Using a 1K bytes payload with batch size of 500 events, we achieved a throughput of 100K events per second.
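For a feel of what that throughput means in bytes, here is the arithmetic on the quoted figures. It assumes "1K bytes" means 1024 bytes and counts payload only, ignoring any per-event or per-batch overhead; the variable names are made up for illustration.

```python
# Figures quoted in the Cask post.
payload_bytes = 1024        # "1K bytes" payload, assumed to mean 1024 bytes
batch_size = 500            # events per batch
events_per_sec = 100_000

bytes_per_sec = payload_bytes * events_per_sec
batches_per_sec = events_per_sec / batch_size
bytes_per_batch = payload_bytes * batch_size

print(f"{bytes_per_sec / 1e6:.1f} MB/s of payload")
print(f"{batches_per_sec:.0f} batches/s, {bytes_per_batch / 1e3:.0f} KB per batch")
```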

How our solid-state drives result in cost savings for customers: The reason is that we don’t think in terms of dollars-per-gigabyte. We think in dollars-per-IOPS...The SSDs Fastly uses, however, will execute somewhere in the region of 75,000 IOPS reading and 11,500 when writing. When it comes to dollars-per-IOPS, SSDs beat the pants off regular drives, and allow Fastly to read an object off our disks in under a millisecond (actually, 95% of the time that number is under 500 *micro*seconds).
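The dollars-per-IOPS framing is easy to play with yourself. In the sketch below only the 75,000 read IOPS figure comes from the post; the drive prices and the spinning-disk IOPS number are placeholders invented for illustration, so plug in your own quotes before trusting the ratio.

```python
def dollars_per_iops(drive_price_usd, iops):
    """Cost of one unit of I/O throughput for a given drive."""
    return drive_price_usd / iops


# From the post: SSD read IOPS.
SSD_READ_IOPS = 75_000

# PLACEHOLDERS, not real quotes: replace with actual prices and measurements.
HYPOTHETICAL_SSD_PRICE_USD = 500.0
HYPOTHETICAL_HDD_PRICE_USD = 100.0
HYPOTHETICAL_HDD_IOPS = 150

ssd_cost = dollars_per_iops(HYPOTHETICAL_SSD_PRICE_USD, SSD_READ_IOPS)
hdd_cost = dollars_per_iops(HYPOTHETICAL_HDD_PRICE_USD, HYPOTHETICAL_HDD_IOPS)

print(f"SSD: ${ssd_cost:.4f} per read IOPS")
print(f"HDD: ${hdd_cost:.4f} per IOPS")
print(f"HDD costs {hdd_cost / ssd_cost:.0f}x more per IOPS under these assumptions")
```

The spinning disk wins on sticker price here and still loses badly once you divide by the work each drive can do.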

Isn’t this how most things break? Ali-Liston 50th anniversary: The true story behind Neil Leifer’s perfect photo: After a romantic breakup, for example, one often wonders about the specific moments of the split, the night it happened—what could one have done differently to keep the fracturing at bay, that night, that morning, that afternoon on the phone? But later, of course, one sees it wasn’t about those moments at all—that the breakup was just the logical outcome of fissures that had already happened and happened slowly, undetected, much earlier, stacking one upon the other.

KeystoneML: a software framework designed to simplify the construction of large scale, end-to-end, machine learning pipelines in Apache Spark.

Coast: a high-level streaming toolkit written in Scala. coast is designed around Kafka’s partitioned log model, and supports complex streaming topologies with unusually strong messaging guarantees and no need for a central coordinator.

Braess-like Paradoxes in Distributed Computer Systems: adding capacity to the system may degrade the performance of all users. Unlike the original Braess paradox, we show that this behavior occurs only in the case of finitely many users and not in the case of infinite number of users.
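If the paradox is new to you, the classic road-network version is easy to work through in a few lines. To be clear, this is the original Braess example with textbook numbers, not the distributed-systems model from the paper: 4000 drivers, two routes that each pair one congestible link (n/100 minutes for n drivers) with one fixed 45-minute link, and then a free shortcut joining the two congestible links.

```python
DRIVERS = 4000
FIXED_MINUTES = 45  # travel time on the uncongested links


def congested_minutes(drivers_on_link):
    """Travel time on a congestible link grows with the traffic on it."""
    return drivers_on_link / 100


def trip_without_shortcut():
    """The two routes are symmetric, so at equilibrium traffic splits evenly."""
    per_route = DRIVERS / 2
    return congested_minutes(per_route) + FIXED_MINUTES


def trip_with_shortcut():
    """With a zero-cost shortcut, every driver takes both congestible links."""
    return congested_minutes(DRIVERS) + 0 + congested_minutes(DRIVERS)


def trip_if_one_driver_avoids_shortcut():
    """A lone deviator takes one congestible link and one fixed link."""
    return congested_minutes(DRIVERS) + FIXED_MINUTES


before = trip_without_shortcut()
after = trip_with_shortcut()
deviate = trip_if_one_driver_avoids_shortcut()

print(f"before shortcut: {before:.0f} min, after shortcut: {after:.0f} min")
print(f"avoiding the shortcut alone would take {deviate:.0f} min")
```

Nobody can do better by switching back on their own, so everyone stays stuck with the slower trip even though capacity was added.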

Ceph and SSD: The new storage platform is made with Solid State Drives and Ceph: a fault-tolerant open-source distributed peta-scale storage stack.

Heracles: Improving Resource Efficiency at Scale: a feedback-based controller that enables the safe colocation of best-effort tasks alongside a latency critical service. Heracles dynamically manages multiple hardware and software isolation mechanisms, such as CPU, memory, and network isolation, to ensure that the latency-sensitive job meets latency targets while maximizing the resources given to best-effort tasks. We evaluate Heracles using production latency critical and batch workloads from Google and demonstrate average server utilizations of 90% without latency violations across all the load and colocation scenarios that we evaluated.
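To make "feedback-based controller" concrete, here is a toy control step in the spirit of that description. It is emphatically not the controller from the paper: the single knob (cores given to best-effort work), the 20% slack threshold, and the function name are all assumptions made for illustration.

```python
def adjust_best_effort_cores(be_cores, latency_ms, target_ms, max_be_cores,
                             grow_slack=0.2):
    """Return the best-effort core allocation for the next control interval.

    be_cores:     cores currently given to best-effort tasks
    latency_ms:   measured latency of the latency-critical service
    target_ms:    latency target for that service
    max_be_cores: upper bound on cores that may be lent out
    grow_slack:   fraction of headroom required before lending more (assumed value)
    """
    slack = (target_ms - latency_ms) / target_ms
    if slack < 0:
        return 0  # target violated: take every core back immediately
    if slack > grow_slack:
        return min(be_cores + 1, max_be_cores)  # plenty of headroom: lend one more
    return be_cores  # close to the target: hold steady


# One simulated sequence of latency measurements against a 10 ms target.
cores = 4
for measured in (5.0, 6.0, 9.0, 12.0):
    cores = adjust_best_effort_cores(cores, measured, target_ms=10.0, max_be_cores=8)
    print(f"latency {measured} ms -> {cores} best-effort cores")
```

Growing slowly and backing off hard is the usual shape for this kind of loop, since a latency violation costs more than a few idle cores.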
