
Stuff The Internet Says On Scalability For November 27th, 2015

High Scalability

27 Nov 2015 — 9 min read

Hey, it's HighScalability time:

The most detailed picture of the Internet ever as compiled by an illegal 420,000-node botnet.

$40 billion: P2P lending in China; 20%: amount of all US margin expansion accounted for by Apple since 2010; 11: years of Saturn photos; 117: number of different steering wheels offered for a VW Golf; 1Gbps: speed of a network using a lightbulb.

Quotable Quotes:
- @jaksprats: If we could compile a subset of JavaScript to Lua, JS could run on Server(Node,js), Browser, Desktop, iOS, & Android.JS could run EVERYWHERE
- @wilkieii: Tech: "Don't roll your own crypto if you aren't an expert" *replaces nutrition with Soylent, currency with bitcoin* *puts wifi in lightbulb*
- @brianpeddle: The architecture of one human brain would require a zettabyte of capacity. Full simulation of a human brain by 2023.
- MarshalBanana: That can still easily be the right choice. Complex algorithms trade asymptotic performance for setup cost and maintenance cost. Sometimes the tradeoff isn't worth it.
- kevindeasi: There are so many things to know nowadays. Backend: Sql, NoSql, NewSql, etc. Middlware: Django, NodeJs, Spring, Groovy, RoR, Symfony, etc. Client: Angular, Ember, React, Jquery, etc. I haven't even mentioned hardware, security, servers/cloud, and api. Now you also need to know about theory, UI/UX, git, deploying servers, HTTP, scrum, software development process, testing.
- Brian Chesky~ It was better to have 100 people who loved us vs. 1M people who liked us. All movements grow this way.
- idlewords: All the advantages of a dedicated server without the hassle of saving tons of money.
- jorangreef: Well, how would you handle massive traffic spikes? Through a combination of vertical and horizontal scaling? Through having excess capacity? Except that I would probably want to start with something fast and inexpensive to begin with.
- @jaykreps: "The bigger the interface, the weaker the abstraction"--@rob_pike
- Animats: That still irks me. The real problem is not tinygram prevention. It's ACK delays, and that stupid fixed timer. They both went into TCP around the same time, but independently. I did tinygram prevention (the Nagle algorithm) and Berkeley did delayed ACKs, both in the early 1980s. The combination of the two is awful.
- @jaykreps: Distributed computing is the new normal: Mesos, K8s = dist'd processes; Cassandra, Kafka, etc = dist'd data; microservices = dist'd apps.
- @bradfitz: OH: "Well you can add nodes to the cluster. They made that work well, but you can't remove them. It's the Hotel California of auto-scaling."
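
On the Animats quote about Nagle meeting delayed ACKs: a common workaround (not something Animats recommends in the quote) is to turn off the Nagle algorithm per socket with the standard `TCP_NODELAY` option, so small writes go out immediately instead of waiting on an ACK. A minimal sketch, assuming a plain TCP client socket; the helper name `make_low_latency_socket` is made up for illustration:

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with the Nagle algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 disables Nagle: small segments are sent right away
    # rather than being held until earlier data has been ACKed.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_low_latency_socket()
# Read the option back to confirm it took effect (non-zero means enabled).
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

The tradeoff is more small packets on the wire, so this tends to make sense for request/response traffic and less so for bulk transfers.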

Creating Your Own EC2 Spot Market -- Part 2. Video encoding represents 70% of Netflix's computing needs. And Netflix has a daily peak of 12,000 unused instances. So they created their own spot market to improve encoding throughput by the equivalent of a 210% increase in encoding capacity. Using their updated real-time approach they were able to perform an encoding job in 18 hours that they expected to take a few days. Great article with a lot of deep thinking on the topic.

Amen! We should come up with a catchy name for RAII so more languages support it because RAII is awesome and simplifies code!

Google as a cloud company instead of an ad company? It could happen: Google's Holzle Envisions Cloud Business Eclipsing Ads in 2020. Google announced Custom Machine Types so you can configure the number of virtual CPUs and the amount of RAM you want for your machine. I imagine this nifty feature is enabled by Google's advanced datacenter scheduling software, but it will take more than that to beat AWS and Azure. To take market share Google may need to instigate a price war. Though it looks like Google might make a lot of money charging back to Google.

Good explanation of what serverless computing is by Leonardo Federico: the phrase “serverless” doesn’t mean servers are no longer involved. It simply means that developers no longer have to think "that much" about them. Computing resources get used as services without having to manage around physical capacities or limits. Let's take for example AWS Lambda. "Lambda allows you to NOT think about servers. Which means you no longer have to deal with over/under capacity, deployments, scaling and fault tolerance, OS or language updates, metrics, and logging."

Seems sensible. Lessons Learned from the Patreon Security Breach: Keep your development environment isolated from your production environment; Use dummy data, not production data, in your development environment.

Square improved ruby-protobuf deserialization performance by 50% with a few hours of effort. Excellent example of speeding up code by profiling, finding bottlenecks, and coming up with simple targeted improvements. Profiling using RubyProf showed slow enum handling, which was fixed with caching. An often used function was too slow, so a C version was used. Some other often called functions were short circuited with a hash lookup. The plan is to move to protobuf3, which is coded in C++ and is 7x faster.
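
The "slow enum handling fixed with caching" idea is simple enough to sketch. This is not Square's code (theirs is Ruby), just a minimal Python illustration, assuming a hypothetical enum table `COLOR_VALUES` and a deliberately slow linear lookup; the cache means each tag pays the scan cost only once:

```python
from functools import lru_cache

# Hypothetical enum definition: name -> wire tag.
COLOR_VALUES = {"RED": 0, "GREEN": 1, "BLUE": 2}

# Counts how often the slow path runs, so the effect of caching is visible.
slow_lookup_calls = 0


def slow_enum_name(tag: int) -> str:
    """Resolve a wire tag to its enum name with a linear scan (the slow path)."""
    global slow_lookup_calls
    slow_lookup_calls += 1
    for name, value in COLOR_VALUES.items():
        if value == tag:
            return name
    raise ValueError(f"unknown enum tag: {tag}")


@lru_cache(maxsize=None)
def cached_enum_name(tag: int) -> str:
    """Same result as slow_enum_name, but each distinct tag is scanned once."""
    return slow_enum_name(tag)


# Decoding many messages hits the same few tags over and over.
decoded = [cached_enum_name(tag) for tag in (1, 1, 2, 1, 2, 0)]
```

Six lookups above trigger only three scans, one per distinct tag. The same pattern applies wherever a profiler shows a pure function being called repeatedly with a small set of inputs.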

It Was Never Going to Work, So Let’s Have Some Tea. James Mickens is funny with way too much truth.

This is how to store human knowledge for eternity. A highly polished look at using DNA for storage. The idea is information is only useful if it is stable. Fossils are how you make DNA stable. So encapsulating DNA in glass is the best way to store information, along with using error correction codes to encode DNA. They tested with 1 million years of cold storage. All of Facebook and Wikipedia could fit in a small test tube.

What's Worked in Computer Science: Not only is every Yes from 1999 still Yes today, seven of the Maybes and Nos were upgraded, and only one was downgraded. And on top of that, there are a lot of topics like neural networks that weren’t even worth adding to the list as a No that are an unambiguous Yes today. Good discussion on reddit.

Pressure changes everything. Modern-day alchemy is putting the periodic table under pressure: Here, the elements’ familiar identities start to blur. “We essentially have a new periodic table at high pressures.”

The 10x advantages of the D language: 10x faster to compile than C++ for comparable produced code; 10x faster than scripting languages for comparable convenience; 10x easier to interface with C and C++ than any other language; 10x better than any other system language at generic and generative programming.

It didn't go well for The Guardian when they tried using OpenStack for their private cloud. The Guardian goes all-in on AWS public cloud after OpenStack 'disaster'. The start sounded promising: built on Cisco’s UCS servers and NetApp storage, running the Ubuntu operating system and OpenStack management and orchestration software. But after huge amounts of effort they decided to move to AWS: We didn't manage to deliver self-service, we didn't manage to deliver decent load-balancing or autoscaling and actually all the benefits we get from AWS we simply did not get inside of the cloud that we were building internally.

How do you figure out what emoji actually mean? Here's an interesting story about doing just that: Crying with laughter: how we learned how to speak emoji.

servercobra: If you want to run your own infrastructure from the ground up, maybe. Ironic (bare metal deploys) + Magnum (container deploys with Kubernetes) is exciting in that area, but as someone who wrote and ran an OpenStack cloud, I wouldn't recommend it. For most people, something like ECS/GKE/bare Kubernetes on a cloud provider is a decent step in that direction.

Adding no additional instances BloomReach was able to increase capacity by 4x and improve average latencies by 50%. Here's how in this superb analysis and explanation: Increasing Cassandra Capacity for the Holidays Without Adding Nodes. Analysis showed CPU was impacted by three main factors: Replication load from the back end; Time spent doing disk IO operations; Read request workload from our API’s. A smarter client was created to group keys being requested for a query into the appropriate token range and then send the request directly to a node which owns that range. More data was served from RAM.
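
The "smarter client" part can be sketched roughly as follows. This is an illustration of grouping keys by token range, not BloomReach's implementation: the four-node ring, the node names, and the MD5-based `token_for` are all assumptions made for the example (Cassandra's default partitioner uses Murmur3 and a much larger token space).

```python
import bisect
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical ring: each node owns all tokens up to and including its boundary.
RING = [
    (2**30, "node-a"),
    (2**31, "node-b"),
    (3 * 2**30, "node-c"),
    (2**32 - 1, "node-d"),
]
BOUNDARIES = [boundary for boundary, _ in RING]


def token_for(key: str) -> int:
    """Map a key to a 32-bit token (MD5 used purely for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def owner_of(token: int) -> str:
    """Find the node whose token range contains this token."""
    index = bisect.bisect_left(BOUNDARIES, token)
    return RING[index % len(RING)][1]


def group_keys_by_owner(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket keys by owning node so each node gets one direct request."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        groups[owner_of(token_for(key))].append(key)
    return dict(groups)


keys = [f"user:{i}" for i in range(100)]
requests_by_node = group_keys_by_owner(keys)
```

Each entry in `requests_by_node` can then be sent straight to the node that owns those keys, which avoids having a coordinator node fan the query out on the client's behalf.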

Controlling Robots on Mars. Imagine if it took 14 minutes to control your system and get a response. That's what it was like controlling robots on Mars. Bobak Ferdowsi, systems engineer at NASA, says it all has to be automated. The suite of commands for the rover to execute is sent in the morning, about 9AM or so. The rover executes the commands. The results are seen around 3PM when the orbiter passes overhead. The rover has to be driven without any clues about how it's doing. It can drive on its own given a goal by navigating using stereographic images.

encoderer: This is pretty great stuff. At a more human scale, our saas monitoring tool handles peaks of ~100req/sec that are written to SQS. The daemon that evaluates rules and triggers alerts has a queue health check integrated. It will pause itself when the queue backs up and if the issue persists it sends a page. Features like this are just a few lines of code and have helped us squash false positive alerts.
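
To show how little code a check like that needs, here is a minimal sketch. It is not encoderer's code: the class name `QueueHealthGate`, the thresholds, and the `get_depth` callable are all hypothetical, and the queue depth is supplied by the caller rather than fetched from SQS.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class QueueHealthGate:
    """Decides whether the alerting daemon should evaluate rules right now."""

    get_depth: Callable[[], int]   # hypothetical: returns the current backlog size
    max_depth: int = 1000          # assumed threshold for "queue is backed up"
    page_after_checks: int = 3     # consecutive unhealthy checks before paging
    unhealthy_checks: int = 0
    pages_sent: List[str] = field(default_factory=list)

    def should_evaluate_rules(self) -> bool:
        depth = self.get_depth()
        if depth <= self.max_depth:
            # Queue is healthy again: reset the counter and resume evaluating.
            self.unhealthy_checks = 0
            return True
        # Backlog means the data is stale, so alerts would be false positives.
        self.unhealthy_checks += 1
        if self.unhealthy_checks == self.page_after_checks:
            self.pages_sent.append(f"queue backed up: depth={depth}")
        return False


# Simulated depth readings: healthy, three backed-up checks, then recovery.
readings = iter([10, 5000, 5000, 5000, 20])
gate = QueueHealthGate(get_depth=lambda: next(readings))
decisions = [gate.should_evaluate_rules() for _ in range(5)]
```

Paging only once the backlog persists is what separates a brief spike from a real problem; the daemon simply stays quiet while it waits.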

Regardless of what you think of the G+ redesign, it is a lot faster. What changed? Mike Elgan explains: Google completely replaced the underlying code, embracing a "responsive design" approach that enables one implementation across all platforms...In the past, a Google+ stream would load everything at once, so there was a gigantic lag. Now, Google claims not only that pages are a tiny fraction of the size they were before, but also that they never download more than 60K of HTML, 60K of JavaScript and 60K of CSS at once. The most noticeable effect is that animated GIFs first appear as still photos, then come to life only later...On the old Google+, everything was a search result. For example, viewing your "Family" circle was just a search result saying: "Show me all posts by people on this list in reverse chronological order." Actual searches (as in using the search box) were indistinguishable in concept from any other view. Everything you did in Google+ was a stream resulting from either a manual or an automated search.

Is killing the web the only way to save it? Google brings you up to speed on AMP.

Then again, there is a strong relationship between low load times and success. GQ cut its webpage load time by 80 percent and traffic and revenue spiked.

A New Age In Cluster Interconnects Dawns: Having said all of that, there is one outside force that may come into play in the HPC space: relatively inexpensive 25 Gb/sec server connectivity on the server and 50 Gb/sec and 100 Gb/sec Ethernet switching to link to it.

Facebook Reveals The Secrets Behind “M,” Its Artificial Intelligence Bot: After someone sends M a message via Messenger, “a response is actually formulated by the AI engine, and the trainers have the ability to let that response go, or one of those responses that are suggested go, or write a completely different one, or do something else,” Marcus said. The AI, intriguingly, always takes a stab at the answer first.

Well said. Good Leaders are game changers: Raft & Paxos: Distributed systems can be characterized by a set of safety and liveness properties or the mix of the two. Informally, safety is a property that stipulates that nothing bad happens during execution of a program. On the other hand, liveness stipulates that something good eventually happens.

Videos from GraphConnect San Francisco 2015 are now available.

Big RAM is eating big data: For example, the fact that many datasets (already refined for modeling) now fit in the RAM of a single high-end server and one can train machine learning models on them without distributed computing has been noted by many top large scale machine learning experts.

Key Lessons Learned from Transition to NoSQL at an Online Gambling Website. Clear explanation of the power of the actor model coupled with CRDTs' superior consistency.

I hope caffeine does not have the same effect in the office environment! Plants spike nectar with caffeine and give bees a buzz: But according to her team’s latest research, the caffeine actually leads to behavioural changes that serve the plant’s needs while making the bee colony less productive. “What I think it does is make them exploited pollinators,” she says. “The plants are tricking them into foraging in ways that benefit the plant, not the bee.”

A great deep dive on The story of one latency spike. SystemTap is used to look at the internals of operating system packet processing, as well as a bevy of other tools. You won't believe what they uncovered.

Open Fog Consortium is a new vendor consortium hoping to capture a lot of IoT dollars. The future of some mix of local processing coordinated with centralized processing is still up for grabs.

Videos from Nixcon2015 are available.

Monitoring Erlang/OTP Applications using Multiparty Session Types: Can multiparty session types be used to encode communication patterns in distributed Erlang/OTP applications, and what benefits are there if they can?

IonDB: a key-value datastore for resource constrained systems.

I know this works because we used the ideas in an embedded system to handle many thousands of high precision timers. Hashed and Hierarchical Timing Wheels: Data Structures for the Efficient Implementation of a Timer Facility.
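
For a feel of how a hashed timing wheel works, here is a minimal sketch in the spirit of the paper's hashed wheel with unsorted per-slot lists. It is an illustration, not the paper's code or the embedded implementation mentioned above; the class and attribute names are made up. Each timer hashes to slot `(current + delay) % num_slots` and carries a count of full wheel revolutions still to wait.

```python
from typing import Callable, List


class HashedTimingWheel:
    """Timers are bucketed by expiry tick modulo the wheel size."""

    def __init__(self, num_slots: int = 8) -> None:
        self.num_slots = num_slots
        # Each slot holds [remaining_rounds, callback] entries, unsorted.
        self.slots: List[List[list]] = [[] for _ in range(num_slots)]
        self.current = 0
        self.ticks_elapsed = 0

    def schedule(self, delay_ticks: int, callback: Callable[[], None]) -> None:
        """Start a timer in O(1): pick the slot, record the rounds to wait."""
        if delay_ticks < 1:
            raise ValueError("delay_ticks must be at least 1")
        slot = (self.current + delay_ticks) % self.num_slots
        rounds = (delay_ticks - 1) // self.num_slots
        self.slots[slot].append([rounds, callback])

    def tick(self) -> None:
        """Advance one tick and fire timers in the new slot that are due."""
        self.ticks_elapsed += 1
        self.current = (self.current + 1) % self.num_slots
        due, waiting = [], []
        for entry in self.slots[self.current]:
            if entry[0] == 0:
                due.append(entry[1])
            else:
                entry[0] -= 1  # one more full revolution has passed
                waiting.append(entry)
        self.slots[self.current] = waiting
        for callback in due:
            callback()


fired = []
wheel = HashedTimingWheel(num_slots=4)
for name, delay in (("a", 1), ("b", 4), ("c", 6)):
    wheel.schedule(delay, lambda name=name: fired.append((wheel.ticks_elapsed, name)))
for _ in range(6):
    wheel.tick()
```

Starting a timer is constant time, and each tick only touches the timers in one slot, which is why the approach holds up with many thousands of outstanding timers.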

DaemonSet in Kubernetes: Many users have requested for a way to run a daemon on every node in a Kubernetes cluster, or on a certain set of nodes in a cluster. This is essential for use cases such as building a sharded datastore, or running a logger on every node. In comes the DaemonSet, a way to conveniently create and manage daemon-like workloads in Kubernetes.

pyunicorn: Unified Complex Network and Recurrence Analysis Toolbox

SF-TAP: Scalable and Flexible Traffic Analysis Platform Running on Commodity Hardware: we propose a scalable and flexible traffic analysis platform (SF-TAP) that provides an efficient and flexible application-level stream analysis of high-bandwidth network traffic. Our platform’s flexibility and modularity allow developers to easily implement multicore scalable application-level stream analyzers.

Google's loss is our gain. Greg Linden's Quick links are back.
