
Stuff The Internet Says On Scalability For November 20th, 2015

High Scalability

20 Nov 2015

Hey, it's HighScalability time:

100 years ago people saw this as our future. We will be so laughably wrong about the future.

$24 billion: amount telcos make selling data about you; $500,000: cost of iOS zero day exploit; 50%: a year's growth of internet users in India; 72: number of cores in Intel's new chip; 30,000: Docker containers started on 1,000 nodes; 1962: when the first Cathode Ray Tube entered interplanetary space; 2x: cognitive improvement with better indoor air quality; 1 million: Kubernetes requests per second.

Quotable Quotes:
- Zuckerberg: One of our goals for the next five to 10 years is to basically get better than human level at all of the primary human senses: vision, hearing, language, general cognition.
- Sawyer Hollenshead: I decided to do what any sane programmer would do: Devise an overly complex solution on AWS for a seemingly simple problem.
- Marvin Minsky: Big companies and bad ideas don't mix very well.
- @mathiasverraes: Events != hooks. Hooks allow you to reach into a procedure, change its state. Events communicate state change. Hooks couple, events decouple
- @neil_conway: Lamport, trolling distributed systems engineers since 1998.
- @timoreilly: “Silicon Valley is the QA department for the rest of the world. It’s where you test out new business models.” @jamescham #NextEconomy
- Henry Miller: It is my belief that the immature artist seldom thrives in idyllic surroundings. What he seems to need, though I am the last to advocate it, is more first-hand experience of life—more bitter experience, in other words. In short, more struggle, more privation, more anguish, more disillusionment.
- @mollysf: "We save north of 30% when we move apps to cloud. Not in infrastructure; in operating model." @cdrum #structureconf
- Alex Rampell: This is the flaw with looking at Square and Stripe and calling them commodity players. They have the distribution. They have the engineering talent. They can build their own TiVo. It doesn’t mean they will, but their success hinges on their own product and engineering prowess, not on an improbable deal with an oligopoly or utility.
- @csoghoian: The Michigan Supreme Court, 1922: Cars are tools for robbery, rape, murder, enabling silent approach + swift escape.
- @tomk_: Developers are kingmakers, driving technology adoption. They choose MongoDB for cost, agility, dev productivity. @dittycheria #structureconf
- Andrea “Andy” Cunningham: You have to always foster an environment where people can stand up against the orthodoxy, otherwise you will never create anything new.
- @joeweinman: Jay Parikh at #structureconf on moving Instagram to Facebook: only needed 1 FB server for every 3 AWS servers
- amirmc: The other unikernel projects (i.e. MirageOS and HaLVM), take a clean-slate approach which means application code also has to be in the same language (OCaml and Haskell, respectively). However, there's also ongoing work to make pieces of the different implementations play nicely together too (but it's early days).
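To make @mathiasverraes's hooks-versus-events distinction concrete, here is a minimal Python sketch. The `Order` and `OrderPlaced` types and the checkout functions are hypothetical, made up for illustration. A hook is handed the procedure's live, mutable state and can change the outcome; an event handler is handed an immutable record of something that already happened and can only react to it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Order:
    total: float


# Hook style: callers reach into the procedure and may mutate its state.
def checkout_with_hooks(order: Order, hooks: List[Callable[[Order], None]]) -> float:
    for hook in hooks:
        hook(order)  # a hook can rewrite order.total before we charge it
    return order.total


@dataclass(frozen=True)
class OrderPlaced:
    total: float


# Event style: the procedure finishes, then announces an immutable fact.
def checkout_with_events(order: Order, subscribers: List[Callable[[OrderPlaced], None]]) -> float:
    charged = order.total
    event = OrderPlaced(total=charged)
    for handler in subscribers:
        handler(event)  # handlers can react, but cannot change what was charged
    return charged


def discount_hook(order: Order) -> None:
    order.total *= 0.5


audit_log: List[OrderPlaced] = []

print(checkout_with_hooks(Order(100.0), [discount_hook]))     # hook changed the result
print(checkout_with_events(Order(100.0), [audit_log.append]))  # event only recorded it
```

The hook version couples checkout to whatever the hook decides; the event version lets subscribers come and go without touching checkout.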

After a tragedy you can always expect the immediate fear-inspired reframing of agendas. Snowden is responsible for Paris? Really?

High finance in low places. The Hidden Wealth of Nations: In 2003, less than a year before its initial public offering in August 2004, Google US transferred its search and advertisement technologies to “Google Holdings,” a subsidiary incorporated in Ireland, but which for Irish tax purposes is a resident of Bermuda.

The entertaining True Tales of Engineering Scaling. Started with Rails and Postgres. Traffic jumped. High memory workers on Heroku broke the bank. Can't afford the time to move to AWS. Lots of connection issues. More traffic. More problems. More solutions. An interesting story with many twists. The lesson: Building and, more importantly, shipping software is about the constant trade off of forward movement and present stability.

5 Tips to Increase Node.js Application Performance: Implement a Reverse Proxy Server; Cache Static Files; Implement a Node.js Load Balancer; Proxy WebSocket Connections; Implement SSL/TLS and HTTP/2.

Docker adoption is not that easy: Uber took months to get up and running with Docker. How Docker Turbocharged Uber’s Deployments: Everything just changes a bit, we need to think about stuff differently...You really need to rethink all of the parts of your infrastructure...Uber recognizes that Docker removed team dependencies, offering more freedom because members were no longer tied to specific frameworks or specific versions. Framework and service owners are now able to experiment with new technologies and to manage their own environments.

Material science is so cool. Diamond Nanothread is the new Carbon Nanotube: They stacked the molecules into a line, placed it under pressure so that the molecules polymerized and, voila, created a diamond nanothread...ideal for the creation of extremely strong three-dimensional nano-architectures.

But if you are still interested in carbon nanotubes here's an impressive talk--Carbon Nanotubes for Digital Logic--by George Tulevski, a materials science engineer at IBM Research. Clear, detailed, thoughtful. He says making digital technology is the hardest thing you can do as a technology because it has the highest bar for performance. We may get there, we may not. The future is bright for the material. It has the right properties and we are just developing the tools necessary to exploit those properties. He emphasizes that supporting tools are necessary for success.

Is Plankalkül the world’s first programming language? First Steps: Lectures from the Dawn of Computing talks about Konrad Zuse: His own machines were based on a complex algebra he had developed based on propositional calculus and were programmed using probably the world’s first programming language, Plankalkül.

In Run containers on bare metal already! Bryan Cantrill is very entertaining and very convincing about Joyent's new Docker technologies Triton and Manta. The problem with hardware virtualization is that the abstraction is so low that the machine is still grossly underutilized. The hardware virtualization layer has no idea what the VMs are doing, so it can't optimize properly. Docker views containers not through the lens of the operator but through the lens of the developer. There is so much good software out there that we are being crippled by choice. And those choices have serious operational implications. Docker allows the developer to pull in whatever they need. A developer can pull the components onto their laptop and then have the same image run in production. If you deploy a container in a VM you are betraying the economic and performance advantage of containers. Containers are secure; it's just Linux containers that have a problem. Triton makes a whole datacenter look like a single Docker host. There are no VMs. All containers run directly on the metal. There is a performance benefit, but the real win is avoiding the added complexity of VMs. Just allocate containers on real hardware so we don't have to think about allocating things, only about consuming them. It won't be a winner-take-all market. Manta spins up a container on your objects.

A good example of connecting services in an AWS pipeline. Building a Near Real-Time Discovery Platform with AWS. Twitter -> Amazon Kinesis Firehose -> Amazon S3 -> Python Lambda function -> Amazon Elasticsearch Service -> Kibana.
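The Lambda step in the middle of that pipeline is mostly a format conversion: read the tweet records Firehose delivered to S3 and turn them into a bulk-index request for Elasticsearch. Below is a stdlib-only sketch of just that transformation. It assumes the producer wrote one JSON tweet per line, uses Twitter's `id_str`, `text`, `user.screen_name` and `created_at` fields, and picks a hypothetical index name `tweets`; reading from S3 and POSTing to the `_bulk` endpoint are left out, and the exact action-line fields should be checked against the bulk API docs for your Elasticsearch version.

```python
import json
from typing import Iterable, List


def tweets_to_bulk(lines: Iterable[str], index: str = "tweets") -> str:
    """Convert newline-delimited tweet JSON into an Elasticsearch _bulk body."""
    out: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        # Keep only the fields we want to search on in Kibana (assumed field names).
        doc = {
            "id": tweet["id_str"],
            "text": tweet.get("text", ""),
            "user": tweet.get("user", {}).get("screen_name"),
            "created_at": tweet.get("created_at"),
        }
        # Action line, then the document itself; using the tweet id as _id
        # makes re-processing the same S3 object idempotent.
        out.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        out.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return ("\n".join(out) + "\n") if out else ""
```

Using the tweet id as the document id means a Lambda retry on the same S3 object overwrites documents rather than duplicating them.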

A good gloss by Apigee on a Chris Munns talk on how enterprise microservices are really built at Amazon, and what makes them work at enterprise scale.

P85D - Electric Mechanical Braking System: It was very interesting how Elon emphasized at the event that they really had to work to wring latency out of all the systems involved in autopilot: sensors, computer processing and algorithms, end effectors (brakes, steering). Of course this makes perfect sense, but it is interesting that they identified so early in development that you cannot make a fast, stable and safe control system without squeezing every microsecond of delay out of the control loops.
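Why delay matters so much in a control loop is easy to see in a toy simulation. The sketch below is a hypothetical discrete-time proportional controller, not anything Tesla has described: it drives a value toward a target, but the controller only sees measurements that are `delay` steps old. With the same gain, zero delay gives a smoothly shrinking error, while a three-step delay makes the controller keep correcting for a state that no longer exists, so it overshoots and the oscillation grows. For this update rule the largest stable gain shrinks as the delay grows.

```python
from collections import deque
from typing import List


def simulate(gain: float, delay: int, steps: int = 60, target: float = 1.0) -> List[float]:
    """Return the tracking error at each step of a delayed proportional controller."""
    x = 0.0
    # Measurements in flight: the controller acts on the oldest one.
    pipeline = deque([x] * (delay + 1), maxlen=delay + 1)
    errors = []
    for _ in range(steps):
        measured = pipeline[0]            # reading that is `delay` steps old
        x += gain * (target - measured)   # actuator applies the correction
        pipeline.append(x)                # newest reading enters the pipeline
        errors.append(target - x)
    return errors


fast = simulate(0.8, delay=0)
slow = simulate(0.8, delay=3)
print(f"no delay: final error {fast[-1]:.2e}")
print(f"3-step delay: largest recent error {max(abs(e) for e in slow[-10:]):.1f}")
```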

Private clouds are not dead. At least if you are eBay. Here's an excellent peek Inside eBay’s Shift To Kubernetes And Containers Atop OpenStack. eBay now has more than 500,000 cores spread across more than 150,000 servers with over 200 PB of storage. eBay processes billions of queries and serves out more than 20 billion images per day. How will they use it?: The adoption of Kubernetes at eBay is not just about moving to containers to deploy applications, but changing the application lifecycle at the company, which is centered around the infrastructure cloud layer (with provisioning, deploying, monitoring, and remediating issues being the key functions for developers and system administrators to perform). eBay plans to go to a more flexible deployment model using containers as its runtime and Kubernetes on top of OpenStack to manage those containers...one of the reasons why technologies such as Kubernetes and Mesosphere are getting a lot of interest is that they impose a lot less overhead on servers when multiple workloads are shared on a single machine and, perhaps more importantly, containers with scale out workloads can be fired up and retired at a much faster pace than traditional server virtualization...Kubernetes is available as a controller layer on several public clouds running Docker containers, which would give eBay the capability to burst out to a public cloud should the need arise.

If you need a break then here are some spooky-looking BioRobots for your bemusement. The design of the lamprey is fascinating. Such complex and life-like motion from such a simple system. Motion is controlled by the spinal cord, not the brain. Very modular design. Every segment has its own processor and battery, and a CAN bus connects them together. It must be at least one meter long to propagate a traveling wave. The system is very distributed. Cut it in half and the spinal cord halves will still work. The upper part of the brain just sends down two signals to the spinal cord. Increase both signals and it goes faster; stimulate one side and it turns to that side. The brain doesn't control the muscles, the spinal cord does that. From a mathematical perspective it's a network of coupled oscillators. The next generation has a GPS and chemical detection so it can follow gradients. The appearance is not so beautiful but the motion is beautiful.
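The "network of coupled oscillators" idea fits in a few lines. Below is a toy chain of phase oscillators, one per body segment, where each segment is pulled toward trailing its head-side neighbor by a fixed phase lag. This is a generic nearest-neighbor phase model assumed for illustration, not the controller used in the actual robot, and the parameter values are arbitrary. Starting from random phases, the chain settles so that every segment lags the previous one by the same amount, which is exactly a wave traveling from head to tail; the shared `drive` term loosely plays the role of the descending signal that sets the rhythm.

```python
import math
import random
from typing import List


def simulate_chain(n: int = 8, drive: float = 2.0, coupling: float = 4.0,
                   lag: float = 0.6, dt: float = 0.01, steps: int = 5000,
                   seed: int = 0) -> List[float]:
    """Integrate a one-way chain of phase oscillators (Euler); return final phases."""
    rng = random.Random(seed)
    phases = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    for _ in range(steps):
        new = phases[:]
        for i in range(n):
            # Every segment has its own intrinsic rhythm set by the drive signal.
            rate = drive
            if i > 0:
                # Pull segment i toward trailing its neighbor by `lag` radians.
                rate += coupling * math.sin(phases[i - 1] - phases[i] - lag)
            new[i] = phases[i] + rate * dt
        phases = new
    return phases


def phase_lags(phases: List[float]) -> List[float]:
    """Phase difference between consecutive segments, wrapped to [-pi, pi]."""
    return [math.remainder(a - b, 2 * math.pi) for a, b in zip(phases, phases[1:])]


print([round(l, 3) for l in phase_lags(simulate_chain())])
```

Because every segment keeps its own intrinsic rhythm, a chain cut in half still oscillates and the back half re-forms a wave behind its new front segment, much like the spinal cord halves.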

The SEC now allows everyone to buy equity in crowdfunded ventures. This is a major shift in financing that probably won't impact VCs that much, but Angels might have their wings clipped.

Awesome article in ACM Queue about how Facebook handles Fail at Scale. The Morning Paper glosses the article with a focus on controlling queue delay: The key thing to understand is that not all queues are bad – so we can’t simply react to queue size. Good queues convert bursty arrivals into smooth, steady departures (and reduce in length when the arrival rate drops back down below the departure rate). Bad queues (standing queues) serve no useful purpose and simply create excess delay.
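The article's remedy is a CoDel-inspired rule: if the queue has not been empty at any point in the last N ms, it is treated as a standing queue and new requests may wait only M ms; otherwise they may wait up to N ms. Here is a minimal sketch of that rule, with time passed in explicitly. The class name is hypothetical and the defaults of 5 ms and 100 ms are illustrative; a real server would fail timed-out requests fast rather than silently drop them.

```python
import collections
from typing import Any, Deque, Optional, Tuple


class ControlledDelayQueue:
    """Queue whose per-request timeout shrinks when a standing queue forms."""

    def __init__(self, m_ms: float = 5.0, n_ms: float = 100.0) -> None:
        self.m_ms = m_ms   # short timeout used while a standing queue exists
        self.n_ms = n_ms   # normal timeout and "recently empty" window
        self.items: Deque[Tuple[Any, float]] = collections.deque()
        self.last_empty_ms = 0.0

    def enqueue(self, req: Any, now_ms: float) -> None:
        if not self.items:
            self.last_empty_ms = now_ms
        # Not empty for the whole last N ms: a bad (standing) queue,
        # so new arrivals only get M ms before they are shed.
        standing = self.last_empty_ms < now_ms - self.n_ms
        timeout = self.m_ms if standing else self.n_ms
        self.items.append((req, now_ms + timeout))

    def dequeue(self, now_ms: float) -> Optional[Any]:
        """Return the next unexpired request, dropping any that timed out."""
        while self.items:
            req, deadline = self.items.popleft()
            if not self.items:
                self.last_empty_ms = now_ms
            if deadline >= now_ms:
                return req
        return None
```

A burst that drains quickly keeps the generous N ms timeout (a good queue), while a queue that never empties starts shedding load after M ms (a bad queue).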

A thoughtful analysis. The Cost of Frameworks: For me the results are pretty clear: there appears to be a pretty hefty tax to using Frameworks on mobile, especially compared to writing vanilla JavaScript. The fastest is React (under production conditions), which is awesome, but it’s still 3x slower than Vanilla.

To bastardize Emerson, the office of the programmer is to create appearances amongst facts. So instead of Programmers, Let’s Earn the Right to Be Called Engineers, I'd like to see Engineers earn the right to be called Programmers. Let's see you be as flexible, as creative, and as productive as programmers. Engineer envy is completely unnecessary.

Interesting mechanical-sympathy thread on how the use of Docker affects latency. It seems Docker's latency impact on CPU and memory is minimal, but for networking it's substantial and negative.

Nerdgasm: Karl Brumund – Building a Small DC… For the rest of us: Whitebox switches work; Automation is your friend; BGP Communities can be used for automation and configuration; Putting security into the application instead of firewalls saved much money and time; You can use Anycast IP in OSPF in a small data centre.

It's sad to see FastMail Shutting down our XMPP chat service. Remember the bright shiny future of open federated messaging over XMPP? Like Camelot it even existed for a brief shining moment. Not that XMPP wasn't a huge pain, it was, but more likely XMPP fell out of favor because all the Kings prefer their land behind some very high walls. And we know the King and the land are one.

Does Black Friday have your SQL database down? Brent Ozar has a few things to look for: Are data or log files growing? Are queries being blocked? Are queries being rolled back? Did a bad plan get into cache? Are shared resources under pressure?

Joe Duffy, engineering director for the Compiler and Language Platform group at Microsoft, is writing an insightful series of articles on Midori: a research/incubation project to explore ways of innovating throughout Microsoft's software stack. This spanned all aspects, including the programming language, compilers, OS, its services, applications, and the overall programming models. We had a heavy bias towards cloud, concurrency, and safety. The project included novel "cultural" approaches too, being 100% developers and very code-focused, looking more like the Microsoft of today and hopefully tomorrow, than it did the Microsoft of 8 years ago when the project began.

StorageMojo on the Dell EMC merger: The new normal is already here – Amazon Web Services – with Google and Microsoft in the mix. Just as large, vertically integrated vendors disappeared in the 80s and 90s, we’re looking at the extinction of today’s large IT vendors, at least in their present forms...It won’t end well.

Videos are available from Facebook's Security @Scale 2015. Topics include Engineering Security at Facebook, Making Security Usable at HubSpot, Safety at Scale, Elliptic Curve Cryptography, Building Open Source Software for Security, Rapid Identification and Classification of Mobile Malware, and more.

Every new version of iOS creates a pile of technical debt as new features must be implemented. Here's Flickr’s experience with iOS 9 and how they use Spotlight Search, Universal Links, Deep Linking, and 3D Touch. Lots of good examples and advice. The unbundling of apps into the OS continues.

Netflix has open sourced Spinnaker, their Continuous Delivery platform for releasing software. The surprise is it's not just for AWS, but it also works on the Google cloud. Wired has a good article with more details. "Google spent a year working with Netflix to ensure this was the case." Plans are to also support Azure. Is Netflix creating a sail should it ever need to escape to a new cloud? Or will Netflix be the first large property to truly span multiple clouds? Netflix continues to lead the way.

Here's a free book on Understanding Machine Learning: From Theory to Algorithms. And The Elements of Statistical Learning. And Foundations of Data Science. And A Course in Machine Learning. And Evaluation of Deep Learning Toolkits.

spotify / Heroic: A scalable time series database based on Cassandra and Elasticsearch. Spotify uses Heroic in their monitoring infrastructure: Heroic is our in-house time series database. We built it to address the challenges we were facing with near real-time data collection and presentation at scale. At the core are two key pieces of technology: Cassandra and Elasticsearch. Cassandra acts as the primary means of storage with Elasticsearch being used to index all data. We currently operate over 200 Cassandra nodes in several clusters across the world serving over 50 million distinct time series.
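The division of labor Spotify describes, where one system stores the samples and another indexes the series so you can find them by tags, can be sketched with two dictionaries. This is a toy in-memory model with a hypothetical API, meant only to show why the split works: a query first asks the index which series match a tag filter, then reads just those series from storage.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

SeriesKey = FrozenSet[Tuple[str, str]]  # a series is identified by its tag set


class ToyTimeSeriesDB:
    def __init__(self) -> None:
        # Storage role (Cassandra in Heroic): series key -> [(timestamp, value)].
        self.storage: Dict[SeriesKey, List[Tuple[int, float]]] = defaultdict(list)
        # Index role (Elasticsearch in Heroic): tag -> series keys carrying it.
        self.index: Dict[Tuple[str, str], Set[SeriesKey]] = defaultdict(set)

    def write(self, tags: Dict[str, str], ts: int, value: float) -> None:
        key: SeriesKey = frozenset(tags.items())
        if key not in self.storage:
            # First sample of a new series: register it under each of its tags.
            for tag in key:
                self.index[tag].add(key)
        self.storage[key].append((ts, value))

    def query(self, tag_filter: Dict[str, str], start: int,
              end: int) -> Dict[SeriesKey, List[Tuple[int, float]]]:
        # Step 1: intersect the index postings for every requested tag.
        matches: Optional[Set[SeriesKey]] = None
        for tag in tag_filter.items():
            postings = self.index.get(tag, set())
            matches = set(postings) if matches is None else matches & postings
        # Step 2: read only the matching series, restricted to [start, end).
        return {
            key: [(t, v) for t, v in self.storage[key] if start <= t < end]
            for key in (matches or set())
        }
```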

haskell-tor: A Haskell implementation of the Tor protocol.

Tor: Hidden Service Scaling: The thesis will try to provide a deeper understanding of Tor hidden services, and look at some possible architectural changes and modifications which will address the issue of scalability while preserving the security properties of our threat model. As opposed to the original threat model stated in Tor’s design paper[2], the threat model this thesis considers is that of deploying a Tor hidden service for a large-scale web service.

Pyro: A Spatial-Temporal Big-Data Storage System: a spatial-temporal bigdata storage system tailored for high resolution geometry queries and dynamic hotspots. Pyro understands geometries internally, which allows range scans of a geometry query to be aggregately optimized.
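The core trick behind this kind of geometry-aware storage is turning a 2D query shape into a few 1D range scans over keys ordered along a space-filling curve. The sketch below uses a Z-order (Morton) curve on a small integer grid and a rectangle in place of a general geometry; both are simplifying assumptions, and Pyro itself may use a different curve encoding. It computes the curve key of every covered cell, then merges consecutive keys into contiguous ranges, so each range becomes one scan instead of one lookup per cell.

```python
from typing import List, Tuple


def morton(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into a Z-order key (x in even bits)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def rect_to_ranges(x0: int, y0: int, x1: int, y1: int) -> List[Tuple[int, int]]:
    """Cover the inclusive cell rectangle with merged [start, end] key ranges."""
    keys = sorted(morton(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1))
    ranges: List[Tuple[int, int]] = []
    for k in keys:
        if ranges and k == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], k)  # extend the current contiguous run
        else:
            ranges.append((k, k))            # start a new scan range
    return ranges


print(rect_to_ranges(0, 0, 3, 1))  # curve-aligned: one scan
print(rect_to_ranges(1, 1, 2, 2))  # misaligned: four separate scans
```

The second query covers only four cells but needs four scans, which is why optimizing the ranges for a whole geometry at once, rather than cell by cell, pays off.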

Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations: The goal of this book is to bring under one roof a variety of ideas and techniques that provide foundations for modeling, reasoning about, and building multiagent systems.

PaaSTA: a highly-available, distributed system for building, deploying, and running services using containers and Apache Mesos! PaaSTA has been running in production at Yelp for more than a year.

pinlater: a Thrift service to manage scheduling and execution of asynchronous jobs. Pinterest uses PinLater to power actions like Pinning, following and image thumbnail retrieval, as well as large batch operations like email delivery and push notifications. Key features: reliable job execution, job scheduling, rate limiting, language agnostic, horizontal scalability, multiple storage backends, observability.
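Two of the listed features, job scheduling and rate limiting, can be sketched as a priority queue ordered by run time plus a token bucket on dequeue. The class and method names below are hypothetical and do not reflect PinLater's Thrift API; time is passed in explicitly so the behavior is deterministic, and reliable execution (acks and retries) is left out.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class ToyJobQueue:
    """Delayed-job queue with a token-bucket limit on dequeue rate."""

    def __init__(self, rate_per_sec: float, burst: int) -> None:
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.last_refill = 0.0
        self._seq = itertools.count()  # tie-breaker so job payloads are never compared
        self._heap: List[Tuple[float, int, Any]] = []

    def enqueue(self, job: Any, run_at: float) -> None:
        heapq.heappush(self._heap, (run_at, next(self._seq), job))

    def _refill(self, now: float) -> None:
        # Add tokens for the elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now

    def dequeue(self, now: float) -> Any:
        """Return the next due job, or None if nothing is due or we are rate limited."""
        self._refill(now)
        if not self._heap or self._heap[0][0] > now or self.tokens < 1:
            return None
        self.tokens -= 1
        return heapq.heappop(self._heap)[2]
```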

Apache Geode: an open source, distributed, in-memory database for scale-out applications. From Pivotal: China National Railways use Geode to run railway ticketing for the entire country with a 10 node cluster, managing 2 TB of "hot data" in memory, and 10 backup nodes for high availability and elastic scale. Holiday travel periods [Chinese New Year's] create peaks of 15,000 tickets sold per minute, 1.4 billion page views per day and 40,000 visits per second.
