Stuff The Internet Says On Scalability For December 21, 2012

High Scalability

21 Dec 2012 — 5 min read

We at HighScalability are betting the over on the whole Mayan end of the world thingy:

200M: monthly active Twitterers; 120: number of Netflix reencodings; 1.2 Million Years: Pr0n Watched Since 2006; 100M: Google Core-Hours Awarded to Science

Quotable Quotes:
- @shipilev: I've settled on saying that if performance is the scalar field in state space, then scalability is just it's gradient.
- @AndiMann: "Only 1% of #Amazon users should care about #cloud scalability, elasticity". Brilliant!
- @Guerrero_FJ: Always remember: 'scalability problems should be solved when there are scalability problems.' #leanstartup

Santa's Architecture: It's a little-known fact that Santa Claus was an early queue innovator. Faced with the problem of delivering a planet full of presents in one night, Santa, in his hacker's workshop, created a Present Distribution System using thousands of region-based priority present queues for continuous delivery by the Rudolphs. Rudolphs? You didn't think there was only one Rudolph did you? Presents are delivered in parallel by a cluster of sleighs, each with redundant reindeer in a master-master configuration. Each Rudolph is a cluster leader and they coordinate work using an early and more magical version of the ZooKeeper protocol.
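
For the curious, region-based priority queues might look something like the sketch below. It's purely illustrative: the class name, method names, and sample presents are all made up here, and it assumes a lower number means a higher delivery priority.

```python
import heapq
from collections import defaultdict
from itertools import count


class PresentDistributor:
    """Hypothetical per-region priority queues; all names are illustrative."""

    def __init__(self):
        # One binary heap per region, created lazily on first enqueue.
        self._queues = defaultdict(list)
        # Monotonic counter breaks ties so equal priorities stay FIFO.
        self._seq = count()

    def enqueue(self, region, priority, present):
        # Assumption: lower number = delivered sooner.
        heapq.heappush(self._queues[region], (priority, next(self._seq), present))

    def next_present(self, region):
        # Return the most urgent present for a region, or None if it is empty.
        queue = self._queues.get(region)
        if not queue:
            return None
        _, _, present = heapq.heappop(queue)
        return present


distributor = PresentDistributor()
distributor.enqueue("north", 2, "socks")
distributor.enqueue("north", 1, "train set")
distributor.enqueue("south", 1, "bicycle")
```

Because each region has its own queue, each sleigh can drain its region independently, which is what would let delivery run in parallel.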

Werner Vogels with a list of all his Back-to-Basics Readings of 2012. Hm, no Jane Austen...

For a look at an evolved website<-->BigData stack, here's the awesome UC Berkeley Course Lectures on Analyzing Big Data With Twitter. Lots of juicy topics: Twitter Philosophy and Software Architecture; Introduction to Hadoop; Introduction to Apache Pig; Coding to the Twitter API; Detecting Twitter Trends; Real-Time Twitter Search; Splunk’s Software Architecture and GUI for Analyzing Twitter Data; Twitter’s Social Network; Big Learning with Graphs; Twitter Recommendations; Information Diffusion on Twitter; Introduction to Scalding; Spark: Making Big Data Analytics Interactive and Real-Time.

A reason programmers don't like queues too?: The motorist is born free, but everywhere he is in queues.

Don't just complain about IO on Amazon, do something about it! With some effort you can craft a high IO hybrid architecture by directly connecting AWS with a 1 gig or 10 gig pipe. Think one millisecond between your DC and an AZ. It's a mysterious process involving all sorts of networking voodoo. Taking a lot of mystery out of the process is Benjamin Krueger's truly wonderful article: AWS Direct Connect on the AWS ADVENT 2012 blog (it's packed with useful info and is well worth a look). It's not as expensive as you might think, but there is a lot to it. Some examples of where it might be useful: low latency and high throughput applications; when you need to run your own appliances; keep data in your secure environment; cope with DDoS using fast instance provisioning for the web tier while keeping your data tier isolated and secure; sending master writes to your DC and keeping read slaves in AWS; hosting companies can have fast IO for their customers to AWS.

Get a good drink and pull on up to the nerd experience bar in this monumental Reddit thread: "Whose bug is this anyway?!?" - A few memorable bugs Patrick Wyatt encountered while working on StarCraft and Guild Wars. Lots of good stories shared.

It's forced early retirement for some servers on Amazon: AWS stops some EC2 servers without warning is an excellent overview of one reason for server death. Of course, good Cloud Scouts should always be prepared, but this peek into the internals of the process is quite interesting. Virtual hardware lasts about 200 days, at which point it may be sent off to the farm, along with other old faithful and loyal doggies. Just something to keep in mind.

Pulse's Tech Talk. They: focus on core features, minimize admin; develop on GAE first, move to AWS if needed; best practices for running an app on multiple screen sizes.

It's almost Festivus so Greg Ferro is getting to the airing of grievances part of the celebration with Tired of Being Blamed, It’s Not the Network, It’s the Operating System. If you want dynamic mobility in a virtualized environment then change the underlying protocols, don't change every device on the planet. Some excellent thoughts in the comments section too.

Matthew Aslett, with the labor of Hercules, has built an amazing Database Landscape Map. From the map it's clear, combining a wide distribution of need with the skills to fulfill those needs is a perfect way of exhaustively exploring a landscape.

You have a lot of data streaming in, what do you do with it? Nice look at one framework in Streaming Data into Apache HBase using Apache Flume: Flume is an excellent tool to write events out to the different storage systems in the Hadoop ecosystem including HBase. The HBase sinks provide the functionality to write data to HBase in your own schema and allows the user to “map” the Flume event to HBase data.

Why the “stupid network” isn’t our destiny after all. When we forget the transformation that can occur when we make things programmable we also miss the possible enchantment of the world. The dumb static network seemed like the right idea, even a great idea, but that's before we went beyond autonomy and into programmability.

Facebook is moving to native code to create a faster experience: Reduce Garbage Collection, Write a Custom Event Bus; Move Photos to the Native Heap, Write a Custom ListView Recycler. Also, The JavaScript SDK – Truly Asynchronous Loading.

Even though 12 Universal Lessons We’ve Learned recommends not wasting time on reading articles like 12 Universal Lessons We've Learned, it's still a worthwhile lessons learned type story: start single player and then move to social, intense shorter days are better than long days (I agree), connect to life, use the beginner's mind grasshopper, be happy, think before doing, the beginner is subject to distractions, focus on your balls, share it stupid.

An excellent, balanced, and informed look: AWS: the good, the bad and the ugly: As we grow from over 100 to over 1000 boxes it’s going to be necessary to diversify to those other providers.

PipelineDeals on What it means to be truly geographically redundant on AWS: stay in the east but be ready to jump to another AZ; use chef to bring up any server in any AZ; don't use EBS-backed instances; use a skeleton crew of servers in another AZ to avoid the rush on failure; practice practice practice; combine server roles.

How MaxCDN Achieved a .50s Page Load: Implement all the latest technologies; Minify to the extreme; Code with performance in mind; Lookup domain names early; Don’t get frustrated; CSS3 is the bomb; CSS Hat is your friend; Use a CSS reset; Use responsive design wisely; Solid colors are not always bad; and many more.

Looking for an ops model? Try Evernote and the description of their configuration and software deployment processes: The combination of a customized post-installer and the use of Puppet and Fabric has enabled us to manage our systems effectively.

Yelp on the crucial yet less romantic part of the production process: Building and Testing Yelp Mobile. They use: YelpKit for development, GHUnit for testing, TestFlight to distribute apps to testers, and Crashlytics to debug crashes.

Understanding Is A Poor Substitute For Convexity (antifragility) by Nassim Nicholas Taleb: The point we will be making here is that logically, neither trial and error nor "chance" and serendipity can be behind the gains in technology and empirical science attributed to them. By definition chance cannot lead to long term gains (it would no longer be chance); trial and error cannot be unconditionally effective: errors cause planes to crash, buildings to collapse, and knowledge to regress.

Practical Self-Stabilization for Tolerating Unanticipated Faults in Networked Systems: It is our position that the property of stabilization is desirable for distributed, networked systems to deal with unanticipated faults. In this article, we provide a gentle introduction to the concept of stabilization, respond to various criticisms and misconceptions about its use, and suggest practical approaches for its design.

Heads up performance people...Martin Thompson has started a new Mechanical Sympathy group for making "hardware and software work together in harmony to achieve great performance."
