Stuff The Internet Says On Scalability For May 11, 2012

High Scalability

11 May 2012 — 3 min read

It's HighScalability Time:

2.5M : Erlang Concurrent Connections; 20 Billion : Urban Airship Push Notifications.
@agentdero: "You go to production with the code you have, not the code you wish you had" - Devops Rumsfeld
@PatrickMcFadin: After talking to a lot of big #aws customers tonight, the big non-secret is we'll be seeing #ssd instances soon.
Goodbye, CouchDB. Steven Hazel shares his experience report with CouchDB. Like many relationships it all started great, but reliability, performance, and maintenance problems drove him into the arms of Percona MySQL. They use MySQL in NoSQL mode and in return they get better performance and a love that never fails.
MongoDB at Craigslist: One Year Later. Jeremy Zawodny and Chris Mooney of Craigslist recount their experiences using MongoDB for the Archive project that now supports 2.5 billion posts and 6 TB of data. Biggest problem has been data migration. Had to pre-split the data to bulk load. Ran into NUMA issues on large memory systems. Issues on small disk systems. Replica sets are great. Things fail all the time and it works. Sharding is great. Can keep growing, add hard drives, take replica sets down. Like Perl driver. Works really well. Cross colo replication problems, but nothing that can't be fixed.
Some queuing theory: throughput, latency and bandwidth. More cowbells is not the answer. Nice explanation of the negative impact of queuing on latency and the proposed Controlled Delay algorithm for TCP. Also, Fundamental Progress Solving Bufferbloat. Also also, TCP doesn't suck, and all the proposed bufferbloat fixes are identical.
Reserved Instances and Purchasing Strategy. #firstcloudproblems. Michael Wasser on how to select your mix of boxes. Spot instances lower costs without the upfront fees, at the expense of more downtime on average. Reserved instances have a big upfront cost, but you can amortize that cost by dividing the purchases evenly over the three-year reserved instance term.
Distributed Systems Design (Part 4/4). Blue Box with a good series of articles on understanding the basics of distributed systems design.
Decomposing Applications for Deployability and Scalability. Good presentation by Chris Richardson using the idea of the scale cube: the Y axis is functional decomposition, the X axis is horizontal duplication, and the Z axis is partitioning. Covers a lot of ground and is notable for its coverage of Cloud Foundry.
AWS is NOT the prime enabler of scalability. Gil Hildebrand makes the point that AWS does not solve all scalability problems and it's not the cheapest option. For web scalability look to: Cheap RAM, Non-relational Databases, Content Delivery Networks, and Better Libraries and Documentation.
How LinkedIn uses WebSockets: On average, WebSocket is faster, but practically negligibly so. However, it is far more consistent than either of the URL scheme implementations, which had widely varied timings. That, coupled with the asynchronous behavior, makes WebSockets a win for many solutions. Also, Data Infrastructure @ LinkedIn.
How we use HipChat to keep the (distributed) UserVoice team in sync. When programming in packs, what is the proper etiquette to use over the hive mind that is created by instantaneous, always-on chat programs? Richard White with a lot of good advice: keep separate rooms for functional teams; have quiet days; use it to run standup meetings; send out build and deployment notifications; limit use of @all; limit use of notifications; it's OK to turn it off; avoid cliques; and lots more. Also, Devops Culture (Part 2).
Google App Engine now has search. We don't know what it will cost yet. In a pay-for-what-you-use world, search will likely be spendy. There's more storage for indexes, more read and write operations, and more instances. But it's a much needed feature and impossible to implement well outside the system.
Application Performance and Antipatterns. Munish Gupta distills down the root causes of application performance issues: Excessive Layering, Round Tripping, Overstuffed Session, Golden Hammer (Everything is a Service), and Chatty Services.
A Layer of Indirection: Is MPLS Tunneling? Packet Pushers tackles the age-old problem of how many protocol headers can dance on the head of a pin.
Notes on graph data management. Curt Monash with a tight introduction to graph databases.
PeteSearch with five tasty short links.
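
To make the queuing item above a little more concrete, here is a minimal sketch of why queuing hurts latency. It assumes a textbook M/M/1 queue (random arrivals, a single server), which is an illustrative model chosen here and not necessarily the one used in the linked article. The function names and the example service rate are hypothetical.

```python
def mm1_latency(arrival_rate, service_rate):
    """Mean time in system (waiting plus service) for an M/M/1 queue.

    Rates are in items per second; the result is in seconds.
    Assumes the textbook M/M/1 model, an illustrative choice.
    """
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def mean_items_in_system(arrival_rate, service_rate):
    """Little's Law: items in system = arrival rate * mean time in system."""
    return arrival_rate * mm1_latency(arrival_rate, service_rate)


if __name__ == "__main__":
    service_rate = 1000.0  # hypothetical link that can serve 1000 packets/s
    for utilization in (0.5, 0.9, 0.99):
        arrival_rate = utilization * service_rate
        latency_ms = mm1_latency(arrival_rate, service_rate) * 1000.0
        backlog = mean_items_in_system(arrival_rate, service_rate)
        print(f"utilization={utilization:.2f} "
              f"latency={latency_ms:.1f} ms backlog={backlog:.1f} packets")
```

Under this model throughput barely changes between 90% and 99% utilization, while mean latency and the standing backlog grow roughly tenfold. That is the kind of standing queue that delay-based schemes such as Controlled Delay aim to keep short.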
