Hot Scalability Links for July 30, 2010
- Jeremy Zawodny, while performing data alchemy in the dungeons of Craigslist, stored 1,250,000,000 Key/Value Pairs in Redis on a 32GB Machine.
- Data sorting world record: 1 terabyte, 1 minute. The system has 52 computer nodes; each node is a commodity server with two quad-core processors, 24 gigabytes (GB) of memory, and sixteen 500 GB disks. It's not just hardware, though: they also built software that utilizes all of that CPU and RAM.
- Tweets of Gold:
  - wm: I am really getting the sense that none of you yokels waxing profound about scalability actually has anything factual to say
  - joestump: I think you can do things to *mitigate* pain points up front. You don't need to over-engineer, but it's not hard to look forward.
  - danielcrenna: I love it when I check in debug code accidentally and it turns into a three day hunt for a major scalability problem
  - joestump: Your post also makes me think of another phrase I say often: Scaling == Specialization. Bigger scale = More specialization.
- Quora: What are the scaling issues to keep in mind while developing a social network feed? Very good discussion of scaling advice: denormalize; cache; SSD; optimize writes; avoid stupid things.
- Node and Scaling in the Small vs Scaling in the Large by Alex Payne. Each and every problem has an appropriate set of applicable technologies, and it’s up to the engineer to justify their use...In a system of no significant scale, basically anything works...In a system of significant scale, there is no magic bullet.
- Caching could be the last thing you want to do by Morgan Tocker. I think with great tools like memcached it is easy to get carried away and use it as the mallet for every performance problem, but in many cases it should not be your first choice. Caching should be seen more as a burden that many applications just can’t live without. You don’t want that burden until you have exhausted all other easily reachable optimizations.
- NVIDIA Serves Up Reality, GPU-Style. NVIDIA announced that it had partnered with PEER 1 to provide the industry’s first large-scale hosted GPU cloud.
- Raw Notes from Beyond LAMP: Scaling Websites Past MySQL at SXSWi 2010 by William Hertling.
- Damn Cool Algorithms: Levenshtein Automata by Nick Johnson. The basic insight behind Levenshtein automata is that it's possible to construct a finite state automaton that recognizes exactly the set of strings within a given Levenshtein distance of a target word.
- Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Recommendation by Marko Rodriguez. A sweet 127-slide whirlwind tour of the theory and application of graphs.
- STM, CouchDB, and pushing 5500 docs/sec by David Nolen. On an AWS Cluster Compute Instance I was able to insert a million small documents in about 3 minutes. That's an average of ~5500 documents per second. Not too shabby.
- Unsafe & CompareAndSwap by Cliff Click. Thus for NBHM (except for non-HotSpot JVMs on Power), there is a volatile-write before any Put and a volatile-read before any Get.
- Vint Cerf: Smart Grid Has to Be Distributed, Voluntary, Collaborative by Katie Fehrenbacher. A smart grid network will be highly dependent on “people’s willingness to connect in this way,” and “this is not going to be something that can be forced on anyone no matter how hard we try.”
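
To make the Levenshtein automata item above a little more concrete, here is a minimal sketch in Python. It is not the construction from the linked post: it simulates the automaton lazily, treating each row of the classic edit-distance table as one automaton state, and it does not cap the row values (a real construction would, to keep the state set finite). The class and method names (`LevenshteinAutomaton`, `start`, `step`, `can_match`, `accepts`) are assumptions chosen for illustration.

```python
class LevenshteinAutomaton:
    """Illustrative sketch: each state is one row of the edit-distance table."""

    def __init__(self, target, max_distance):
        self.target = target
        self.max_distance = max_distance

    def start(self):
        # Distance from the empty input to every prefix of the target.
        return tuple(range(len(self.target) + 1))

    def step(self, state, char):
        # Consume one input character and compute the next table row.
        new_state = [state[0] + 1]
        for i, target_char in enumerate(self.target):
            cost = 0 if target_char == char else 1
            new_state.append(min(
                new_state[i] + 1,   # target character missing from the input
                state[i] + cost,    # match or substitution
                state[i + 1] + 1,   # extra character in the input
            ))
        return tuple(new_state)

    def is_match(self, state):
        # Accepting state: the whole target is within the allowed distance.
        return state[-1] <= self.max_distance

    def can_match(self, state):
        # The row minimum never decreases, so once it exceeds the limit
        # no continuation of the input can be accepted.
        return min(state) <= self.max_distance

    def accepts(self, word):
        state = self.start()
        for char in word:
            state = self.step(state, char)
            if not self.can_match(state):
                return False
        return self.is_match(state)


if __name__ == "__main__":
    automaton = LevenshteinAutomaton("food", 1)
    for candidate in ["food", "fod", "good", "fd", "goods"]:
        print(candidate, automaton.accepts(candidate))
```

The `can_match` check is what makes this useful for searching an index: a whole branch of a trie or sorted term list can be skipped as soon as a prefix can no longer lead to a match.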