Friday, February 17, 2012

Stuff The Internet Says On Scalability For February 17, 2012

HighScalability Tested, Cyborg Approved:

  • Google's DNS: 70 billion requests a day; Superexponential: the rate of tech progress;  Akka: 48 cores / 20 million messages a second; 1 minute: intervals for health; Santa Tracker: 1.6 Million Requests per Second; 70x: MySQL cluster performance improvement
  • Quotable Quotes
    • @joeweinman: Lightbody at #ccevent: "Rule #1: architect with price structure in mind"
    • Techdirt: Nothing Scales Like Stupidity
    • @brainvat: The IRS of Spain has a columnar database with 100,000 columns and over a trillion rows
  • Zynga is now 20% Amazon and 80% their own cloud, reversing their previous approach of launching in Amazon for the spiky growth phase of a product and then folding it back in when growth rates stabilized. Follow the money...
  • Cost comparisons are like benchmarks: the only sure thing is that nothing is sure, but we usually still learn something anyway. Depending on the load profile, bandwidth cost, CPU load, storage needs, etc., self-hosting may or may not be more expensive than Amazon. Glad we cleared that up.
  • PacketPushers talks Server Internals and Network Performance. Informative discussion of modern server architectures with an emphasis on IO. Can your server drive a 10 gig network interface? It turns out there is no real way to tell. A look at the North -> South path, NICs, PCI Express lanes, memory bandwidth, lack of standards, and the double indirection of virtualization. Servers don't make good switches because of the latency of getting work into and out of the CPU and the lack of memory bandwidth to drive all the CPUs. A switch will have a fabric that can link NICs together without hitting the CPU.
  • If the Jane Austen book club is not your cup of Java, then you may prefer the Reading club on Graph databases and distributed systems. There's no Mr. Darcy, but how can that lovely hunk of a character compete with Google Pregel, Map Reduce, and Beehive?
  • BitTorrent Live: Cheap, Real-Time P2P Video Streaming That Will Kill TV. Has the virtuous scaling property that as something becomes more popular, less bandwidth is used per user. It saves bandwidth, people; it's not just for bit thieving.
  • Where to Put Flash for Enterprise Performance? Flash in the server is an optimal solution for smaller data sets and highest performance, while flash in the array should be leveraged for larger or more mission-critical data sets.
  • Profiling Django Applications: A Journey From 1300 to 2 Queries. Use an ORM for prototyping. Get your application to work, then profile, then make it faster.
  • Speaking of profiling: Pssst... your Rails application has a secret to tell you. Secrets are learned by eavesdropping on events in Rails and other sources, whispering them to statsd, using Nagios as a tattle tale, and the whole sordid affair is displayed on a dashboard for everyone to see.
  • Scalability of Fork Join Pool. In a 48 core message passing test, context switch issues prevented Akka's message passing from going full speed. The problem was shared locks in the task queue. The solution is described by Wizard Doug Lea in ForkJoin updates. One big change was: treating external submitters in a similar way as workers -- using randomized queuing and stealing. The result was using all 48 cores to exchange 20 million messages per second. The actors were not doing any work, but the point is to keep all the cores humming and, impressively, they did.
  • A Survey of Rollback-Recovery Protocols in Message-Passing Systems: This survey covers rollback-recovery techniques that do not require special language constructs. In the first part of the survey we classify rollback-recovery protocols into checkpoint-based and log-based. 
  • Dangerous deadly mojo ahead: Deadlocks and Lock Ordering: a Vignette and Deadlocks.
  • Apache Giraph is better than MapReduce on graph problems because you think like a vertex and the graph state is kept in memory during the whole of the algorithm. Gone are mappers and reducers; instead, a Vertex can send messages to others of its kind.
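
To make "think like a vertex" a little more concrete, here is a minimal sketch of the vertex-centric style in plain Python. It is not Giraph's API (Giraph is a Java framework); the function name propagate_max and the dict-based graph are assumptions made purely for illustration. Each vertex looks only at its own value and its incoming messages, the graph state stays in memory for the whole run, and the work proceeds in synchronized rounds (supersteps) until no messages remain.

```python
from collections import defaultdict


def propagate_max(values, edges):
    """Spread the largest vertex value through the graph, vertex by vertex.

    values: dict mapping vertex -> initial number (hypothetical input format)
    edges:  dict mapping vertex -> list of neighbours it can message;
            every neighbour is assumed to also be a key in `values`
    Returns (final state, number of supersteps executed).
    """
    state = dict(values)  # graph state is held in memory for the whole run

    # Superstep 0: every vertex sends its own value to its neighbours.
    inbox = defaultdict(list)
    for vertex, value in state.items():
        for neighbour in edges.get(vertex, []):
            inbox[neighbour].append(value)
    supersteps = 1

    # Later supersteps: only vertices that received messages do any work.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > state[vertex]:
                # The vertex learned a larger value: adopt it and tell neighbours.
                state[vertex] = best
                for neighbour in edges.get(vertex, []):
                    outbox[neighbour].append(best)
            # Otherwise the vertex stays silent, which acts as a vote to halt.
        inbox = outbox  # messages are delivered at the start of the next superstep
        supersteps += 1

    return state, supersteps


if __name__ == "__main__":
    chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    final, rounds = propagate_max({"a": 3, "b": 6, "c": 2, "d": 1}, chain)
    print(final, rounds)
```

There is no mapper or reducer here and nothing is written out between rounds. The only communication is messages between vertices, which is the property the Giraph item points at.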

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...
