Stuff The Internet Says On Scalability For July 22, 2011
Submitted for your scaling pleasure:
- Google's PageRank involves 500 million variables and 2 billion terms. SeaMicro Packs 768 Cores Into its Atom Server. Twitter: 1 Billion Items Delivered A Day Is Nice, Google+. We Do 350 Billion.
- Potent quotables:
- @merv - Does elastic scalability matter? Ask Apple about 1 million e-transactions in a day. $30M worth of multi-gig downloads that must work.
- @lmacvittie - Cloud has shifted the focus of scalability from applications to architecture.
- @bartbohn - Love the phrase "anarchic scalability" by Roy Fielding
- @iAjayMe - Programming: I have come to the conclusion, if you want to do something right the first time (scalability,performance) write it in C++
- @mreferre - Interesting discussion between @jamesurquhart and @lmacvittie about PaaS/App scalability (I think). Need google translator though...
- @krisajenkins - My deal with #nosql: if I'm giving up transactions, it has to buy me no SPOF.
- @c3iq - "Hundreds of gigabytes of data constitute the low end of Hadoop-scale" #bigdata
- Scalability Matters for Web Startup. Jimmy Chen relates a harrowing cautionary tale; don't let this happen to you: One of my really good friends ran a startup in the valley. He is a die-hard lean startup guy. He built his app on PHP and mysql. He prototyped it quickly. He ran it on a linode server. Turned out his app can only handle 300 concurrent users. He then, against his wish, spent 99% of his time trying to scale his application and almost zero time on new feature requests. After two months, he decided he needed to take down his app to rewrite it so that it can scale, and for good measure, he added many of the feature requests. He relaunched his app in 3 weeks. However, by then, he lost 90% of the traffic and could never get it back. So the moral of the story? It’s better to take the time to develop an excellent app that can provide uninterrupted service than to quickly and sloppily develop an equally impressive app that needs a major makeover if it gets popular.
- Rackspace is going all the way...with OpenStack. Rackspace will get rid of the previous Cloud Servers code base. "We will be running entirely on OpenStack, and the legacy system will be deprecated and no longer running." That's a solid pledge of commitment. Having a major player like Rackspace really supporting OpenStack should help make it a more attractive option.
- The Difficulty with Deadlocks. Jeremiah Peschka breathes new life into the topic of deadlocks, the scourge of multi-threaded programming. Excellent coverage of what they are and how to prevent them.
- How we use Redis at Bump. Will from Bump's server team shares some of their Redis secrets: using lists for message passing; Redis for a centralized log server; sets for social graphs; Redis as an LRU asset cache; they found S3 has too high a latency; compaction locks the servers, which hurts response times.
- Network: From Hardware Past To Software Future. Dmitriy Samovskiy has a nice summary of the infrastructure-as-code movement. Network gear will go back to moving packets and the software control will go into the application layer. For more on this far-reaching topic look at Packet Pushers and Greg Ferro.
- InfoQ with Siddharth “Sid” Anand describing how NoSQL is used at Netflix.
- A short criticism of Amazon SQS. Downsides are that SQS requires polling and is a high latency service.
- Comparing Apples to Workloads: The Pricing Problem with Cloud Services. Joe Brockmeier takes a great stab at creating a framework for comparing different cloud services using a Workload Allocation Cube, based on measures of CPU, memory, disk I/O, LAN I/O, WAN I/O and storage. This is why delivering software isn't a utility: energy is homogeneous and a watt-type measurement makes sense. There's no such thing for computing. It all depends.
- Open source storage array. StorageMojo takes a look at Backblaze's new Storage Pod 2.0 design, a 135-terabyte, 4U server for $7,384: Amazon and Google don’t use NetApp or EMC. Why should you?
- Postmortem: Java App Engine outage, July 14, 2011. No matter how much testing you do, the live environment will always humble you.
- Focus, focus, focus. Google will no longer let 1000 flowers bloom: Google Labs is dead. That says something about the environment we are in.
- Good news for nginx: nginx is being established as a corporation.
- HUG HBase Presentation - How Yfrog uses HBase.
- Decomposing Twitter (Database Perspective). It’s a technical review about how Twitter works from the database perspective. It covers 4 major types of Real-Time Data that give Twitter engineers a headache. Also take a look at: Big Data in Real-Time at Twitter by Nick Kallen.
- Geeking with Greg has some links that might be of interest. Google's Indexing the World Wide Web: The Journey So Far was especially interesting.
- Hadoop isn't your only option: Microsoft has Project Daytona. Microsoft Azure Offers Apache Hadoop Alternative, Appeals To SMBs.
- Replication, atomicity and order in distributed systems. Alex Feinberg helps the reader develop a basic toolkit they could use to reason about distributed systems.
- The problem with PaaS is you don't know what the heck is happening. On GAE, developers are seeing unnecessary instances being started, are unclear why costs have jumped so much when the CPU usage has stayed the same, and are seeing new charges they didn't see before.
- Voldemort v0.9 released: NIO, pipelined FSM routing layer, hinted handoff and more!
- Architecting for the Cloud: demo and best practices, by Simone Brunozzi. Very nice slide deck.
- SaaS & Cloud Services: Business Model Scalability Checklist – Part 1. Interested in a SaaS business model? Michael Dunham helps you think about how to make that work: Application architecture is component-based, uses web-services and the database is multi-tenant; Application maintenance and updates can be applied, or rolled back, to all client instances without business interruption and with minimal effort. There are many more.
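
For the deadlock item above, here is a minimal sketch of one common prevention technique in multi-threaded code: always acquire locks in a single, globally consistent order. This is not taken from Jeremiah Peschka's article; the Account class, the transfer function and the run_demo helper are hypothetical names made up for illustration. Two threads move money in opposite directions, which is the classic setup for a deadlock if each thread grabs its "own" lock first.

```python
import threading


class Account:
    """Hypothetical account guarded by its own lock (illustration only)."""

    def __init__(self, account_id, balance):
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move amount from src to dst without risking a lock-order deadlock."""
    if src is dst:
        # Nothing to do, and locking the same non-reentrant lock twice would hang.
        return
    # Acquire locks in ascending account_id order, regardless of transfer
    # direction, so two opposing transfers can never each hold one lock
    # while waiting for the other.
    first, second = sorted((src, dst), key=lambda acct: acct.account_id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def run_demo(rounds=1000):
    """Run two threads transferring in opposite directions; return balances."""
    a = Account(1, 1000)
    b = Account(2, 1000)

    def worker(src, dst):
        for _ in range(rounds):
            transfer(src, dst, 1)

    t1 = threading.Thread(target=worker, args=(a, b))
    t2 = threading.Thread(target=worker, args=(b, a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance


if __name__ == "__main__":
    print(run_demo())
```

Without the sorted() step, thread one could hold a's lock while thread two holds b's, and both would wait forever. Databases attack the same problem differently (deadlock detection and victim selection, for example), so treat this as one option among several.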