Hot Scalability Links for July 30, 2010

  • Jeremy Zawodny, while performing data alchemy in the dungeons of Craigslist, stored 1,250,000,000 Key/Value Pairs in Redis on a 32GB Machine.
  • Data sorting world record: 1 terabyte, 1 minute. The system has 52 computer nodes; each node is a commodity server with two quad-core processors, 24 gigabytes (GB) of memory and sixteen 500 GB disks. It's not just hardware though: they also built software that makes use of all their CPU and RAM.
  • Tweets of Gold:
    • wm: I am really getting the sense that none of you yokels waxing profound about scalability actually has anything factual to say
    • joestump: I think you can do things to *mitigate* pain points up front. You don't need to over-engineer, but it's not hard to look forward.
    • danielcrenna: I love it when I check in debug code accidentally and it turns into a three day hunt for a major scalability problem
    • joestump: Your post also makes me think of another phrase I say often: Scaling == Specialization. Bigger scale = More specialization.
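
As a rough sanity check on the first two links above, the sketch below turns the quoted figures into per-item budgets. It assumes, purely for illustration, that "32GB" means 32 GiB and that all of it is available to Redis, that "1 terabyte" means 10**12 bytes, and that the sort load is spread evenly over nodes and disks. None of these assumptions come from the linked posts.

```python
# Back-of-envelope numbers derived from the figures quoted in the links above.
# Assumptions (not from the linked posts): 32GB = 32 GiB, all of it usable;
# 1 terabyte = 10**12 bytes; work is spread evenly across nodes and disks.

GIB = 2**30


def bytes_per_pair(memory_bytes: int, pairs: int) -> float:
    """Upper bound on memory per key/value pair, all overhead included."""
    return memory_bytes / pairs


def per_node_mb_per_s(total_bytes: int, nodes: int, seconds: float) -> float:
    """Average rate each node must sustain to move the data through once."""
    return total_bytes / nodes / seconds / 1e6


redis_budget = bytes_per_pair(32 * GIB, 1_250_000_000)
node_rate = per_node_mb_per_s(10**12, nodes=52, seconds=60)
disk_rate = node_rate / 16  # sixteen disks per node

print(f"Redis: at most {redis_budget:.1f} bytes per pair")
print(f"Sort: about {node_rate:.0f} MB/s per node and "
      f"{disk_rate:.0f} MB/s per disk, for each pass over the data")
```

Under these assumptions each pair has fewer than 28 bytes to live in, key and bookkeeping included, which gives a feel for how compact the encoding has to be.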

Click to read more ...


YeSQL: An Overview of the Various Query Semantics in the Post Only-SQL World

The NoSQL movement faults the SQL query language as the source of many of the scalability issues that we face today with the traditional database approach.

I think that the main reason so many people have come to see SQL as the source of all evil is the fact that, traditionally, the query language was burned into the database implementation. So by saying NoSQL you basically say "No" to the traditional non-scalable RDBMS implementations.

This view has brought on a flood of alternative query languages, each aiming either to supply something missing from the traditional SQL query approach, such as a document model, or to provide a simpler approach, such as Key/Value query.

Most of the people I speak with seem fairly confused on this subject, and tend to use query semantics and architecture interchangeably. In Part I of this post I tried to provide a quick overview of what each query term stands for in the context of the NoSQL world. Part II illustrates those ideas using code examples from GigaSpaces and Datanucleus/Hbase.

See Part I and Part II for more information.

Click to read more ...


A Metric A$$-Ton of Joe Stump: The Cloud is Cheaper than Bare Metal

Should you pay more in the cloud or pay less for bare metal in the datacenter? This is a crucial decision point facing startups today. Which way should you go? In this interview, Joe Stump, always a go-to guy when you need a metric ass-ton (a favorite expression of Joe’s) of good advice on cutting edge practices for the modern startup, laughs at conventional wisdom by saying the cloud is really not more expensive than bare metal.

The argument for a cheaper cloud has three main points:

Click to read more ...


Sponsored Post: Okta, EzRez, VoltDB, Digg, Cloud Sigma, Applications Manager, Site24x7

Who's Hiring?

Cool Products and Services

  • Cloud Sigma. Instantly scalable European cloud servers. 
  • ManageEngine Applications Manager.  ManageEngine provides Enterprise IT Management suite of products. 
  • Site24x7. Easy, fast and effective web server monitoring, server monitoring and website monitoring service.

Click to read more ...


4 New Podcasts for Scalable Summertime Reading

It's trendy today to say "I don't read blogs anymore, I just let the random chance of my social network guide me to new and interesting content." #fail. When someone says this I imagine them flicking their hair back in an "I can't be bothered with true understanding" disdain. And where does random chance get its content? From people like these. So: support your local blog!

If you would like to be a part of random chance, here are a few new podcasts/blogs/vidcasts that you may not know about and that I've found interesting:

  • DevOps Cafe. In this new video series, John and Damon visit high performing companies and record an insider's tour of the tools and processes those companies are using to solve their DevOps problems. DevOps is a profession that finally seems to be realizing its own value. In the first episode John Paul Ramirez takes the crew on a tour of Shopzilla's application lifecycle metrics and dashboard. The second episode features John Allspaw, VP of Technical Operations at Etsy, talking about the new role of DevOps in companies. Only more good stuff from there.
  • Packet Pushers. A great podcast by real experts on seriously technical networking issues. They describe their podcast as: a podcast where we talk about routing, switching, security, firewalls, study and market changes. Some topics covered: “Defense in Depth” and what it really means; Deep Diving on Data Centre Switching; Chewing on DDOS; Enterprise MPLS; Career Progression.

Click to read more ...


How can we spark the movement of research out of the Ivory Tower and into production?

Over the years I've read a lot of research papers looking for better ways of doing things. Sometimes I find ideas I can use, but more often than not I come up empty. The problem is there are very few good papers. And by good I mean: can a reasonably intelligent person read a paper and turn it into something useful? 

Now, clearly I'm not an academic and clearly I'm no genius, I'm just an everyday programmer searching for leverage, and as a common specimen of the species I've often thought how much better our industry would be if we could simply move research from academia into production with some sort of self-conscious professionalism. Currently the process is horribly hit or miss. And this problem extends equally to companies with research divisions that often do very little to help front-line developers succeed. 

How many ideas break out of academia into industry in computer science? We have many brilliant examples: encryption, microprocessors, compression, transactions, distributed file systems, vector clocks, gossip protocols, MapReduce, search, algorithms, networking, communication, and on ad infinitum. For every Google that breaks out there must be thousands of other potential ideas that go nowhere, even in this hyper-VC aware age. 

What we need to do is a better job of using the research. There's a lot out there in the literature that we could be making use of right now, but it's closed off from the people, i.e., developers, who can turn this research into gold. And it's largely closed off because researchers don't consider developers as an audience and they don't write their papers with the intention of being applied. Change the publication process and we can save the cheerleader and save the world.

I'm bringing this up now because:

Click to read more ...


Strategy: Consider When a Service Starts Billing in Your Algorithm Cost

At Monday's Cloud Computing Meetup, Paco Nathan gave an excellent Getting Started on Hadoop talk (slides). I found one of Paco's strategies particularly interesting: consider when a service starts charging in cost calculations. Depending on your use case it may be cheaper to go with a more expensive service that charges only for work accomplished rather than charging for both work + startup time.
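
A minimal sketch of that comparison, using made-up hourly rates and durations (none of these numbers or names come from Paco's talk): hypothetical service A is cheaper per hour but bills from the moment it starts booting, while hypothetical service B costs more per hour but bills only for the time spent doing work.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    rate_per_hour: float   # hypothetical price, in dollars
    bills_startup: bool    # True if billing begins before useful work starts


def job_cost(service: Service, startup_hours: float, work_hours: float) -> float:
    """Cost of one job, counting startup time only if the service bills for it."""
    billed_hours = work_hours
    if service.bills_startup:
        billed_hours += startup_hours
    return service.rate_per_hour * billed_hours


service_a = Service("A", rate_per_hour=1.00, bills_startup=True)
service_b = Service("B", rate_per_hour=1.25, bills_startup=False)

# Same half-hour startup, two different job lengths.
for work_hours in (1.0, 10.0):
    cost_a = job_cost(service_a, startup_hours=0.5, work_hours=work_hours)
    cost_b = job_cost(service_b, startup_hours=0.5, work_hours=work_hours)
    print(f"{work_hours:>4} h of work: A=${cost_a:.2f}  B=${cost_b:.2f}")
```

With these invented numbers the pricier service wins for the short job and loses for the long one, so the answer depends on how much of the bill is startup time for your particular workload.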

Click to read more ...


Sponsored Post: ezRez, VoltDB and Digg are Hiring


Hot Scalability Links for July 17, 2010

And by hot I also mean temperature. Summer has arrived. It's sizzling here in Silicon Valley. Thank you air conditioning!

  • Scale the web by appointing a Crawler Czar? Tom Foremski has the idea that Google should open up their index so sites wouldn't have to endure the constant pounding by ravenous crawler bots. Don MacAskill of SmugMug estimates 50% of our web server CPU resources are spent serving crawlers. What a waste. How this would all work with real-time feeds, paid feeds (Twitter, movies, ...), etc. is unknown, but does it make sense for all that money to be spent on extracting the same data over and over again?
  • Tweets of Gold:
    • jamesurquhart: Key to applications is architecture. Key for infrastructure supporting archs is configurability. Configurability==features.
    • tjake: People who choose their datastore based on hearsay and not their own evaluation are doomed.
    • b6n: No global lock ever goes unpunished
    • MichaelSurtees: scalability, systems & process feed each other right?
    • jamesgolick: Statements like: "NoSQL database systems are designed for scalability." make me sad.
    • agastiya: Focus on stability and features first, scalability and manageability second, per-unit performance last of all. This is a quote from Jeff Darcy

Click to read more ...


DynaTrace's Top 10 Performance Problems taken from Zappos, Monster, Thomson and Co

DynaTrace, in Top 10 Performance Problems taken from Zappos, Monster, Thomson and Co, has provided a useful compilation of performance problems, with potential solutions, that they've found while working with their clients.

  1. Too Many Database Calls - too many database queries per request/transaction.
  2. Synchronized to Death - in a high-load or production environment over-synchronization results in severe performance and scalability problems.
  3. Too chatty on the remoting channels - too many calls across remoting boundaries end up causing performance and scalability problems.
  4. Wrong usage of O/R-Mappers - incorrect usage of the framework itself too often results in unexpected performance and scalability problems within these frameworks.
  5. Memory Leaks - GC does not prevent memory leaks; it is important to release object references as soon as they are no longer needed.
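
To make item 1 concrete, here is a small sketch of the "too many database calls" pattern using Python's built-in sqlite3 module and an in-memory database. The schema, data and function names are invented for illustration and are not taken from the DynaTrace article; the point is only that the same result can be fetched with one query per order or with a single joined query.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE items  (order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO items  VALUES (1, 'a'), (1, 'b'), (2, 'c'), (3, 'd');
""")


def items_chatty(conn):
    """One query for the orders, then one more query per order (N+1 calls)."""
    orders = conn.execute("SELECT id FROM orders ORDER BY id").fetchall()
    calls = 1
    result = {}
    for (order_id,) in orders:
        rows = conn.execute(
            "SELECT sku FROM items WHERE order_id = ? ORDER BY sku",
            (order_id,),
        ).fetchall()
        calls += 1
        result[order_id] = [sku for (sku,) in rows]
    return result, calls


def items_joined(conn):
    """The same data fetched with a single joined query."""
    rows = conn.execute(
        "SELECT o.id, i.sku FROM orders o "
        "JOIN items i ON i.order_id = o.id "
        "ORDER BY o.id, i.sku"
    ).fetchall()
    result = {}
    for order_id, sku in rows:
        result.setdefault(order_id, []).append(sku)
    return result, 1


chatty, chatty_calls = items_chatty(conn)
joined, joined_calls = items_joined(conn)
print("same result:", chatty == joined)
print("database calls:", chatty_calls, "vs", joined_calls)
```

The gap between the two call counts grows with the number of orders, which is why this pattern tends to stay hidden in small test data sets and only shows up under production load.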

Click to read more ...