Stuff The Internet Says On Scalability For January 20, 2012

If you’ve got the time, we’ve got the HighScalability:

  • Google+: 90 million users; Internet 2011: 2.1 billion Internet users, 1 trillion YouTube views, 5.9 billion mobile subscriptions; Fusion-io: One Billion IOPS; 12 atoms: size of IBM's new memory bit; 32 Million: Stack monthly visitors; Gmail: 350 Million Users; TimTebow: 1.5 million Tweets
  • Quotable Quotes:
    • Similarity : There is no canonical schema anymore. Instead you should ask: What high-volume queries will I need to serve with my data? Then work backwards from there.
    • @kvirjee : Dis/Agree? -- "there is no problem but scalability, and architecture is its solution"
    • @robpegoraro : Eternal vigilance can be crowdsourced.
  • Didn't Bill Gates say once that 48 bits would always be enough for an ID? Well, Oracle ran out of bits: Fundamental Oracle flaw revealed. 64 bits, that's the ticket; IPv6 went to 128 bits.
  • The day Kodak died: We developed the world's first consumer digital camera but we could not get approval to launch or sell it because of fear of the effects on the film market... a huge opportunity missed.
  • PeteSearch with Five short links that are long on good.
  • StorageMojo asks: If SSDs are so great, shouldn’t we see the results in TPC-C benchmarks? And answers: They are, and we do. The most expensive top-10 SSD result is some 15% cheaper than the least expensive disk-based result.
  • Netflix shows how they achieve Auto Scaling in the Amazon Cloud: 1) Scale up early, scale down slowly, 2) Provision for availability zone capacity, 3) Without the proper configuration and testing it can do more harm than good.
  • Data Infrastructure at LinkedIn. LinkedIn is another company doing great, innovative infrastructure work and releasing it into the wild: SenseiDB, Voldemort, Espresso, Kafka, and many more. That's just some of the impressive tech behind their 81+ million unique monthly visitors.
  • Memcached on steroids: A 4KB Get in 12 μs using ConnectX InfiniBand QDR adapters. That's fast BTW.
  • A duo of videos: 1) CMU has an excellent and varied set of computing-related videos. 2) Videos from Hadoop World 2011 are now available. MapReduce them while you can.
  • A duo of databases: 1) nessDB - A very fast key-value, embedded Database Storage Engine (Using log-structured-merge (LSM) trees) with Level-LRU, Bloom-Filter. 2) NuoDB - an ACID, transactional, and elastically scalable client/cloud relational database.
  • Dan Rayburn has broadcast a massive List Of Online Video Conferences, Tradeshows and Events.
  • World subway paths at scale from Flowing Data. They are all cool neuron-looking things. Or maybe they look like knots? Or spiders? Or cracks in ice?
  • A nice summary of JVM performance tuning by Andrew Wang.
  • A thread on How to store unique visitors in Cassandra revealed Countandra, a hierarchical distributed counting engine on top of Cassandra. Something I've never heard of before, but it looks cool. Matt Dennis has some time series data modeling advice. Common tasks are surprisingly hard in a lot of systems.
  • Availability, Data Locality, and Peer-to-Peer Replication: Peer-to-peer replication is a special and magical kind of replication, it works in a ring or mesh to make sure that one row’s updates will magically spread to all servers. You’d think that this would mean every server is equal, right?
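
The Netflix auto scaling item above boils down to an asymmetric policy: "scale up early, scale down slowly." As a rough illustration only, here is a minimal Python sketch of that idea. Every name and number in it (`ScalingPolicy`, `next_size`, the CPU thresholds, step sizes, and cooldown) is a hypothetical assumption for the example, not Netflix's configuration or any AWS API.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical thresholds for illustration; not Netflix's real settings."""
    scale_up_cpu: float = 0.60    # react early, well before saturation
    scale_down_cpu: float = 0.30  # only shrink when clearly idle
    scale_up_step: int = 3        # add capacity in large steps
    scale_down_step: int = 1      # remove capacity one instance at a time
    scale_down_cooldown: int = 5  # consecutive idle ticks required to shrink
    min_instances: int = 3        # assumed floor, e.g. one per availability zone


def next_size(policy, current, cpu, idle_ticks):
    """Return (new_instance_count, new_idle_ticks) for one evaluation tick."""
    if cpu >= policy.scale_up_cpu:
        # Scale up immediately and reset the idle counter.
        return current + policy.scale_up_step, 0
    if cpu <= policy.scale_down_cpu:
        idle_ticks += 1
        if idle_ticks >= policy.scale_down_cooldown:
            # Sustained idleness: shrink by a small step, never below the floor.
            return max(policy.min_instances, current - policy.scale_down_step), 0
        return current, idle_ticks
    # In the comfortable middle band: hold steady and reset the idle counter.
    return current, 0
```

The asymmetry lives in two places: the scale-up step is larger than the scale-down step, and shrinking requires several consecutive idle evaluations while growing requires only one busy one. A real deployment would also need the per-zone provisioning and testing that the Netflix post calls out.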