Stuff The Internet Says On Scalability For April 13, 2012

It's HighScalability Time:

  • 50 million in 50 days : Draw Something downloads; 40 million concurrent users : Skype
  • Key to making sensors ubiquitous is getting the BOM cost down. Here's a dream way of making that happen: Bye-Bye Batteries: Radio Waves as a Low-Power Source. “Silicon technology has advanced to the point where even tiny amounts of energy can do useful work.” No batteries == cheaper, smaller products == ubiquity.
  • The MySQL “swap insanity” problem and the effects of the NUMA architecture. Jeremy Cole with a spectacular article on the differences between NUMA and SMP/UMA systems and the mostly unsatisfactory tricks required to get MySQL to perform on NUMA systems. There are really two issues: the evils of OS-controlled swap and NUMA performance effects due to a single node (in the NUMA sense) running out of memory. This is the kind of stuff you only see when you push your systems to the edge. Also, Measuring NUMA effects with the STREAM benchmark, MongoDB on NUMA, and You Buy a NUMA System, Oracle Says Disable NUMA! What Gives? Part II.
  • Chips as mini Internets: Li-Shiuan Peh, an associate professor of electrical engineering and computer science at MIT, wants cores to communicate the same way computers hooked to the Internet do: by bundling the information they transmit into “packets.” Each core would have its own router, which could send a packet down any of several paths, depending on the condition of the network as a whole.
  • MySQL at Twitter. Twitter has open sourced their version of MySQL that adds more predictable performance and makes life easier for DBAs, including optimizations for NUMA and SSD. It's those on the edge that lead.
  • Many kinds of memory-centric data management. Excellent overview of the approaches used by various in-memory databases by Curt Monash. Some are too cold, some are too hot, and some are just right.
  • Amazon further extended their service portfolio with an auto-scaling CloudSearch service that has the typically simple Amazon interface, but is not as full featured as dedicated services and may be a little expensive. Some questions: geospatial search support, file type support (PDF, etc.), indexing content on S3, CNAME support, multiple language support, and multi-tenant support. Werner Vogels spells out the technology behind CloudSearch: Amazon CloudSearch is based on more than a decade of developing high quality search technologies for Amazon.com. It has been developed by A9, the Amazon.com subsidiary that focuses on search technologies. The technology that is used at all the different places where you can search on Amazon.com is also at the core of Amazon CloudSearch.
  • Bloom: Disorderly Programming for a Distributed World. It's imperative we save the world from Von Neumann myopia and make the world safe for multi-core. Towards that goal, here's a detailed video on Bloom: a programming language targeted at developers of complex cloud computing and distributed systems. Bloom is a 'disorderly' language: it differentiates itself from most common programming languages by embracing rather than resisting the disorderly realities of distributed computing architectures.
  • Beyond virtualization: Envisioning true cloud computing. A man after my own heart who also thinks that: eventually we'll start to see widely available OS builds that dispense with a vast majority of today's underpinnings, excising years of fat and bloat in favor of kernels that are highly specific to the hypervisor in use, and with different ideas on how memory, CPU, and I/O resources are worked and managed.
  • The Sad State of Data Center Networking. Cloud Toad don't slow that roll: This is nonsense. What we need is a consistent end-to-end approach to DC networking. Why buy a snazzy high-powered fabric that interfaces with a different network (the vSwitch and its overlays) via VLANs and talks to the hypervisor via APIs? Why not have the routers participate in the overlays, and ditch the VMware API integration with the DC fabric?
  • Gentle overview of Relational Lattice: General lattices are weaker structures than Boolean algebras. Relational lattices are positioned somewhere in-between. In the next section we’ll rehash the key Boolean Algebra ideas and cast them into a shape convenient for moving on to Relational Lattices.
  • ACID in HBase. A nice explanation of how HBase handles transactions: HBase employs a kind of MVCC. And HBase has no mixed read/write transactions.
  • Greg Linden with another excellent set of quick links.
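
For anyone wondering what "a kind of MVCC" means in the ACID in HBase item, here is a toy sketch of the general idea: every write gets a number, and readers only see versions at or below a "read point" that advances once writes finish. This is not HBase's code. The class and method names (MVCCStore, begin_write, put, commit, snapshot, get) are made up for illustration, and a real store also deals with durability, locking, and cleaning up old versions.

```python
class MVCCStore:
    """Toy multi-version store (hypothetical, not HBase's implementation).

    Readers only see values whose write number is <= their read point.
    """

    def __init__(self):
        self._versions = {}     # key -> list of (write_number, value)
        self._next_write = 1    # next write number to hand out
        self._read_point = 0    # highest write number visible to readers
        self._committed = set() # finished writes still above the read point

    def begin_write(self):
        """Reserve a write number for a new write."""
        wn = self._next_write
        self._next_write += 1
        return wn

    def put(self, write_number, key, value):
        """Record a new version; it stays invisible until commit."""
        self._versions.setdefault(key, []).append((write_number, value))

    def commit(self, write_number):
        """Mark a write finished and advance the read point over any
        contiguous run of finished writes."""
        self._committed.add(write_number)
        while self._read_point + 1 in self._committed:
            self._read_point += 1
            self._committed.remove(self._read_point)

    def snapshot(self):
        """Return the current read point for a reader to hold on to."""
        return self._read_point

    def get(self, key, read_point):
        """Return the newest value visible at read_point, or None."""
        best = None
        for wn, value in self._versions.get(key, []):
            if wn <= read_point and (best is None or wn > best[0]):
                best = (wn, value)
        return None if best is None else best[1]


store = MVCCStore()
w1 = store.begin_write()
store.put(w1, "row1:a", "old")
store.commit(w1)

reader = store.snapshot()         # reader pins its view here
w2 = store.begin_write()
store.put(w2, "row1:a", "new")    # written but not yet committed
before_commit = store.get("row1:a", store.snapshot())
store.commit(w2)
after_commit = store.get("row1:a", store.snapshot())
pinned_view = store.get("row1:a", reader)
```

In this sketch a reader that pinned its read point before the second write keeps seeing the old value, while a fresh reader sees the new one only after the commit. Note there is no way here to read and write inside one transaction, which loosely mirrors the "no mixed read/write transactions" point above.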