Stuff The Internet Says On Scalability For August 5, 2011

Submitted for your beginning of the end of summer scaling pleasure:

  • Google Uses About 900,000 Servers; eBay deploys 100TB of flash storage
  • The cloud isn't for closers. Another gaming startup pulls back from the cloud by Derrick Harris. Digital Chocolate is following the Zynga strategy of moving a game into higher-performing datacenter infrastructure once it becomes popular enough in the cloud to justify the primo stuff. We talked about this strategy in Zynga's Z Cloud - Scale Fast Or Fail Fast By Merging Private And Public Clouds. The approach makes even more sense with Amazon's new AWS Direct Connect service, which enables lower latency and higher bandwidth by skipping the Internet and connecting directly to the AWS network. See also: AWS Direct Connect FAQs; Amazon Virtual Private Cloud.
  • Quotes that are quotable:
    • @Werner : "If You Are Slow, You Can't Grow" - Peecho Architecture - scalability on a shoestring http://wv.ly/n4fpPC #aws #highscalability #peecho
    • @tcolar : Using neo4j .... After so many years of RDMS it's almost feels wrong having all that freedom to have an intuitive model. - almost !
    • @robertmclaws : Am I wrong in my opinion that #nosql is just an excuse for script kiddies not to learn how to build scalable systems? Or to not pay 4 MSSQL?
    • @b_erb : Quote of the day: "All computers wait at the same speed" – Dr. Thomas E. Bell
  • Twitter, a prolific five-star open source contributor, is getting set to release Storm. The theme of Storm is real-time, Hadoop-style processing: stream processing, continuous computation, and distributed RPC. Looks interesting. The post also has a really good architecture description.
  • Cade Metz reports Apple has removed MySQL from the latest version of Mac OS X server, replacing it with PostgreSQL. If you want to see the future of the Mac, look at the iPad.
  • MySQL Cluster: 371,000 insert transactions per second on AWS. Tony Bain discusses What Scales Best?: OldSQL, NoSQL, NewSQL. Consider physical scalability and what it costs in terms of complexity, actual dollars, performance, flexibility, availability, and consistency.
  • Once you eat, pray, love onion chutney, you will finally understand MapReduce: How I explained MapReduce to my Wife?
  • While density is increasing in politics, it's also increasing in the white-box server market: Supermicro goes 8x server crazy. 80 cores in 5U, 6 SATA ports, 16 hot-swap HD bays, and 10 full-length PCIe slots for real RAID cards.
  • MySQL performance on EC2/EBS versus RDS by Baron Schwartz. If you need a really big database on AWS, shard from the beginning, and don't expect cloud database performance to be great. AWS can work very well when you don’t need high concurrency from your MySQL database, and EBS can work well when you don't demand much I/O. Caveats: more advanced versions of MySQL are available than RDS provides; you may need more memory for your DB than Amazon offers; you may need more throughput than EBS delivers. Tips: partition so all the inserts go into one partition, whose indexes fit in memory; put transaction log files on local disks.
  • Sebastian Kreutzberger plots database trends: MongoDB overtook CouchDB very early on, while MySQL and similar relational databases are trending down.
  • Building a million+ users social network (presentation) by Amir Salihefendic, co-founder of Plurk. Form a good team, localize, differentiate, loop virally, understand game mechanics.
  • Are Amazon EC2 Spot Instances - A Flop? Dmitriy Samovskiy thinks Amazon has created a market for discounted instances rather than a true spot market, one that clears at the equilibrium price. A call to action: create a real spot market, driven by supply and demand, with more market information than just the historical prices published via the API. That’s what pioneers do - they critically analyze the past and continue to build a fascinating future for all of us.
  • OSCon - Performance vs Scalability by gleicon. A look at performance vs scalability, non-blocking I/O, the right architecture for using it, and message queues, caching, Redis. 
  • Ilya Grigorik, in another of his usually insightful posts, compares Protocol Buffers, Avro, Thrift & MessagePack. Diplomatically: If you are looking for a battle-tested, strongly typed serialization format, then Protocol Buffers is a great choice. If you also need a variety of built-in RPC mechanisms, then Thrift is worth investigating. If you are already exchanging or working with JSON, then MessagePack is almost a drop-in optimization. And finally, if you like the strongly typed aspects, but want the flexibility of easy interoperability with dynamic languages, then Avro may be your best bet at this point in time.
  • Kent Langley announces VGBuilder for building custom apps and deploying them to your cloud host of choice -- straight from the command line. It removes the need for any middle man or centralized platform to get your ideas from your brain and into the cloud crazy fast and with quite a bit of style.  It allows you to build and deploy your applications in the cloud with almost no barriers.  It removes the need for PaaS services for many people right from the start. If it's from Kent it's worth taking a look. 
  • Parallax - A New Operating System for Scalable, Distributed and Parallel Computing. A new approach to distributed computing with self-management of nodes and signaling-enabled dynamic network monitoring and management of a set of such nodes to execute managed computational flows. The managed execution of a workflow as a directed acyclic graph using the DIME Network Architecture (DNA) provides at least a mechanism for a lengthy recursive sequence of nested programs to unfold in the von Neumann computing world.
  • Preparing Your Website/Web App For Scalability by Mr Kirkland. Nice explanation of strategies for scaling to hundreds of concurrent users and zillions of hits within days: Think Modular; Queuing/Batching; Partitioning; Code First, Profile and Optimize Later; Cache Cache Cache;  Database Indexes + Query Optimisation; Use a separate database reader and writer; Account For DB Reader Delays;  Separate Server For Media/Static Files; Mail Config; Avoid local file system dependencies – separate your data storage; Efficient Backups; Cloud.
  • More on UnQL from InfoQ's interview with Richard Hipp: UnQL, a New Query Language for Document Databases. UnQL aims to provide a common database query language that can be used to access document-oriented databases from multiple vendors.
  • Marko A. Rodriguez: On the Nature of Pipes, a data flow framework. Cool graphics and a good explanation of how to deal with graph data using transform, filter, and side effect pipes.
  • Check-in this FourSquare presentation on Data Exploration. 10+ million users, 15+ million venues, 3 million check-ins a day.
  • Ethereal Mind on building Low Latency 10 and 40 Gigabit Ethernet End-to-End Solutions.
  • Tom's Hardware with an epic exploration of SSD failure rates. whysobluepandabear's summary:
    A.) They (SSD and HDD companies) lie to us, and the figures and statistics are not reliable, nor paint an accurate picture of their reliability/performance. B.) The slowest SSD rapes the fastest HDD by a significant margin. C.) SSDs are no more reliable than HDDs - lack of moving parts does not necessarily mean lack of failure.  D.) Failure is a bit misused, as it's a term used to describe the progressive failing of a drive, and not the sudden. E.) Rather than performance, many companies (and consumers) are more concerned about reliability, as like said, even the slowest SSD is MUCH faster than the fastest HDD. 
  • Tomas Vondra with an interesting discussion of transactional file systems. Do you want to build a transactional file system on top of a database (which provides the support for transactions), or do you want to build a database on top of a transactional file system?
  • MongoDB on AWS (RDS-style). Jurg van Vliet makes a nice start at bringing RDS-style convenience to MongoDB on AWS.
  • Magnetic RAM comes of age by Sebastian Anthony. A team of Spanish and French scientists have finally found a way of reading and writing magnetic memory without using magnetic fields and coils of wire. 
  • Efficient On-Demand Operations in Large-Scale Infrastructures by Steven Y.
  • Interesting Neural Network Papers at ICML 2011 by Richard Socher
  • Videos from Open DB Camp.
  • Doug Judd, who does a great job with Hypertable, is having an event that may be of interest: Opportunities for AI in Hypertable, to be presented at the AAAI 2011 Workshop.
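The onion-chutney item above explains MapReduce by analogy. As a rough illustration of the same idea, here is a minimal single-process sketch, assuming word counting as the job. The function names (map_phase, shuffle, reduce_phase, map_reduce) are hypothetical; this is not code from the linked article or from any framework. A real framework runs the map and reduce steps on many machines and does the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    # Map: emit a (key, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group every emitted value by its key, as a framework
    # would do between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    recipes = ["onion tomato chili", "onion garlic", "tomato onion"]
    print(map_reduce(recipes))
```

Each map call looks at only one record, and each reduce call looks at only one key. That independence is what lets both phases be spread across machines.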
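Dmitriy Samovskiy's spot-market item above turns on what "clearing at the equilibrium price" means. Below is a minimal sketch, assuming a hypothetical uniform-price auction with a fixed number of instances and a reserve price. This is not how Amazon actually prices Spot Instances. Bid, clear_market, and the choice of the highest losing bid as the price are all illustrative assumptions. The highest bids win, every winner pays the same clearing price, and that price falls to the reserve when supply exceeds demand.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Bid:
    bidder: str
    max_price: float  # highest price per instance-hour this bidder will pay


def clear_market(bids: List[Bid], capacity: int, reserve: float) -> Tuple[float, List[str]]:
    """Hypothetical uniform-price clearing: fill `capacity` instances with the
    highest bids at or above `reserve`; all winners pay one clearing price."""
    eligible = sorted(
        (b for b in bids if b.max_price >= reserve),
        key=lambda b: b.max_price,
        reverse=True,
    )
    winners = eligible[:capacity]
    losers = eligible[capacity:]
    # With fixed supply, demand equals supply at any price between the lowest
    # winning bid and the highest losing bid. This sketch uses the highest
    # losing bid (never below the reserve). With no excess demand, the price
    # falls to the reserve.
    price = max(reserve, losers[0].max_price) if losers else reserve
    return price, [b.bidder for b in winners]


if __name__ == "__main__":
    bids = [Bid("a", 0.10), Bid("b", 0.08), Bid("c", 0.05), Bid("d", 0.03)]
    print(clear_market(bids, capacity=2, reserve=0.04))
    print(clear_market(bids, capacity=5, reserve=0.04))
```

Consider bids of 0.10, 0.08, 0.05, and 0.03 with a reserve of 0.04. Two instances clear at 0.05, the highest losing bid. Five instances clear at the 0.04 reserve, because demand no longer exceeds supply. A price driven by supply and demand in this way is what the post argues a real spot market should have.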
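Two of the strategies in Mr Kirkland's list above interact: using a separate database reader and writer, and accounting for DB reader delays. A replica can lag behind the primary, so a user may not immediately see their own write. Below is a minimal sketch under three assumptions: plain dicts stand in for the primary and replica databases, a fixed lag_window approximates replication delay, and an injectable clock lets the example control time. ReadWriteRouter and its parameters are hypothetical. Writes go to the primary. Reads of recently written keys also go to the primary. All other reads go to the replica, behind a small cache.

```python
import time
from typing import Callable, Dict, Optional


class ReadWriteRouter:
    """Hypothetical read/write splitter with a read-your-writes window."""

    def __init__(self, primary: Dict[str, str], replica: Dict[str, str],
                 lag_window: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.primary = primary        # stands in for the writer database
        self.replica = replica        # stands in for a possibly lagging reader
        self.lag_window = lag_window  # assumed worst-case replication delay, seconds
        self.clock = clock
        self._recent_writes: Dict[str, float] = {}
        self._cache: Dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value
        self._recent_writes[key] = self.clock()
        self._cache.pop(key, None)  # drop any cached copy that is now stale

    def read(self, key: str) -> Optional[str]:
        written_at = self._recent_writes.get(key)
        if written_at is not None and self.clock() - written_at < self.lag_window:
            # Written recently: the replica may not have it yet, so ask the primary.
            return self.primary.get(key)
        if key in self._cache:
            return self._cache[key]
        value = self.replica.get(key)
        if value is not None:
            self._cache[key] = value  # cache only values actually found
        return value


if __name__ == "__main__":
    now = [0.0]
    primary: Dict[str, str] = {}
    replica: Dict[str, str] = {}
    router = ReadWriteRouter(primary, replica, lag_window=1.0, clock=lambda: now[0])
    router.write("user:1", "alice")
    print(router.read("user:1"))  # inside the window, so served by the primary
    replica.update(primary)       # simulate replication catching up
    now[0] = 5.0
    print(router.read("user:1"))  # outside the window, so served by the replica
```

The fixed time window is a simplification. If replication lags longer than lag_window, this sketch can still return, and cache, stale data.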
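The Pipes item above describes transform, filter, and side effect pipes. Pipes itself is a Java library, and the following is not its API. It is a minimal Python sketch of the same three pipe types built from generators, assuming a hypothetical adjacency-dict graph (GRAPH, out_vertices, and chain are illustrative names). The pipes are chained to find friends-of-friends while recording the intermediate vertices.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Pipe = Callable[[Iterable[str]], Iterator[str]]


def transform_pipe(fn: Callable[[str], Iterable[str]]) -> Pipe:
    # Transform: each incoming object is mapped to zero or more outgoing objects.
    def pipe(items: Iterable[str]) -> Iterator[str]:
        for item in items:
            yield from fn(item)
    return pipe


def filter_pipe(keep: Callable[[str], bool]) -> Pipe:
    # Filter: an incoming object is either emitted unchanged or dropped.
    def pipe(items: Iterable[str]) -> Iterator[str]:
        for item in items:
            if keep(item):
                yield item
    return pipe


def side_effect_pipe(effect: Callable[[str], None]) -> Pipe:
    # Side effect: every object passes through unchanged, but we act on it.
    def pipe(items: Iterable[str]) -> Iterator[str]:
        for item in items:
            effect(item)
            yield item
    return pipe


def chain(*pipes: Pipe) -> Pipe:
    # Connect pipes so the output of one becomes the input of the next.
    def run(items: Iterable[str]) -> Iterator[str]:
        stream: Iterable[str] = items
        for p in pipes:
            stream = p(stream)
        return iter(stream)
    return run


# Hypothetical graph: vertex -> list of adjacent vertices.
GRAPH: Dict[str, List[str]] = {
    "marko": ["josh", "vadas"],
    "josh": ["ripple", "lop"],
    "vadas": ["marko"],
}


def out_vertices(vertex: str) -> List[str]:
    return GRAPH.get(vertex, [])


if __name__ == "__main__":
    visited: List[str] = []
    friends_of_friends = chain(
        transform_pipe(out_vertices),         # start vertex -> its neighbors
        side_effect_pipe(visited.append),     # remember the first hop
        transform_pipe(out_vertices),         # neighbors -> their neighbors
        filter_pipe(lambda v: v != "marko"),  # drop the starting vertex
    )
    print(list(friends_of_friends(["marko"])), visited)
```

Generators are lazy, so each vertex flows through the whole chain on demand. The side effect pipe records the first-hop vertices only as they are pulled through.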