Stuff The Internet Says On Scalability For July 19, 2013

Hey, it's HighScalability time:

(Still not a transporter: Looping at 685 mph)

  • 898 exabytes: US storage, 1/3 global total; 1 Kb/s: data transmit rate from harvestable energy from human motion
  • Create your own trust-nobody, point-to-point private cloud. Dan Brown shows how step-by-step in How I Created My Own Personal Cloud Using BitTorrent Sync, Owncloud, and Raspberry Pi. BitTorrent Sync is used to copy large files around. Raspberry Pi is a cheap, low-power, always-on device with BitTorrent Sync installed. Owncloud is an open source cloud that provides a web interface for accessing files from anywhere.
  • This is different. Funding a startup using Airbnb as a source of start-up capital. It beats getting a part-time job and one of your guests might even be a VC.
  • This is not different. Old industries clawing and digging in, using the tools of power to beat back competition. Steve Blank details a familiar story in Strangling Innovation: Tesla versus “Rent Seekers”. The thing is nobody really wants to compete, they want to make money, which has a completely different metaphysics.
  • By supporting old uncompetitive industries we suffocate the job producing engines which are new startup companies. Paul Graham in Startup Investing Trends points out the dilemma: One thing we can say for sure is that there will be a lot more startups. The monolithic, hierarchical companies of the mid 20th century are being replaced by networks of smaller companies. This process is not just something happening now in Silicon Valley. It started decades ago, and it's happening as far afield as the car industry. It has a long way to run.
  • Google App Engine with some cool new features: dedicated memcache, git push-to-deploy support, splitting up large apps into components using modules.
  • Tim Gross has some very high quality posts. The last two, Session Store Design and Falling In and Out of Love With DynamoDB II, are particularly good. If you've researched session best practices you know all the conflicting ideas out there. Tim creates some excellent diagrams showing the logic for client-side and server-side sessions. His DynamoDB posts are required reading for anyone contemplating the use of DynamoDB. As in the session post, there are some wonderful diagrams explaining how things work. Some takeaways: poor key design == cost & pain; batch write with high concurrency to improve throughput; use estimation and active monitoring to reduce costs.
  • Heroku's Logplex: a distributed syslog log router, able to merge and redistribute multiple incoming streams of syslog logs to individual subscribers.
  • We are in the middle of a critical platform transition, says Bill Gurley. The implications: design means more, simpler experiences, greater development complexity, HTML5 is a head-fake, SEO doesn't matter, apps replace search, a locked-in mobile user is more valuable than a desktop user, nobody knows how to acquire customers, payment is the new battleground.
  • Risk in IT Systems by Dmitriy Samovskiy: Do you know what our biggest problem in IT is? It’s our complete and utter inability to measure risk of the systems and services that we build. Often, people don’t even understand what the risk is and confuse risk with other concepts. Risk corresponds to standard deviation of monthly amounts of failure, not their mean (standard deviation is square root of variance).
  • Evernote has a great post on Doubling SSL Keys to 2048 Bits. Big changes can't have a big bang release. They performed a lot of incremental testing to verify their infrastructure could handle the larger keys. Their transition plan might also work for you. Hardware SSL offload machines are used to minimize the impact. Collectd and graphite are used to measure system and network performance internally. Pingdom and Thousand Eyes give statistics from a remote point of view. They measured: load balancer CPU and memory utilization; SSL transaction rates, or the number of new SSL handshakes per second; SSL concurrency, or the number of SSL connections active at the same time; and SSL request latency, or the time it takes to negotiate a new SSL handshake.
  • How Does a Transistor Work? Short video that does a good job explaining a complicated subject. They also have a huge series of videos on topics physical that are well worth watching.
  • How do you build a following? peteforde: "The Perfect Store" is a book about the early days of eBay. The primary takeaway for me was how they deliberately went to swap meets, flea markets and garage sales all over America — especially the rural flyover states — and talked to people. They identified the key influencers and flew many of them to California to be given VIP treatment. Those folks returned to their communities as true believers and encouraged their flock to get on the train. 15 years later that investment paid off more than any of them could have hoped.
  • Maybe you can have it all. Peter Bailis on Non-blocking transactional atomicity: multi-versioning and some extra metadata to ensure transactional atomicity without the use of locks. Specifically, our solution does not block readers or writers in the event of arbitrary process failure and, as long as readers and writers can contact a server for each data item they want to access, the system can guarantee transactional atomicity of both reads or writes. At a high level, the key idea is to avoid performing in-place updates and to use additional metadata to substitute for synchronous synchronization across replicas. Good discussion on Hacker News.
  • AWS Redshift: How Amazon Changed The Game. Excellent detail on the factors to consider when thinking about your BigData options. Nontrivial tests showed Redshift worked and was a good value, but of more value is that they show the queries and the numbers. Loading saw linear scaling per dollar. On queries there was almost linear scaling.
  • Looks interesting if you are implementing social networking functionality: I post something and my followers see it. That's the rough idea behind the pump.
  • Java Garbage Collection Distilled. Martin Thompson attempts the impossible: explaining the tradeoffs when choosing and tuning garbage collection algorithms for a particular workload. There's a lot to it and he does a really good job. And it beats burning your eyes reading arcane web pages and randomly trying different flag combinations.
  • Crazy lessons from GoDaddy: there is an inherent drawback to using a battery-backed write cache: Many RAID controllers, like our Dell PERC cards, go through a battery learning cycle which calibrates the capacity of the battery to ensure it does not unexpectedly fail. For us, this cycle occurs every 90 days. When a battery learning cycle begins, it fully charges, discharges, and then charges again, realigning the true capacity of the battery. While it performs the learning process, you cannot rely on it to sustain the cache in the event of a power failure.
  • Sometimes best practices are easy enough to do from the start that you can just do them. David Crawford recommends Using Elastic IPs on EC2 from the Start: In the three months we’ve been using EC2, two of our boxes have been “retired."  It sounds as if this is happening more frequently, according to a blog post by  And while Amazon tries to make it as painless as possible, if you’re not using elastic IPs, you can get yourself in trouble.
  • 50% of SQL performance problems are index problems, says Markus Winand in Indexes: The neglected performance all-rounder. DBAs shouldn't be the ones creating indexes because they don't know the code. Developers should be the ones designing indexes, not just creating them.
  • OpenReplica: an object replication service for providing reliability and fault-tolerance in distributed systems. It is designed to maintain long-lived, critical state (such as configuration information) and to synchronize distributed components.
  • MC2: High-Performance Garbage Collection for Memory-Constrained Environments. MC2 has low space overhead and tight space bounds, prevents fragmentation, provides good throughput, and yields short pause times. These qualities make MC2 also attractive for other environments, including desktop and server systems.
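
The Dmitriy Samovskiy item above ties risk to the standard deviation of monthly failure amounts rather than their mean. A minimal sketch of that distinction follows; the function name and the two sample series are invented for illustration and are not taken from his post.

```python
import statistics

def failure_risk(monthly_failures):
    """Return (mean, standard deviation) of monthly failure counts.

    Uses the population standard deviation, i.e. the square root of
    the variance of the values given.
    """
    mean = statistics.mean(monthly_failures)
    stdev = statistics.pstdev(monthly_failures)
    return mean, stdev

# Hypothetical data: both services average the same number of failures
# per month, but one fails steadily and the other in rare large bursts.
steady = [5, 5, 5, 5, 5, 5]
spiky = [0, 0, 0, 0, 0, 30]

for name, series in (("steady", steady), ("spiky", spiky)):
    mean, stdev = failure_risk(series)
    print(f"{name}: mean={mean:.1f} stdev={stdev:.2f}")
```

Both series have the same mean, so a dashboard that reports only average failures would rate them equally; the standard deviation is what separates the predictable service from the one that occasionally blows up.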
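
The Markus Winand item above argues that indexes should be designed by the people who know the queries. As a rough illustration, this sketch uses Python's built-in sqlite3 module to compare the query plan for one lookup before and after adding an index. The `orders` table, its columns, and the index name are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

# The query the application actually runs drives the index design.
lookup = "SELECT id, total FROM orders WHERE customer_id = ?"

plan_before = query_plan(conn, lookup, (42,))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

Without the index the plan is a scan of the whole table; with it, the plan names `idx_orders_customer`. Knowing which column appears in the WHERE clause is exactly the knowledge of the code that the item says developers have.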