Stuff The Internet Says On Scalability For February 8, 2013

High Scalability

08 Feb 2013 — 5 min read

Hey, it's HighScalability time:

34TB: storage for GitHub search; 2,880,000,000: log lines per day
Quotable Quotes:
- @peakscale: The "IKEA effect" << Contributes to NIH and why ppl still like IaaS over PaaS. :-\
- @sheeshee: module named kafka.. creates weird & random processes, sends data from here to there & after 3 minutes noone knows what's happening anymore?
- @sometoomany: Ceased writing a talk about cloud computing infrastructure, and data centre power efficiency. Bored myself to death, but saved others.
- Larry Kass on aged bourbon: Where it spent those years is as important as how many years it spent.

Lots of heat on Is MongoDB's fault tolerance broken? Yes it is. No it's not. YES it is. And the score: MongoDB Is Still Broken by Design 5-0.

Every insurgency must recruit from an existing population which is already affiliated elsewhere. For web properties the easiest group to recruit is the younger demographic. They naturally want something different than their elders and they have fewer allegiances to defend. What's your counter insurgency strategy?

Security in a Many Worlds Interpretation of QM is (un)surprisingly funny: Abstruse Goose - RSA-2048.

Did you really believe StackOverflow's Jeff Atwood rode off into the sunset? Me neither. What was he doing? Discourse - A platform for community discussion. Free, open, simple. Anything Jeff does is worth considering. He is very thoughtful and capable. Interesting to note when starting over he dumped the MS stack and went open source with Ruby, Redis, PostgreSQL.

Lessons of the Nginx v Apache slug fest: Apache has been on the wane, losing 100 million hostnames since June 2012, and not because of any resurgence from Microsoft IIS. Apache still claims 55.47 per cent of all active sites, but Nginx is on the rise. The reason is scale.

Understanding Pain by Fernando Cervero. It's wonder inducing to look at monitoring and managing systems in light of the human body: Your brain works, for pain perception and for everything else, by producing images and sensations that best help you to make correct decisions about every aspect of your life. A sensitized nociceptor responds to low-intensity stimuli such as touch, a sensitized synapse will transmit a more intense message than the one it receives, and an altered tactile pathway that is now able to gain access to the pain system will transform a touch stimulus into a pain sensation. In each of these cases, the physical reality of the stimulus is transformed into a perception that differs from the physical reality in magnitude, in quality, or in both magnitude and quality.

Most of the world still says meh to Twitter: The World’s Tweets Light Up the Globe in Stunning Live Visualization

I did not know you could host static websites on Google Drive. Doesn't look like there's CNAME support.

GitHub explains their Recent Code Search Outages. (yea for code search, screw the keys). An Elasticsearch upgrade caused some shards to be corrupted and the outage was due to high load during the cluster recovery. Recovery from a second outage found data loss and reloading the data took a long time. A third outage was caused by a release reverting some code back to an older version. Verdict: ideally you need a full staging environment against which you can replay traffic traces and regression tests. A tough standard, but it's really the only way. Plus they give some settings that may be useful if you run large clusters.

If you are looking for opportunities in mobile then Matt Welsh has some ideas for you: My mobile systems research wish list. He wants to know: What should a mobile web platform look like 10 years from now? Where is my data and who has access to it? Why doesn't my phone last all day? Understanding the impact of mobile handoffs on application performance.

Vicious circle: As we have less and less downtime we've become ever more sensitized to downtime so we push for ever better uptime. Or we could just chill out.

Lessons Learned in Concurrency with Ruby – Part I. Excellent example with lots of clean looking Ruby code and useful explanations. Covers memory bloat, memory leaks, threads, thread safety, concurrency vs parallelization.

To store millions of chunks of small data Richard Clayton recommends: Redis with FS persistence - native binary support, better data structures than a pure Key-Value store (sets, lists, ordered sets, etc.), replication, etc. Set it up in 2 minutes, programming in 10 minutes.

Introducing ActorFx, a Cloud Based Actor Runtime - a non-prescriptive, language-independent model of dynamic distributed objects. It's a very sparse model, supporting methods, publish/subscribe, events, and idempotence, but it has potential. Applications are so last century.

If you are considering an analytics database Curt Monash has a few questions to guide you along: How big is your database? How big is your budget? How do you feel about appliances? How do you feel about the cloud? What are the size and shape of your workload? How fresh does the data need to be?

Best Practices + Table Partitioning: Merging Boundary Points: One of the many best practices for SQL Server’s table partitioning feature is to create “extra” empty partitions around your data.

HAT, not CAP: Introducing Highly Available Transactions: Our goal is to push the limits of what is achievable, and, by matching the weak isolation provided by many databases, hopefully provide a familiar programming interface. As I tried to stress in the post, we aren't claiming to "beat CAP" or provide "100% ACID compliance"; we're attempting to strengthen the semantic limits of highly available systems. I intended "HAT, not CAP" as a play on acronyms, not as a claim to achieve the impossible.

Raster-Scan Displays: More Than Meets The Eye. Love these deep dives into areas I know nothing about. How what you see is a product of how display technology interacts with your brain is just fascinating. Always good stuff from Valve.

The Power Failure Seen Around the World. James Hamilton explores what the power outage at the Super Bowl implies for design, asking what it would cost to avoid long game outages. Tools to mitigate the impact are: 1) avoid the fault entirely, 2) protect against the fault with redundancy, 3) minimize the impact of the fault through small fault zones, and 4) minimize the impact through fast recovery. Conclusion: I would argue it’s time to start retrofitting major sporting venues with more redundant design and employing more aggressive pre-game testing.

John Sloan with a thoughtful exploration of how We Shape Our Tools And Then Our Tools Shape Us. I shudder to think of the shape Eclipse leaves us in.

A well laid out explanation on Apache HBase Internals: Locking and Multiversion Concurrency Control.

Forget Table. I don't remember what this one is about...wait...bitly gives their solution to the problem of storing the recent dynamics of categorical distributions that change over time (i.e., non-stationary distributions).
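
For a feel of what "forgetting" can mean here, below is a minimal sketch of one common approach: keep a count per category and let every count decay exponentially with age, so recent events dominate the distribution. This is not bitly's implementation. The class name `DecayingCounts`, the `half_life_seconds` parameter, and the choice of deterministic exponential decay are all assumptions made for illustration.

```python
import math


class DecayingCounts:
    """Categorical distribution whose counts fade exponentially with age.

    Illustrative only: the names and the decay rule are assumptions,
    not taken from bitly's Forget Table.
    """

    def __init__(self, half_life_seconds):
        # After one half-life, an untouched count is worth half as much.
        self.rate = math.log(2) / half_life_seconds
        self.counts = {}       # category -> decayed count
        self.last_update = {}  # category -> time the count was last decayed

    def _decay(self, category, now):
        # Shrink the stored count by the time elapsed since its last update.
        last = self.last_update.get(category, now)
        elapsed = max(0.0, now - last)
        current = self.counts.get(category, 0.0)
        self.counts[category] = current * math.exp(-self.rate * elapsed)
        self.last_update[category] = now

    def increment(self, category, now, amount=1.0):
        # Decay first, then add, so old and new events are weighted fairly.
        self._decay(category, now)
        self.counts[category] += amount

    def distribution(self, now):
        # Bring every category up to date, then normalize to probabilities.
        for category in list(self.counts):
            self._decay(category, now)
        total = sum(self.counts.values())
        if total == 0:
            return {}
        return {c: v / total for c, v in self.counts.items()}


if __name__ == "__main__":
    table = DecayingCounts(half_life_seconds=10.0)
    table.increment("a", now=0.0)
    table.increment("b", now=10.0)
    print(table.distribution(now=10.0))
```

With a 10 second half-life, the hit on "a" is worth half as much as the fresh hit on "b" by the time "b" arrives, so "a" ends up with about a third of the probability mass and "b" with about two thirds. Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch easy to test.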

To think these little axonal projections unite the world brain: Submarine cable map details the secret world of the underwater internet.

Scalability lessons from Google, YouTube, Twitter, Amazon, eBay, Facebook and Instagram. Good summary of the major points from the major players.

Interesting question from RightScale: Is using basic services (IaaS) from a Cloud provider more resilient than using IaaS + SaaS? Lessons learned: Don’t Get Seduced by Vendor-Specific Solutions, The Benefit of Using Loosely Coupled Components, Highly Available, Resilient Systems Are the Answer.

An Analysis of Linux Scalability to Many Cores: A speculative conclusion from this analysis is that there is no scalability reason to give up on traditional operating system organizations just yet.

As always Greg Linden with a diverse and interesting set of Quick Links. Let us all burn digital sage for the day tech companies exploit support as a sales opportunity instead of as a moat and high wall patrolled by bots. You know who you are...
