Oculus Causes a Rift, but the Facebook Deal Will Avoid a Scaling Crisis for Virtual Reality

Facebook has been teasing us. While many of their recent acquisitions have been surprising, shocking is the only word adequately describing Facebook's 5 day whirlwind acquisition of Oculus, immersive virtual reality visionaries, for a now paltry sounding $2 billion.

The backlash is a pandemic, jumping across social networks with the speed only a meme powered by the directly unaffected can generate.

For more than 30 years VR has been the dream burning in the heart of every science fiction fan. Now that this future might finally be here, Facebook’s ownage makes it seem like a wonderful and hopeful timeline has been choked off, killing the Metaverse before it even had a chance to begin.

For the many who voted for an open future with their Kickstarter dollars, there’s a deep and personal sense of betrayal, despite Facebook’s promise to leave Oculus alone. The intensity of the reaction is because Oculus matters to people. It's new, it's different, it creates a better future. It's important in a way sending messages or taking pictures never can be.

Let’s use Andy Baio as a sane example of the loyal opposition:

I can palpably feel the oxygen sucked out of the room. Infinite bright possibilities from indie game devs, quietly shelved.

Jigsus backs this up on reddit:

Within the last hour EVERY friend I know was developing a rift game has canceled. That's around 11 projects just gone.

So WTF? Why in this reality would Oculus sell to Facebook? At first blush it makes little sense. A more mismatched couple could hardly be created in a fantasy immersive 3D game.

Yet there's a reason and a method here that's not only about about a sweet payoff. When you look deeper at the Facebook deal it’s about Oculus' existential need to scale. And scale is something Facebook is very, very good at.

Let’s explore why this deal makes more sense for Oculus than it might first appear and the central role scalability plays in the decision making...

Click to read more ...


Big, Small, Hot or Cold - Examples of Robust Data Pipelines from Stripe, Tapad, Etsy and Square

This is a guest repost by Pete Soderling, Founder at Hakka Labs, creating a community where software engineers come to grow.

In response to a recent post from MongoHQ entitled “You don’t have big data," I would generally agree with many of the author’s points.

However, regardless of whether you call it big data, small data, hot data or cold data - we are all in a position to admit that *more* data is here to stay - and that’s due to many different factors.

Perhaps primarily, as the article mentions, this is due to the decreasing cost of storage over time. Other factors include access to open APIs, the sheer volume of ever-increasing consumer activity online, as well as a plethora of other incentives that are developing (mostly) behind the scenes as companies “share” data with each other. (You know they do this, right?)

But one of the most important things I’ve learned over the past couple of years is that it’s crucial for forward thinking companies to start to design more robust data pipelines in order to collect, aggregate and process their ever-increasing volumes of data. The main reason for this is to be able to tee up the data in a consistent way for the seemingly-magical quant-like operations that infer relationships between the data that would have otherwise surely gone unnoticed - ingeniously described in the referenced article as correctly “determining the nature of needles from a needle-stack.”

But this raises the question - what are the characteristics of a well-designed data pipeline? Can’t you just throw all your data in Hadoop and call it a day?

As many engineers are discovering - the answer is a resounding "no!" We've rounded up four examples from smart engineers at Stripe, Tapad, Etsy & Square that show aspects of some real-world data pipelines you'll actually see in the wild.

How does Stripe do it?

Click to read more ...


Stuff The Internet Says On Scalability For March 21st, 2014

Hey, it's HighScalability time:

  • Quotable Quotes:
    • Chris Anderson: Petabytes allow us to say: ‘Correlation is enough.’
    • @adron: Back to writing the micro-services for my composite app with regionally distributed, highly available, key value crypto stores of unicorns!
    • DevOps Cafe: when the canary dies you don't buy a stronger canary.
    • Mark Pagel: Creativity, like evolution, is merely a series of thefts.
    • GitHub: Whoever has more capacity wins.
    • The Master: We balance probabilities and choose the most likely. It is the scientific use of the imagination.
    • Jonathan Ive: Steve, and I don’t recognize my friend in much of it. Yes, he had a surgically precise opinion. Yes, it could sting. Yes, he constantly questioned. ‘Is this good enough? Is this right?’ but he was so clever. His ideas were bold and magnificent. They could suck the air from the room. And when the ideas didn’t come, he decided to believe we would eventually make something great. And, oh, the joy of getting there!”
    • @joestump: Turns out the correct amount of RAM is always 16GB more than you have.

  • From a spec of sand does a pearl grow. In this case the oyster is Facebook, the irritant is PHP, and the pearl is Hack, a new improved version of PHP developed by Facebook. Hack now runs all of Facebook's front-end code. Lots of great comments on Hacker News and on Reddit with much lang on PHP hate. Realize Facebook is not trying to build the next perfect language. Historically they coded their web tier in PHP so they have a lot of code working in production. It would be insane to throw that all away. Over the years they've put a lot of work into making PHP more efficient. Hack continues that work by making PHP a better language, coupling PHP's convenient interpreted nature with static typing and many other features commonly found in modern programming languages. If PHP bothers you, Hack is not for you, but that's OK.

  • Dish Jump-Starts the Hopper With 8 Tuners and PS4 Support. Ultimate in local caching. Record all the things and then watch what you want later. Streaming access sucks. More local storage!

  • Cassandra Hits One Million Writes Per Second on Google Compute Engine on 300 VMs with 300 1TB Persistent Disk volumes, 15,000 clients, median latency of 10.3 ms and 95% completing under 23 ms. It took 1 hour and 10 minutes at a cost of $330 USD. All benchmark caveats apply. They wrote small records of 170 bytes. What happens when you write a large range of data sizes? When you are reading? Are you running queries? Are you backing up? Are indexes being updated? Is there contention? Etc, etc. Still, it's impressive.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...


Paper: Log-structured Memory for DRAM-based Storage - High Memory Utilization Plus High Performance

Most every programmer who gets sucked into deep performance analysis for long running processes eventually realizes memory allocation is the heart of evil at the center of many of their problems. So you replace malloc with something less worse. Or you tune your garbage collector like a fine ukulele. But there's a smarter approach brought to you from the folks at RAMCloud, a Stanford University production, which is a large scale, distributed, in-memory key-value database.

What they've found is that typical memory management approaches don't work and using a log structured approach yields massive benefits:

Performance measurements of log-structured memory in RAMCloud show that it enables high client through- put at 80-90% memory utilization, even with artificially stressful workloads. In the most stressful workload, a single RAMCloud server can support 270,000-410,000 durable 100-byte writes per second at 90% memory utilization. The two-level approach to cleaning improves performance by up to 6x over a single-level approach at high memory utilization, and reduces disk bandwidth overhead by 7-87x for medium-sized objects (1 to 10 KB). Parallel cleaning effectively hides the cost of cleaning: an active cleaner adds only about 2% to the latency of typical client write requests.

And for your edification they've written an excellent paper on their findings. Log-structured Memory for DRAM-based Storage:

Click to read more ...


Strategy: Three Techniques to Survive Traffic Surges by Quickly Scaling Your Site

Matthew Might, as a first responder to a surprise traffic surge on his inexpensive linode hosted blog, took emergency steps that you might find useful in a similar situation:

  1. Find the bottleneck. Reloading the page in firebug showed the first page took 24 seconds to load and after that everything else loaded quickly. In retrospect this burst meant the site was thread limited as the CPU was idle.
  2. Cut image sizes in half with a shell script using ImageMagick's convert. Load time is now 12 seconds.
  3. Turn dynamic content into static content using a static index.html file  copied using the browser's "view source" feature. Load time is now 6 seconds.
  4. Added threads to the Apache configuration file. Load time is now 2 seconds. Crises averted.

Because of this quick thinking and quick action the patient survived to serve pages another day.

And in fine post-mortem tradition some of the future changes are: 

Click to read more ...


Sponsored Post: Airseed, Uber, ScaleOut Software, Couchbase, Tokutek, Logentries, Booking, Apple, MongoDB, BlueStripe, AiScaler, Aerospike, LogicMonitor, AppDynamics, ManageEngine, Site24x7  


Who's Hiring?

  • Apple is hiring for multiple positions. Imagine what you could do here. At Apple, great ideas have a way of becoming great products, services, and customer experiences very quickly.  
    • Senior Engineer. We are looking for a team player with focus on designing and developing WWDR’s web-based applications. The successful candidate must have the ability to take minimal business requirements and work pro-actively with cross functional teams to obtain clear objectives that drive projects forward to completion. Please apply here.
    • Software Engineer. We are looking for a team player with focus on designing and developing WWDR’s web-based applications. The successful candidate must have the ability to take minimal business requirements and work pro-actively with cross functional teams to obtain clear objectives that drive projects forward to completion. Please apply here.
    • Quality Assurance Engineer. The iOS Systems team is looking for a Quality Assurance engineer. In this role you will be expected to work hand-in-hand with the software engineering team to find and diagnose software defects. Please apply here.

  • Airseed -- a Google Ventures backed, developer platform that powers single sign-on authentication, rich consumer data, and analytics -- is hiring lead backend and fullstack engineers (employees #4, 5, 6). More info here:

  • Join the team that scales Uber supply globally! Our supply engineering team is responsible for prototyping, building, and maintaining the partner-facing platform. We're looking for experienced back-end developers who care about developing highly scalable services. Apply at

  • We need awesome people @ - We want YOU! Come design next
    generation interfaces, solve critical scalability problems, and hack on one of the largest Perl codebases. Apply:

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • The Biggest MongoDB Event Ever Is On. Will You Be There? Join us in New York City June 23-25 for MongoDB World! The conference lineup includes Amazon CTO Werner Vogels and Cloudera Co-Founder Mike Olson for keynote addresses.  You’ll walk away with everything you need to know to build and manage modern applications. Register before April 4 to take advantage of super early bird pricing.

  • How to Scale MySQL for Big Data Applications. A Guide to Evaluating TokuDB on March 20th at 1pm ET. You can do more than you think with the MySQL you already have. Learn how to use MySQL or MariaDB in Big Data applications by simply upgrading the storage engine with TokuDB. Register now.

  • April 3 Webinar: The BlueKai Playbook for Scaling to 10 Trillion Transactions a Month. As the industry’s largest online data exchange, BlueKai knows a thing or two about pushing the limits of scale. Find out how they are processing up to 10 trillion transactions per month from Vice President of Data Delivery, Ted Wallace. Register today.

Cool Products and Services

  • As one of the fastest growing VoIP services in the world Viber has replaced MongoDB with Couchbase Server, supporting 100,000+ operations per second in the short term and 1,000,000+ operations per second in the long term for their third generation architecture.  See the full story on the Viber switch.

  • Do Continuous MapReduce on Live Data? ScaleOut Software's hServer was built to let you hold your daily business data in-memory, update it as it changes, and concurrently run continuous MapReduce tasks on it to analyze it in real-time. We call this "stateful" analysis. To learn more check out hServer.

  • Log management made easy with Logentries Billions of log events analyzed every day to unlock insights from the log data the matters to you. Simply powerful search, tagging, alerts, live tail and more for all of your log data. Automated AWS log collection and analytics, including CloudWatch events. 

  • LogicMonitor is the cloud-based IT performance monitoring solution that enables companies to easily and cost-effectively monitor their entire IT infrastructure stack – storage, servers, networks, applications, virtualization, and websites – from the cloud. No firewall changes needed - start monitoring in only 15 minutes utilizing customized dashboards, trending graphs & alerting.

  • BlueStripe FactFinder Express is the ultimate tool for server monitoring and solving performance problems. Monitor URL response times and see if the problem is the application, a back-end call, a disk, or OS resources.

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Cloud deployable. Free instant trial, no sign-up required.

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...


Intuitively Showing How To Scale a Web Application Using a Coffee Shop as an Example

This is a guest repost by Sriram Devadas, Engineer at Vistaprint, Web platform group. A fun and well written analogy of how to scale web applications using a familiar coffee shop as an example. No coffee was harmed during the making of this post.

I own a small coffee shop.

My expense is proportional to resources
100 square feet of built up area with utilities, 1 barista, 1 cup coffee maker.

My shop's capacity
Serves 1 customer at a time, takes 3 minutes to brew a cup of coffee, a total of 5 minutes to serve a customer.

Since my barista works without breaks and the German made coffee maker never breaks down,
my shop's maximum throughput = 12 customers per hour.


Web server

Customers walk away during peak hours. We only serve one customer at a time. There is no space to wait.

I upgrade shop. My new setup is better!

Same area and utilities, 3 baristas, 2 cup coffee maker, 2 chairs

3 minutes to brew 2 cups of coffee, ~7 minutes to serve 3 concurrent customers, 2 additional customers can wait in queue on chairs.

Concurrent customers = 3, Customer capacity = 5


Scaling vertically

Business is booming. Time to upgrade again. Bigger is better!...

Click to read more ...


Stuff The Internet Says On Scalability For March 14th, 2014

Hey, it's HighScalability time:

  • Quotable Quotes:
    • The Master Switch: History shows a typical progression of information technologies: from somebody’s hobby to somebody’s industry; from jury-rigged contraption to slick production marvel; from a freely accessible channel to one strictly controlled by a single corporation or cartel—from open to closed system.
    • @adrianco: #qconlondon @russmiles on PaaS "As old as I am, a leaky abstraction would be awful..."
    • @Obdurodon: "Scaling is hard.  Let's make excuses."
    • @TomRoyce: @jeffjarvis the rot is deep... The New Jersey pols just used Tesla to shake down the car dealers.
    • @CompSciFact: "The cheapest, fastest and most reliable components of a computer system are those that aren't there." -- Gordon Bell
    • @glyph: “Eventually consistent” is just another way to say “not consistent right now”.
    • @nutshell: LinkedIn is shutting down access to their APIs for CRMs (unless you’re Salesforce or Microsoft). Support open APIs!
    • Tim Berners-Lee: I never expected all these cats.
    • @muratdemirbas: "Simple clear purpose&principles give rise to complex&intelligent behavior.
      Complex rules&regulations give rise to simple&stupid behavior."
    • @BonzoESC: “Duplication is far cheaper than the wrong abstraction.” @sandimetz @rbonales 
    • @BenedictEvans: Umeng: there are 700m active smartphones and tablets in China.

  • Scale matters object lesson number infinity: HBO Go Crashes During True Detective Finale. Perhaps make HBO Go available without a cable package and maybe you'll have money to scale the service? Think peak. But wait, Dan Rayburn says bandwidth was not the problem, it's other parts of the system, which is why Internet TV will never be as reliable as broadcast TV. Still, I'd like to cut the cord.

  • Turns out ecommerce over messaging works well...quite well. Retailers Are Striking Gold with Instagram: Fox and Fawn, items often sell out within minutes of the picture being posted on Instagram.

  • Even Facebook's infrastructure struggles when a new feature becomes an unexpected hit. That's the situation described in an engaging story: Looking back on “Look Back” videos. Look Back's are one minute videos generated from a user's pics and posts. For the release they planned on 187 Gbps more bandwidth and 25 petabytes of disk. To get the rendering done they highly parallelized the pipeline. CDNs were alerted. Internal tests on employees found a few bugs. Less storage was actually needed because the video could be regenerated so a high replication factor wasn't needed. Go time! The videos were an unexpected hit with a 40% reshare instead of the projected 10% reshare. It seems people like themselves...a lot. Overnight 30 teams cooperated to move tens of thousands of machines over to rendering. Good story. Though I'm disappointed it didn't have its own Look Back video. Stories are people too.

  • Why Google's services aren't really free. We all help train the beast. A Glimpse of Google, NASA & Peter Norvig + The Restaurant at the End of the Universe: Algorithms behave differently as they churn thru more data. For example in the figure, the Blue algorithm was better with a million training dataset. If one had stopped at that scale, one would think of optimizing that algorithm for better performance. But as the scale increased the purple algorithm started showing promise – in fact the blue one starts deteriorating at larger scale; In general, Google prefers algorithms that get better with data. Not all algorithms are like that, but Google likes to after the ones with this type of performance characteristic. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...


Paper: Scalable Eventually Consistent Counters over Unreliable Networks

Counting at scale in a distributed environment is surprisingly hard. And it's a subject we've covered before in various ways: Big Data Counting: How to count a billion distinct objects using only 1.5KB of Memory, How to update video views count effectively?, Numbers Everyone Should Know (sharded counters).

Kellabyte (which is an excellent blog) in Scalable Eventually Consistent Counters talks about how the Cassandra counter implementation scores well on the scalability and high availability front, but in so doing has "over and under counting problem in partitioned environments."

Which is often fine. But if you want more accuracy there's a PN-counter, which is a CRDT (convergent replicated data type) where "all the changes made to a counter on each node rather than storing and modifying a single value so that you can merge all the values into the proper final value. Of course the trade-off here is additional storage and processing but there are ways to optimize this."

And there's a paper you can count on that goes into more details: Scalable Eventually Consistent Counters over Unreliable Networks:

Click to read more ...


Douglas Adams - 3 Rules that Describe Our Reactions to Technologies

Chris Dixon unearthed a great quote from Douglas Adams on the nature of technological adoption that unsurprisingly hits the mark in our ever changing and evolving world:

  1. Anything that is in the world when you’re born is normal and ordinary and is just a natural part of the way the world works.
  2. Anything that’s invented between when you’re fifteen and thirty-five is new and exciting and revolutionary and you can probably get a career in it.
  3. Anything invented after you’re thirty-five is against the natural order of things

Some that come to mind: horse to car, index card to online search, PC to mobile, web to app, portal to messaging, Newton to Einstein, oil to electric, rock to rap, Aquinas to Bacon, buying to renting, files to streaming, network TV to cordkilling, broadcast to social, programming CPUs to programming biology, server to cloud, vm to container, wired to wireless, long read to TLDR, privacy to public to ephemeral, paper based news aggregation to digital aggregation, checks to online banking, gold to fiat to bitcoin, linear to exponential growth, large to small teams, to a world that ignores you to a world the responds to you, nation states to who knows what, a military of people to a Military of Things, and weekly versus binge watching. Any others?