Stuff The Internet Says On Scalability For October 30th, 2015

Hey, it's HighScalability time:

Movie goers Force Crashed websites with record ticket presales. Yoda commented: Do. Or do not. There is no try.
  • $51.5 billion: Apple quarterly revenue; 1,481: distance in light years of a potential Dyson Sphere; $470 billion: size of insurance industry data play; 31,257: computer related documents in a scanned library; $1.2B: lost to business email scams; 46 billion: pixels in largest astronomical image; 27: seconds of distraction after doing anything interesting in a car; 10 billion: transistors in the SPARC M7 chip; $10K: cost to get a pound into low Earth orbit; $8.2 billion: Microsoft cloud revenue.

  • Quotable Quotes:
    • @jasongorman: A $trillion industry has been built on the very lucky fact that Tim Berners-Lee never thought "how do I monetise this?"
    • Cade Metz: Sure, the app [WhatsApp] was simple. But it met a real need. And it could serve as a platform for building all sorts of other simple services in places where wireless bandwidth is limited but people are hungry for the sort of instant communication we take for granted here in the US.
    • Adrian Hanft: Brand experts insist that success comes from promoting your unique attributes, but in practice differentiation is less profitable than consolidation.
    • Jim Butcher: It’s a tradition. Were traditions rational, they’d be procedures.
    • Albert Einstein~ Sometimes I pretend I’m the Mayor of my kitchen and veto fish for dinner. ‘Too fishy’ is what I say!
    • @chumulu: “Any company big enough to have a research lab is too big to listen to it" -- Alan Kay
    • Robin Harris: So maybe AWS has all the growth it can handle right now and doesn’t want more visibility. AWS may be less scalable than we’d like to believe.
    • Michael Nielsen: Every finitely realizable physical system can be simulated efficiently and to an arbitrary degree of approximation by a universal model (quantum) computing machine operating by finite means.
    • Sundar Pichai~ there are now more Google mobile searches than desktop searches worldwide.
    • Joe Salvia~ The major advance in the science of construction over the last few decades has been the perfection of tracking and communication.
    • apy: In other words, as far as I can tell docker is replacing people learning how to use their package manager, not changing how software could or should have been deployed.
    • @joelgrus: "Data science is a god-like power." "Right, have you finished munging those CSVs yet?""No, they have time zone data in them!"
    • @swardley: "things are getting worse. Companies are increasingly financialised and spending less on basic research" @MazzucatoM 
    • Dan Rayburn: The cause of what Akamai is seeing is a result of Apple, Microsoft and Facebook moving a larger percentage of their traffic to their in-house delivery networks.
    • @littleidea: containers will not fix your broken architecture you are welcome
    • spawndog: I've typically found the best gameplay optimization comes from a greater amount of creative freedom like you mention. Lets not do it. Lets do it less frequently. Lets organize the data into something relative to usage pattern like spatial partitions.
    • @awealthofcs: The 1800s: I hope I survive my 3 month voyage to deliver a message to London Now: The streaming on this NFL game in London is a bit spotty
    • @ddwoods2: just having buffers ≠ resilience; resilience = the capacities for changing position/size/kind of buffers, before events eat those buffers
    • unoti: There's a dangerous, contagious illness that developers of every generation get that causes them to worry about architecture and getting "street cred" even more than they worry about solving business problems. I've fallen victim to this myself, because street cred is important to me. But it's a trap.
    • @kelseyhightower: Kubernetes is getting some awesome new features: Auto scaling pods, Jobs API (batch), and a new deployment API for server side app rollouts.

  • Great story on Optimizing League of Legends. The process: Identification: profile the application and identify the worst performing parts; Comprehension: understand what the code is trying to achieve and why it is slow; Iteration: change the code based on step 2 and then re-profile. Repeat until fast enough. Result: memory savings of 750kb and a function that ran one to two milliseconds faster. 

  • Fantastic article on Medium's architecture: 25 million uniques a month;  service-oriented architecture, running about a dozen production services; GitHub; Amazon’s Virtual Private Cloud; Ansible; mostly Node with some Go; CloudFlare, Fastly, CloudFront with interesting traffic allocations; Nginx and HAProxy; Datadog, PagerDuty, Elasticsearch, Logstash, Kibana; DynamoDB, Redis, Aurora, Neo4J; Protocol Buffers used as contract between layers; and much more.

  • Are notifications the new Web X.0? Notification: the push and the pull: Right now we are witnessing another round of unbundling as the notification screen becomes the primary interface for mobile computing.

  • Algorithm hacking 101. Uber Surge Price? Research Says Walk A Few Blocks, Wait A Few Minutes.
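
The identify, comprehend, iterate loop from the League of Legends item above can be sketched in a few lines of Python. This is only an illustration built on the standard cProfile module, not Riot's tooling; slow_sum and fast_sum are hypothetical stand-ins for a hot spot and its rewrite.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical hot spot: sum of squares 0..n-1 computed with a loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum(n):
    """Rewrite after understanding the code: closed form for the same sum."""
    return (n - 1) * n * (2 * n - 1) // 6


def profile(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(5)
    return result, report.getvalue()


if __name__ == "__main__":
    before, report = profile(slow_sum, 1_000_000)   # step 1: identify
    after, _ = profile(fast_sum, 1_000_000)         # step 3: re-profile
    assert before == after  # the optimization must not change the answer
    print(report)
```

The point of re-profiling after every change is that the measurement, not intuition, decides when the code is fast enough.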

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


Five Lessons from Ten Years of IT Failures

IEEE Spectrum has a wonderful article series on Lessons From a Decade of IT Failures. It’s not your typical series in that there are very cool interactive graphs and charts based on data collected from past project failures. They are really fun to play with and I can only imagine how much work it took to put them together.

The overall takeaway of the series is:

Even given the limitations of the data, the lessons we draw from them indicate that IT project failures and operational issues are occurring more regularly and with bigger consequences. This isn’t surprising as IT in all its various forms now permeates every aspect of global society. It is easy to forget that Facebook launched in 2004, YouTube in 2005, Apple’s iPhone in 2007, or that there have been three new versions of Microsoft Windows released since 2005. IT systems are definitely getting more complex and larger (in terms of data captured, stored and manipulated), which means not only are they increasingly difficult and costly to develop, but they’re also harder to maintain.

Here are the specific lessons:

Click to read more ...


Sponsored Post: Digit, iStreamPlanet, Instrumental, Redis Labs, SignalFx, InMemory.Net, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Digit Game Studios, Ireland’s largest game development studio, is looking for game server engineers to work on existing and new mobile 3D MMO games. Our most recent project in development is based on an iconic AAA-IP and therefore we expect very high DAU & CCU numbers. If you are passionate about games and if you are experienced in creating low-latency architectures and/or highly scalable but consistent solutions then talk to us and apply here.

  • As a Networking & Systems Software Engineer at iStreamPlanet you’ll be driving the design and implementation of a high-throughput video distribution system. Our cloud-based approach to video streaming requires terabytes of high-definition video routed throughout the world. You will work in a highly-collaborative, agile environment that thrives on success and eats big challenges for lunch. Please apply here.

  • As a Scalable Storage Software Engineer at iStreamPlanet you’ll be driving the design and implementation of numerous storage systems including software services, analytics and video archival. Our cloud-based approach to world-wide video streaming requires performant, scalable, and reliable storage and processing of data. You will work on small, collaborative teams to solve big problems, where you can see the impact of your work on the business. Please apply here.

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • Your event could be here. How cool is that?

Cool Products and Services

  • Instrumental is a hosted real-time application monitoring platform. In the words of one of our customers: "Instrumental is the first place we look when an issue occurs. Graphite was always the last place we looked." - Dan M

  • Real-time correlation across your logs, metrics and events. just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Sumo Logic alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a .Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here:

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required.

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...



What ideas in IT must die?

Are there ideas in IT that must die for progress to be made?

Max Planck wryly observed that scientific progress is often less meritocracy and more Lord of the Flies:

A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.

Playing off this insight is a thought-provoking book, a collection of responses to a question posed on the Edge: This Idea Must Die: Scientific Theories That Are Blocking Progress. From the book blurb, some of the ideas that should transition into the postmortem are: Jared Diamond explores the diverse ways that new ideas emerge; Nassim Nicholas Taleb takes down the standard deviation; Richard Thaler and novelist Ian McEwan reveal the usefulness of "bad" ideas; Steven Pinker dismantles the working theory of human behavior.

Let’s get edgy: Are there ideas that should die in IT?

What ideas do you think should pass into the great version control system called history? What ideas if garbage collected would allow us to transmigrate into a bright shiny new future? Be as deep and bizarre as you want. This is the time for it.

I have two: Winner Takes All and The Homogeneity Principle.

Winner Takes All

Click to read more ...


Stuff The Internet Says On Scalability For October 23rd, 2015

Hey, it's HighScalability time:

The amazing story of Voyager's walkabout and the three body problem.

If you like Stuff The Internet Says On Scalability then please consider supporting me on Patreon.
  • $18 billion: wasted on US Army Future Combat system; 70%: Americans who support an Internet sales tax;  $1.3 billion: wasted on an interoperable health record system; trillions: NSA breaking Web and VPN connections; 615: human data teams beat by a computer; $900,000: cost of apps on your smartphone 30 years ago.

  • Quotable Quotes:
    • @PatrickMcFadin: 'Sup 10x coder. Grace Hopper invented the compiler and has a US Navy destroyer named after her. Just how badass are you again?
    • @benwerd: I love Marty McFly too, but more importantly, the first transatlantic voice transmission was sent 100 years ago today. What a century.
    • Martin Goodwell: The nearly two-billion requests that Netflix receives each day result in roughly 20 billion internal API calls.
    • sigma914: It's great to see people implementing distributed services using a vertically scalable technology stack again. The past ~decade has seen a lot of "We can scale sideways so constant overheads are irrelevant! We'll just use Java and add more machines!" which, in real life, seems to leave a lot of performance on the table.
    • Eric Schmidt: The way you build great products is small teams with strong leaders who make tradeoffs and work all night to build a product that just barely works.
    • @boulderDanH: We adopted stateful services early on @VictorOps and I always worried we were crazy. Maybe not
    • @jamesallworth: "The pressure for conformity isn’t limited to car design, it affects *everything*."
    • Eric Schmidt: Hindsight is always that you make the important decisions more quickly.
    • @fromroots: Facebook bought Instagram and WhatsApp to block Chinese competitors like Tencent and Alibaba from scaling globally quickly
    • Eric Schmidt: You’ve got to have products that can scale. What’s new is that once you have that product, you can scale very quickly. Look at Uber.
    • David Ehrenberg: So, before scaling, build your plan, get your systems in place, control your cash burn, create meaningful milestones and plan for cash-flow positive. That’s the foundation to successfully scale.
    • Francis Fukuyama: Hence patrimonialism has evolved into what is called “neopatrimonialism,” in which political leaders adopt the outward forms of modern states—with bureaucracies, legal systems, elections, and the like—and yet in reality rule for private gain. 
    • @sandromancuso: I hate all these bloody Java frameworks. Why devs keep using them? No, you won’t die if you write some code yourself. 
    • Eric Schmidt: Their point was that the industry overvalues experience, and undervalues strategic and tactical flexibility.
    • @AWSUserGroupUK: Daily load fluctuates by two orders of magnitude - auto scaling architecture is essential #BMW #reinvent
    • @tpechacek: “The greatest shortcoming of the human race,” he said, “is our inability to understand the exponential function”
    • @mjpt777: +1000 "Programming with Java concurrency is like working with the inlaws. You never know what will happen." - @venkat_s #jokerconf
    • Eric Schmidt: The teams are far larger than they should be. It’s a failure of architecture — the programmers don’t have the right libraries. I hope that machine learning will fix that problem.
    • Marcus Zetterquist: Start writing your C++, Java and Javascript code using pure functions and immutability NOW. It gets super powers
    • Eric Schmidt: Companies like ours have so much cash that the main limit is opportunities to deploy it.
    • @berkson0: Loving and hating the Scaling keynote at #AWS #reinvent. All my painfully earned infrastructure experience rendered superfluous <sigh>
    • Eric Schmidt: The day we turned on the auctions, revenue tripled.
    • James S.A. Corey: Awareness is a function of the brain just like vision or motor control or language. It isn’t exempt from being broken
    • @mitchellh: Sure, but scaling linearly from millions to trillions of requests won’t scale financially. I’m talking about financial efficiency
    • Anil Ananthaswamy: It turns out that in order to anchor the self to the body, the brain has to integrate signals from within the body with external sensations, and with sensations of position and balance. When something goes wrong with brain regions that integrate all these signals, the results are even more dramatic than out-of-body experiences
    • @alemacgo: “The whole point of science is to penetrate the fog of human senses, including common sense.”
    • @themadstone: Why is life special? Bc a billionth of a billionth of a fraction of all matter in the universe is living matter.

  • Oh how the world has changed. Here's an email from 1996: Alta Vista is a very large project, requiring the cooperation of at least 5 servers, configured for searching huge indices and handling a huge Internet traffic load. The initial hardware configuration for Alta Vista is as follows... 

  • AWS has helped change the VC industry. AWS and Venture Capital. Getting a new company off the ground takes less than a few hundred thousand dollars these days. With AWS all you have are the variable costs of what you use. Gone are the days of needing to buy a bunch of servers and the people to maintain them. Old news. More interesting is that because less money is now needed to start a venture, more people can help ventures get started. VC incentives are aligned with companies that need to grow quickly, like an Uber. Given that a VC only needs one in ten investments or so to be a huge hit, it's best for a VC if those other nine die as fast as possible to minimize costs. This may not align with your interests if you would like to grow more organically. It's unprofitable for VCs to play in the seed funding realm. VCs used to win because they had access to capital and superior information, both of which have been commoditized at today's lower funding levels and higher availability of expertise. So if you want to get to heaven, you may need an Angel.

  • Here's the IPv6 carrot. Accessing Facebook can be 10-15 percent faster over IPv6. IPv6: It's time to get on board.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


5 Lessons from 5 Years of Building Instagram

Instagram has always been generous in sharing their accumulated wisdom. Just take a look at the Related Articles section of this post to see how generous.

The tradition continues. Mike Krieger, Instagram co-founder, wrote a really good article on lessons learned from milestones achieved during Five Years of Building Instagram. Here's a summary of the lessons, but the article goes into much more of the connective tissue and is well worth reading.

  1. Do the simple thing first. This is the secret of supporting exponential growth. There's no need to future proof everything you do. That leads to paralysis. For each new challenge, find the fastest, simplest fix.
  2. Do fewer things better. Focus on a single platform. This allows you to iterate faster because not everything has to be done twice. When you have to expand, create a team explicitly for each platform.
  3. Upfront work can pay huge dividends. Create an automated scriptable infrastructure implementing a repeatable server provisioning process. This makes it easier to bring on new hires and handle disasters. Hire engineers with the right stuff who aren't afraid to work through a disaster.
  4. Don’t reinvent the wheel. Instagram moved to Facebook's infrastructure because it allowed them to stay small and leverage a treasure trove of capabilities.
  5. Nothing lasts forever. Be open to evolve your product. Don't be afraid of creating special teams to tackle features and adapt to a rapidly scaling community.

Related Articles


Segment: Rebuilding Our Infrastructure with Docker, ECS, and Terraform

This is a guest repost from Calvin French-Owen, CTO/Co-Founder of Segment

In Segment’s early days, our infrastructure was pretty hacked together. We provisioned instances through the AWS UI, had a graveyard of unused AMIs, and configuration was implemented three different ways.

As the business started taking off, we grew the size of the eng team and the complexity of our architecture. But working with production was still limited to a handful of folks who knew the arcane gotchas. We’d been improving the process incrementally, but we needed to give our infrastructure a deeper overhaul to keep moving quickly.

So a few months ago, we sat down and asked ourselves: “What would an infrastructure setup look like if we designed it today?”

Over the course of 10 weeks, we completely re-worked our infrastructure. We retired nearly every single instance and old config, moved our services to run in Docker containers, and switched over to use fresh AWS accounts.

We spent a lot of time thinking about how we could make a production setup that’s auditable, simple, and easy to use–while still allowing for the flexibility to scale and grow.

Here’s our solution.

Separate AWS Accounts

Click to read more ...


Stuff The Internet Says On Scalability For October 16th, 2015

Hey, it's HighScalability time:

The otherworldly beauty of the world's largest underground Neutrino Detector. Yes, this is a real thing.

If you like Stuff The Internet Says On Scalability then please consider supporting me on Patreon.
  • 170,000: depression era photos; $465m: amount lost due to a software bug; 368,778: likes in 4 hours as a reaction to Mark Zuckerberg's post on Reactions; 1.8 billion: pictures uploaded every day; 158: # of families generously volunteering to privately fund US elections.

  • Quotable Quotes:
    • @PreetamJinka: I want to run a 2 TB #golang program with 100 vCPUs on an AWS X1 instance.
    • Richard Stallman: The computer industry is the only industry that is more fashion-driven than women's fashion.
    • The evolution of bottlenecks in the Big Data ecosystem: Seeing all these efforts to bypass the garbage collector, we are entitled to wonder why we use a platform whose main asset is to offer a managed memory, if it is to avoid using it?
    • James Hamilton: Services like Lambda that abstract away servers entirely make it even easier to run alternative instruction set architectures.
    • @adrianfcole: Q: Are we losing money? A: Can't answer that, but I can tell you what average CPU usage was 5ish mins ago..
    • h4waii: Because you can't buy trust through an acquisition. You build trust, you don't transfer it through a merger.
    • @mathiasverraes: TIL Ada Lovelace was not only the world's first programmer, she was also the first debugger, fixing a flaw in an algorithm by Babbage.
    • @BenedictEvans: Ways to think about scale: iOS is as big  as BMW, Mercedes, Lexus & Audi combined
    • @caitie: Really enjoyed The Martian, also began thinking about how space is the true test of any distributed system
    • Bits or Pieces?: This is the point, there are two very distinct forms of disruption. It's not all the same despite everyone treating it as such. Alas, people ignore this.
    • Julien CROUZET: So when you have a function or callback that’ll be called repeatedly, try to make it under 600 characters (or your tweaked value), you’ll have a quick win !
    • exelius: They're the walking dead because they pursued scale over innovation. Once they had achieved scale, they found themselves with too much momentum to innovate. So because they couldn't innovate, they built an army of consultants to hawk their wares to customers who also valued scale. 

  • Will Amazon automatically win the IoT space with their recent announcement? Not so fast says Greg Ferro: AWS IoT vs Cisco Fog Computing – Cloud vs Network IoT: AWS is popular with capital poor, low ARPU and fast moving companies in the consumer market. Cisco et al is popular with high net worth conglomerates who build high value, high profit solutions that are slow moving and built on incumbent positions with known and trustable technology partners. There is a market for both types of approaches. One does not “kill” the other, nor is one better or worse, but does limit possible growth and ability to dominate the market.

  • Conway's Law is being used less descriptively these days and more prescriptively. Projects are choosing the organizational structure that creates the software they want to make. From dystopia to utopia.

  • In a single day Riot chat servers can route a billion events (presences, messages, and IQ stanzas) and process millions of REST queries. Here are lots of lovely details on the League of Legends chat service architecture and how it works. It's based on Erlang and XMPP, leveraging the OTP framework, concurrency model, and fault-tolerance semantics. For the heaviest string manipulation parts they dropped into C, which saved 60% on CPU and lots of per session memory. Chat clusters are independent, but they do share a few tables that are replicated and reside in memory. Riak is used for the database. Also, JENKINS, DOCKER, PROXIES, AND COMPOSE.

  • Brother, can you spare a dime so I can scale my website? That's all it took to handle 60K unique visitors on Amazon's Lambda + S3, less than a dime. No doubt the original architecture could have worked with a few tweaks, but the point here is using JAWS did work. Though a problem I've had with Lambda is the lack of the idea of a session when you have a single page app.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


Save some bandwidth by turning off TCP Timestamps

This is a guest post by Donatas Abraitis, System Engineer at Vinted, with an unusual approach for saving a little bandwidth.

Looking at RFC 1323, there is a nice title: 'TCP Extensions for High Performance'. It's worth noting the date: May 1992. The Timestamps option may appear in any data or ACK segment, adding 12 bytes to the 20-byte TCP header.

Using TCP options, the sender places a timestamp in each data segment, and the receiver reflects these timestamps back in ACK segments. Then a single subtract gives the sender an accurate RTT measurement for every ACK segment.

To prove this let's dig into kernel source:

./include/net/tcp.h:#define TCPOLEN_TSTAMP_ALIGNED    12
./net/ipv4/tcp_output.c:static void tcp_connect_init(struct sock *sk)
  tp->tcp_header_len = sizeof(struct tcphdr) +
    (sysctl_tcp_timestamps ? TCPOLEN_TSTAMP_ALIGNED : 0);
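
To get a feel for what those 12 bytes cost, here is a rough back-of-the-envelope sketch in Python. It is not from the original post: the function name is made up for illustration, it assumes a 20-byte IPv4 header without options, and it ignores link-layer framing.

```python
IPV4_HEADER = 20      # assumption: IPv4 header without options
TCP_HEADER = 20       # base TCP header, as stated in the post
TSTAMP_ALIGNED = 12   # TCPOLEN_TSTAMP_ALIGNED from include/net/tcp.h


def timestamp_overhead(payload_bytes):
    """Fraction of an IP packet's bytes spent on the timestamp option."""
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    packet = IPV4_HEADER + TCP_HEADER + TSTAMP_ALIGNED + payload_bytes
    return TSTAMP_ALIGNED / packet


if __name__ == "__main__":
    # Pure ACK, small request, and a full segment on a 1500-byte MTU.
    for payload in (0, 100, 1448):
        print(f"{payload:5d} byte payload: {timestamp_overhead(payload):.1%}")
```

Under these assumptions a pure ACK spends 12 of 52 bytes (about 23%) on timestamps, while a full 1448-byte segment spends 12 of 1500 bytes (0.8%), so the saving matters most for traffic dominated by small packets. On Linux the switch the excerpt refers to is exposed as the net.ipv4.tcp_timestamps sysctl.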

Some visualizations:

Click to read more ...


More concurrency: Improved locking in PostgreSQL

If you want to build a large scale website, scaling out the webserver is not enough. It is also necessary to cleverly manage the database side. A key to high scalability is locking.

In PostgreSQL we got a couple of new cool features to reduce locking and to speed up things due to improved concurrency.

General recommendations: Before attacking locking, however, it makes sense to check what is really going on in your PostgreSQL database server. To do so I recommend taking a look at pg_stat_statements and carefully tracking down bottlenecks. Here is how it works:

Click to read more ...