Your Load Generator is Probably Lying to You - Take the Red Pill and Find Out Why

Pretty much all your load generation and monitoring tools do not work correctly. Those charts you thought were full of relevant information about how your system is performing are really just telling you a lie. Your sensory inputs are being jammed. 

To find out how, listen to the Morpheus of performance monitoring, Gil Tene, CTO and co-founder of Azul Systems, makers of truly high-performance JVMs, in his mesmerizing talk, How NOT to Measure Latency.

This talk is about removing the wool from your eyes. It's the red pill option for what you thought you were testing with load generators.

Some highlights:

  • If you want to hide the truth from someone, show them a chart of all normal traffic with just one bad spike surging into 95th percentile territory.

  • The number one indicator you should never get rid of is the maximum value. That’s not noise, it’s the signal, the rest is noise.

  • 99% of users experience ~99.995%’ile response times, so why are you even looking at 95%'ile numbers?

  • Monitoring tools routinely drop important samples in the result set, leading you to draw really bad conclusions about the quality of the performance of your system.
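The dropped-samples problem is usually called coordinated omission. A load generator that waits for each response before sending the next request never sends the requests that should have gone out during a stall, so the stall is recorded once instead of many times. Below is a minimal Python sketch of one common correction, modeled on the back-filling idea behind HdrHistogram's `recordValueWithExpectedInterval`. The sample data and the 10 ms expected interval are made-up illustration values, and `percentile` is a simple nearest-rank helper, not HdrHistogram itself.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list (pct in 0..100)."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def correct_for_omission(latencies_ms, expected_interval_ms):
    """Back-fill the samples a stalled, closed-loop generator never sent.

    For each recorded latency longer than the expected interval between
    requests, add synthetic samples of latency - interval,
    latency - 2 * interval, ... while they stay >= the interval. These
    approximate the requests that would have queued behind the stall.
    """
    corrected = []
    for value in latencies_ms:
        corrected.append(value)
        missing = value - expected_interval_ms
        while missing >= expected_interval_ms:
            corrected.append(missing)
            missing -= expected_interval_ms
    return corrected


def summary(latencies_ms):
    """Count, median, 99th percentile and maximum of a set of latencies."""
    ordered = sorted(latencies_ms)
    return {
        "count": len(ordered),
        "p50": percentile(ordered, 50),
        "p99": percentile(ordered, 99),
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Hypothetical run: 999 fast responses and one 1-second stall, with the
    # generator supposed to send a request every 10 ms.
    samples = [1.0] * 999 + [1000.0]
    print("raw:      ", summary(samples))
    print("corrected:", summary(correct_for_omission(samples, 10.0)))
```

In this made-up run the raw numbers report a 1 ms 99th percentile. After the correction, the p99 is 900 ms because roughly 99 requests would have waited behind the stall. The maximum is identical in both summaries, which is one reason to keep it on every chart.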

It doesn't take long into the talk to realize Gil really knows his stuff. It's a deep talk with deep thoughts based on deep experience, filled with surprising insights. So if you take the red pill, you'll learn a lot, but you may not always like what you've learned.

Here's my inadequate gloss on Gil's amazing talk:

How to Lie With Percentiles



Stuff The Internet Says On Scalability For October 2nd, 2015

Hey, it's HighScalability time:

Elon Musk's presentation of the Tesla Model X had more in common with a new iPhone event than a traditional car demo.

If you like Stuff The Internet Says On Scalability then please consider supporting me on Patreon.
  • 1.4 billion: Android devices; 1000: # of qubits in Google's new quantum computer; 150Gbps: Linux botnet DDoS attack; 3,000: iPhones sold per minute; Smith: the most common last name in the US; 50%: storage reduction by using erasure coding in Hadoop; 101: calories burned during sex.

  • Quotable Quotes:
    • @peterseibel: How to be a 10x engineer: help ten other engineers be twice as good.
    • The Master Algorithm: Scientists make theories, and engineers make devices. Computer scientists make algorithms, which are both theories and devices
    • @immolations: Feudalism may not be perfect but it's the best system we've got. More of us have chainmail today than at any point in history
    • @mjpt777: We managed to transfer almost 10 GB/s worth of 1000 byte messages via Aeron IPC. That's more than a 100GigE network. Way to scale up on box!
    • @caitie: lol what my services do 1.5 billion writes per minute ~25 million writes per second
    • @mjpt777: Think of your QPI links in a multi-socket server as a fast network. Communicate to share memory; don't share memory to communicate.
    • @aalmiray: "you can't have a second CPU until you prove you can use the first one" - @mjpt777
    • Periscope: a hard drive is over 3x faster than gigabit ethernet
    • thom: Any sufficiently complicated distributed architecture contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of SOAP.
    • @dabeaz: Instead of teaching everyone how to code, I wish we'd just focus on getting everyone's curiosity from kindergarten back.
    • Matthew Jones: It's a Catch-22. We need the metrics to choose the best architecture, but we need to actually implement the damn thing in order to get metrics, and implementation requires us to select an architecture. 
    • @jmwind: Today we built Shopify 500 times, deployed to prod 22 times, peaked at 700 build agents, spun 50k docker containers in test and 25k prod.
    • antirez: Redis, especially using pipelining, can serve an impressive amount of requests per second per thread (half a million is a common figure with very intensive pipelining. Without pipelining it is around 100,000 ops/sec). 
    • @jcox92: "This is my invitation to you to start using languages that were discovered rather than languages that were invented." #strangeloop
    • @tyler_treat: "Measuring latency at saturation is like looking at your bumper after wrapping your car around a pole." —@giltene
    • There are a lot of great quotes this week. To see all of the Quotable Quotes, please read the full article.

  • Another example of the diffusion of the software ethos. Elon Musk's presentation of the Tesla Model X had more in common with a new iPhone event than a traditional car demo. First, it was a livecast that started a touch late. Second, throngs of fanpeople clapped and whooped in all the appropriate places. Gone are the beauty shots of cars simply meant to stroke the lizard brain. Elon hit the use cases. He talked vision statement. He talked safety specs and features. He talked air quality in depth. He didn't wait for iFixit to do a teardown; he showed construction details and how they reinforced features and quality. He showed how the Falcon Wing doors opened and closed automatically, how they worked in a crowded parking lot, and how the door design also let passengers easily reach the third row of seats. This focus on the car as an engineered product for solving tangible problems in real life may be the lasting legacy of Tesla.

  • Tools are to programmers what shoes are to the mundane fashion world, which is what makes this discussion of Why FogBugz lost to Jira in the bug tool wars so fascinating. In one corner we have gecko with a nice analysis of the FogBugz side, and in the other we have carlfish with a quality response from the Atlassian perspective. It's painful to remember how convoluted product deployment was before software as a service.

  • How does the CIA provide advanced state-of-the-art analytics? On Amazon, of course. Amazon built the CIA its own region in 9 months. The CIA decided the only way to reach commercial parity was to stop trying to do it themselves and leverage those who already know how to do it. The CIA will have its own private version of the marketplace so it can move tools into the hands of analysts as fast as possible. The CIA really likes themselves some Spark. Partnering for expertise is something the CIA is trying to learn how to do. Oh, and the CIA is hiring.

  • Jeff Atwood has a sense of this in Learning to code is overrated: an accomplished programmer would rather his kids learn to read and reason. One caveat is that understanding algorithms will be a necessary life skill, now and certainly in the future. We'll need to see algorithms for what they are: biased tools that serve someone else's purpose. It's common even among the learned today to see algorithms as objective and benign. The easiest way to pierce the algorithm-washing veil may be for people to learn a little programming. That may help demystify what's really going on.

  • Embrace, extend and extinguish: Amazon Will Ban Sale of Apple, Google Video-Streaming Devices. This kind of cross-division strategy tax often marks the beginning of the end. Amazon is no longer an everything store. Once we stop thinking of Amazon First when shopping, we may move to Amazon Maybe and then to Amazon Never.
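The "50%: storage reduction by using erasure coding in Hadoop" figure at the top of this roundup comes down to simple arithmetic. Below is a minimal sketch, assuming the commonly cited comparison of HDFS's default 3-way replication against a Reed-Solomon (6,3) code: 6 data blocks plus 3 parity blocks. Other code parameters give different savings.

```python
from fractions import Fraction


def replication_footprint(copies):
    """Raw bytes stored per logical byte with N-way replication."""
    return Fraction(copies)


def erasure_footprint(data_blocks, parity_blocks):
    """Raw bytes stored per logical byte with a (data, parity) erasure code."""
    return Fraction(data_blocks + parity_blocks, data_blocks)


def savings(old, new):
    """Fractional reduction in raw storage when moving from old to new."""
    return 1 - new / old


rep = replication_footprint(3)   # 3x raw storage, i.e. 200% overhead
rs = erasure_footprint(6, 3)     # 9/6 = 1.5x raw storage, i.e. 50% overhead
print(f"replication: {float(rep)}x, RS(6,3): {float(rs)}x, "
      f"savings: {float(savings(rep, rs)):.0%}")
```

The arithmetic leaves out the trade-off. Rebuilding a lost block means reading several surviving blocks and recomputing parity, so recovery costs more network and CPU than copying a replica.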

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


Strategy: Taming Linux Scheduler Jitter Using CPU Isolation and Thread Affinity

When nanoseconds matter you have to pay attention to OS scheduling details. Mark Price, who works in the rarefied high-performance environment of high finance, shows how in his excellent article on Reducing system jitter.

For a tuning example he uses the famous Disruptor inter-thread messaging library. The goal is to keep the OS continuously feeding CPUs work from high-priority threads. His baseline test shows the fastest message is sent in 76 nanoseconds, 1 in 100 messages took longer than 2 milliseconds, and the longest delay was 11 milliseconds.
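Before tuning anything you need a way to see the jitter. Mark Price's baseline uses the Disruptor with HdrHistogram. As a much cruder stand-in, the sketch below borrows the idea behind Gil Tene's jHiccup: ask the OS to wake a thread at a fixed interval and record how late it actually wakes up. The interval, the iteration count, and the nearest-rank percentile helper are illustration choices. Python's own interpreter overhead also adds noise that a C or Java harness would not.

```python
import math
import time


def measure_hiccups(iterations=1000, interval_ns=100_000):
    """Sleep for a fixed interval repeatedly and record the overshoot in ns.

    Time beyond the requested interval is delay the thread did not ask for:
    scheduler latency, power-state transitions, interrupts, and so on.
    """
    overshoots = []
    interval_s = interval_ns / 1e9
    for _ in range(iterations):
        start = time.perf_counter_ns()
        time.sleep(interval_s)
        elapsed = time.perf_counter_ns() - start
        overshoots.append(max(0, elapsed - interval_ns))
    return overshoots


def report(overshoots):
    """Min, median, 99th percentile (nearest rank) and max of the samples."""
    ordered = sorted(overshoots)

    def pct(p):
        return ordered[max(1, math.ceil(p * len(ordered) / 100)) - 1]

    return {"min": ordered[0], "p50": pct(50), "p99": pct(99), "max": ordered[-1]}


if __name__ == "__main__":
    for name, value in report(measure_hiccups()).items():
        print(f"{name:>4}: {value / 1000:.1f} us")
```

Run it on an idle machine, then on a loaded one, then again after pinning it to an isolated CPU. The interesting changes show up in the p99 and max lines, not the min.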

The next section of the article shows in loving detail how to bring those latencies lower and more consistent, a job many people will need to do in practice. You'll want to read the article for a full explanation, including how to use perf_events and HdrHistogram. It's really great at showing the process, but in short:

  • Turning off power save mode on the CPU brought the max latency down from 11 msec to 8 msec.
  • Guaranteeing threads will always have CPU resources using CPU isolation and thread affinity brought the maximum latency down to 14 microseconds.
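On Linux, the two knobs in the last bullet are usually the `isolcpus` kernel boot parameter, which keeps the scheduler from placing ordinary tasks on the chosen cores, and `sched_setaffinity`, which pins your latency-critical thread onto one of them. Below is a minimal Python sketch of the pinning half, assuming a Linux host. It reads the isolated set from `/sys/devices/system/cpu/isolated` and falls back to a core the process is already allowed to use. The helper names are hypothetical. Mark Price's article covers the full setup, including keeping interrupts and other threads off the isolated cores, which this sketch does not attempt.

```python
import os


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '2-3,6' into {2, 3, 6}."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def isolated_cpus(path="/sys/devices/system/cpu/isolated"):
    """CPUs removed from general scheduling via isolcpus, or an empty set."""
    try:
        with open(path) as f:
            return parse_cpu_list(f.read())
    except OSError:
        return set()


def pin_current_thread():
    """Pin the caller to a single CPU and return that CPU's id.

    In the Linux syscall, pid 0 refers to the calling thread. Isolated cores
    are typically absent from the inherited affinity mask, so try them first
    and fall back to the highest core we are already allowed to use.
    """
    allowed = os.sched_getaffinity(0)
    for cpu in sorted(isolated_cpus()) + [max(allowed)]:
        try:
            os.sched_setaffinity(0, {cpu})
            return cpu
        except OSError:
            continue  # e.g. the core is outside this container's cpuset
    raise RuntimeError("could not pin to any CPU")


if __name__ == "__main__":
    cpu = pin_current_thread()
    print(f"pinned to CPU {cpu}; affinity now {os.sched_getaffinity(0)}")
```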



Sponsored Post: iStreamPlanet,, Instrumental, Location Labs, Enova, Surge, Redis Labs,, VoltDB, Datadog, SignalFx, InMemory.Net, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • As a Networking & Systems Software Engineer at iStreamPlanet you’ll be driving the design and implementation of a high-throughput video distribution system. Our cloud-based approach to video streaming requires terabytes of high-definition video routed throughout the world. You will work in a highly-collaborative, agile environment that thrives on success and eats big challenges for lunch. Please apply here.

  • As a Scalable Storage Software Engineer at iStreamPlanet you’ll be driving the design and implementation of numerous storage systems including software services, analytics and video archival. Our cloud-based approach to world-wide video streaming requires performant, scalable, and reliable storage and processing of data. You will work on small, collaborative teams to solve big problems, where you can see the impact of your work on the business. Please apply here.

  • is a *profitable* fast-growing SaaS startup looking for a Lead DevOps/Infrastructure engineer to join our ~10 person team in Palo Alto or *remotely*. Come help us improve API performance, tune our databases, tighten up security, setup autoscaling, make deployments faster and safer, scale our MongoDB/Elasticsearch/MySQL/Redis data stores, setup centralized logging, instrument our app with metric collection, set up better monitoring, etc. Learn more and apply here.

  • Location Labs is the global pioneer in mobile security for humans. Our services are used by millions of monthly paying subscribers worldwide. We were named one of Entrepreneur magazine’s “most brilliant” companies and TechCrunch said we’ve “cracked the code” for mobile monetization. If you are someone who enjoys the scrappy, get your hands dirty atmosphere of a startup, but has the measured patience and practices to keep things robust, well documented, and repeatable, Location Labs is the place for you. Please apply here.

  • As a Lead Software Engineer at Enova you’ll be one of Enova’s heavy hitters, overseeing technical components of major projects. We’re going to ask you to build a bridge, and you’ll get it built, no matter what. You’ll balance technical requirements with business needs, while advocating for a high quality codebase when working with full business teams. You’re fluent in ‘technical’ language and ‘business’ language, because you’re the engineer everyone counts on to understand how it works now, how it should work, and how it will work. Please apply here.

  • As a UI Architect at Enova, you will be the elite representative of our UI culture. You will be responsible for setting a vision, guiding direction and upholding high standards within our culture. You will collaborate closely with a group of talented UI Engineers, UX Designers, Visual Designers, Marketing Associates and other key business stakeholders to establish and maintain frontend development standards across the company. Please apply here.

  • VoltDB's in-memory SQL database combines streaming analytics with transaction processing in a single, horizontal scale-out platform. Customers use VoltDB to build applications that process streaming data the instant it arrives to make immediate, per-event, context-aware decisions. If you want to join our ground-breaking engineering team and make a real impact, apply here.  

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer: AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data: AppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • Surge 2015. Want to mingle with some of the leading practitioners in the scalability, performance, and web operations space? Looking for a conference that isn't just about pitching you highly polished success stories, but that actually puts an emphasis on learning from real world experiences, including failures? Surge is the conference for you.

  • Your event could be here. How cool is that?

Cool Products and Services

  • Instrumental is a hosted real-time application monitoring platform. In the words of one of our customers: "Instrumental is the first place we look when an issue occurs. Graphite was always the last place we looked."

  • Real-time correlation across your logs, metrics and events. just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • Datadog is a monitoring service for scaling cloud infrastructures that bridges together data from servers, databases, apps and other tools. Datadog provides Dev and Ops teams with insights from their cloud environments that keep applications running smoothly. Datadog is available for a 14 day free trial at

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Sumo Logic alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net and provides native .Net, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here:

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required.

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...


How Facebook Tells Your Friends You're Safe in a Disaster in Under Five Minutes

In a disaster there’s a raw and immediate need to know your loved ones are safe. I felt this way during 9/11. I know I’ll feel this way during the next wildfire in our area. And I vividly remember feeling this way during the 1989 Loma Prieta earthquake.

Most earthquakes pass beneath notice. Not this one, and everyone knew it. After ceiling tiles stopped falling like snowflakes in the computer lab, we convinced ourselves the building would not collapse, and all thoughts turned to the safety of loved ones, as they must have for everyone else. Making an outgoing call was nearly impossible; all the phone lines were busy as calls poured into the Bay Area from all over the nation. Information was stuck. Many tense hours were spent in ignorance as the TV showed a constant stream of death and destruction.

It’s now over a quarter of a century later. Can we do any better?

Facebook can, through a product called Safety Check, which connects friends and loved ones during a disaster. When a disaster hits, Safety Check prompts people in the area to indicate whether they are OK. Then Facebook closes the worry loop by telling their friends how they are doing.
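The flow just described has three steps: find the people inside a disaster area, ask them whether they are safe, and tell their friends. Those steps can be modeled in a few lines. This is a toy sketch, not Facebook's implementation. The `User` type, the circular disaster area, the haversine distance check, and the in-memory friend graph are all hypothetical simplifications.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_KM = 6371.0


@dataclass
class User:
    name: str
    lat: float
    lon: float
    friends: set = field(default_factory=set)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def affected_users(users, center, radius_km):
    """Step 1: everyone whose last known location is inside the area."""
    lat, lon = center
    return [u for u in users.values()
            if haversine_km(u.lat, u.lon, lat, lon) <= radius_km]


def mark_safe(users, name):
    """Steps 2-3: a user answers "I'm safe"; build one notice per friend."""
    return [f"{friend}: {name} marked themselves safe"
            for friend in sorted(users[name].friends)]


if __name__ == "__main__":
    users = {
        "ana": User("ana", 37.77, -122.42, {"bo", "cy"}),  # San Francisco
        "bo": User("bo", 37.80, -122.27, {"ana"}),         # Oakland
        "cy": User("cy", 40.71, -74.01, {"ana"}),          # New York
    }
    for u in affected_users(users, (37.77, -122.42), radius_km=50):
        print("prompt:", u.name)
    for note in mark_safe(users, "ana"):
        print(note)
```

At Facebook's scale the hard parts are the ones the toy skips: locating hundreds of millions of people quickly and fanning notifications out across a huge friend graph.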

Brian Sa, Engineering Manager at Facebook, created Safety Check out of his experience of the devastating 2011 earthquake in Fukushima, Japan. He told his very moving story in a talk he gave at @Scale.

During the earthquake Brian put a banner on Facebook with helpful information sources, but he was moved to find a better way to help people in need. That impulse became Safety Check.

My first reaction to Safety Check was damn, why didn’t anyone think of this before? It’s such a powerful idea.

The answer became clear as I listened to a talk in the same video given by Peter Cottle, Software Engineer at Facebook, who also talked about building Safety Check.

It’s likely only Facebook could have created Safety Check. This observation dovetails nicely with Brian’s main lesson in his talk:

  • Solve real-world problems in a way that only YOU can. Instead of taking the conventional route, think about the unique role you and your company can play.

Only Facebook could create Safety Check. That's not because of resources, as you might expect. It's because Facebook lets employees build crazy things like Safety Check, because only Facebook has 1.5 billion geographically distributed users with an average degree of separation of only 4.74 edges, and because only Facebook has users who are fanatical about reading their news feeds. More about this later.

In fact, Peter talked about how resources were a problem in a sort of product development Catch-22 at Facebook. The team for Safety Check was small and didn’t have a lot of resources attached to it. They had to build the product and prove its success without resources before they could get the resources to build the product. The problem had to be efficiently solved at scale without the application of lots of money and lots of resources.

As is often the case, constraints led to a clever solution. A small team couldn’t build a big pipeline and index, so they wrote some hacky PHP and effectively got the job done at scale.

So how did Facebook build Safety Check? Here’s my gloss on both Brian’s and Peter’s talks:



Stuff The Internet Says On Scalability For September 25th, 2015

Hey, it's HighScalability time:

How long would you have lasted? Loved The Martian. Can't wait for the game, movie, and little potato action figures. Me, I would have died on the first level.

  • 60 miles: new record distance for quantum teleportation; 160: size of minimum viable Mars colony; $3 trillion: assets managed by hedge funds; 5.6 million: fingerprints stolen in cyber attack; 400 million: Instagram monthly active users; 27%: increase in conversion rate from mobile pages that are 1 second faster; 12BN: daily Telegram messages; 1800 B.C.: oldest beer recipe; 800: meetings booked per day at Facebook; 65: # of neurons it takes to walk with 6 legs

  • Quotable Quotes:
    • @bigdata: assembling billions of pieces of evidence: Not even the people who write algorithms really know how they work
    • @zarawesome: "This is the most baller power move a billionaire will pull in this country until Richard Branson finally explodes the moon."
    • @mtnygard: An individual microservice fits in your head, but the interrelationships among them exceeds any human's ability. Automate your awareness.
    • Ben Thompson~ The mistake that lots of BuzzFeed imitators have made is to imitate the BuzzFeed article format when actually what should be imitated from BuzzFeed is the business model. The business model is creating portable content that will live and thrive on all kinds of different platforms. The BuzzFeed article is relatively unsophisticated, it's mostly images and text, and mostly images.
    • For more Quotable Quotes please see the full article.

  • Is what Volkswagen did really any different from what happens on benchmarks all the time? Cheating and benchmarks go together like a clear conscience and rationalization. Clever subterfuge is part of the software ethos. There are many, many examples. "Cars are now software" is a slick meme, but that transformation has deep implications. The software culture and the manufacturing culture are radically different.

  • Can we ever trust the fairness of algorithms? Of course not. Humans in relation to their algorithms are now in the position of priests trying to divine the will of god. Computer Scientists Find Bias in Algorithms: Many people believe that an algorithm is just a code, but that view is no longer valid, says Venkatasubramanian. “An algorithm has experiences, just as a person comes into life and has experiences.”

  • Stuff happens, even to the best. But maybe having a significant percentage of the world's services on the same platform is not wise or sustainable. Summary of the Amazon DynamoDB Service Disruption and Related Impacts in the US-East Region.

  • According to patent drawings what does the Internet look like? Noah Veltman has put together a fun list of examples: it's a cloud, or a bean, or a web, or an explosion, or a highway, or maybe a weird lump.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


How will new memory technologies impact in-memory databases?

This is a guest post by Yiftach Shoolman, Co-founder & CTO of Redis Labs. Will 3D XPoint change everything? Not as much as you might hope...

Recently, investors, analysts, partners and customers have asked me how the announcement from Intel and Micron about their new 3D XPoint memory technology will affect the in-memory databases market. In these discussions, a common question was “Who needs an in-memory database if all the non-in-memory databases will achieve similar performance with 3D XPoint technology?” Well, I think that's a valid question, so I've decided to take a moment to describe how we think this technology will influence our market.

First, a little background...

The motivation of Intel and Micron is clear: DRAM is expensive, and its price hasn’t changed much during the last few years (as shown below). In addition, there are currently only three major makers of DRAM on the planet (Samsung Electronics, Micron and SK Hynix), which means that the competition among them is not as cutthroat as it used to be among four or five major manufacturers several years ago.

DRAM Price Trends



Uber Goes Unconventional: Using Driver Phones as a Backup Datacenter

In How Uber Scales Their Real-Time Market Platform one of the most intriguing hints was how Uber handles datacenter failovers using driver phones as an external distributed storage system for recovery.

Now we know a lot more about how that system works from Uber's Nikunj Aggarwal and Joshua Corbin, who gave a very interesting talk at the @Scale conference: How Uber Uses your Phone as a Backup Datacenter.

Rather than use a traditional backend replication scheme, where databases sync state between datacenters to achieve a measure of k-safety, Uber did something different: they store enough state on driver phones that trip information cannot be lost if a datacenter failover occurs.

Why choose this approach? The traditional approach would be much simpler. I think it is to make sure the customer always has a good customer experience and losing trip information for an active trip would make for a horrible customer experience. 

By building their syncing strategy around the phone, even though it's complicated and takes a lot of work, Uber is able to preserve trip data and make for a seamless customer experience even on datacenter failures. And making the customer happy is what counts, especially in a market with near zero switching costs.

So the goal is not to lose trip information, even on a datacenter failover. Using a traditional database replication strategy it would not be possible to make this guarantee for reasons that have parallels to how network management systems have always had to work. Let me explain.

In a network, devices are the authoritative source for state information like packet errors, alarms, packets sent and received, and so on. The network management system is authoritative for configuration data like alarm thresholds and customer information. The complication is that devices and the network management system are not always in contact, so they get out of sync because they work independently of each other. This means that on bootup, failover, and communication reconnection, all this information has to be merged in both directions using a complicated dance that ensures correctness and consistency.

Uber has the same problem, only the devices are smart phones and the authoritative state the phone contains is trip information. So on bootup, failover, and communication reconnection the trip information must be preserved because the phone is the authoritative source for trip information.

Even when connectivity is lost, the phone has an accurate record of all trip data. So you wouldn't want to sync trip data from the datacenter down to the phone, because that would wipe out the correct data on the phone. The correct information must come from the phone.
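The ownership rule from the network management analogy can be sketched as a merge that runs on bootup, failover, or reconnection. Each side is authoritative for its own fields, and the phone wins for trip state. This is a hypothetical illustration of the rule as described here, not Uber's protocol. The field names and the split between phone-owned and datacenter-owned fields are assumptions.

```python
# Hypothetical ownership split: the phone owns what happened on the trip;
# the datacenter owns configuration-like data (compare alarm thresholds in
# a network management system).
PHONE_OWNED = {"trip_id", "status", "pickup", "dropoff", "fare_meter"}
DATACENTER_OWNED = {"pricing_config", "rider_profile"}


def merge_on_reconnect(phone_state, datacenter_state):
    """Merge both copies after a reconnect or failover.

    Each field is taken only from the side that owns it. Stale datacenter
    trip data therefore never overwrites the phone's record, and the phone
    never overrides configuration that belongs to the backend.
    """
    merged = {}
    for key in PHONE_OWNED:
        if key in phone_state:
            merged[key] = phone_state[key]
    for key in DATACENTER_OWNED:
        if key in datacenter_state:
            merged[key] = datacenter_state[key]
    return merged


if __name__ == "__main__":
    phone = {"trip_id": "t1", "status": "dropped_off", "dropoff": "SFO"}
    # The failover datacenter only has an older snapshot taken mid-trip.
    datacenter = {"trip_id": "t1", "status": "on_trip", "pricing_config": "v7"}
    print(merge_on_reconnect(phone, datacenter))
```

Both sides keep the merged record afterwards. That is the "merged in both directions" step in miniature.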

Uber also takes another trick from network management systems. They periodically query phones to test the integrity of information in the datacenter. 
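One simple way to run that periodic check is to compare digests. The datacenter hashes its copy of each active trip, asks the phone for the same hash, and flags mismatches for repair from the phone's copy. Below is a minimal sketch, assuming hypothetical trip dictionaries and a SHA-256 digest over a canonical JSON encoding. This note does not cover how Uber actually encodes or schedules these checks.

```python
import hashlib
import json


def trip_digest(trip):
    """Stable SHA-256 digest of a trip record; key order does not matter."""
    canonical = json.dumps(trip, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit(datacenter_trips, phone_digests):
    """Return the ids of trips whose datacenter copy disagrees with the phone.

    datacenter_trips: {trip_id: trip_record}
    phone_digests:    {trip_id: digest reported by the phone}
    A trip the phone reports but the datacenter lacks also counts as drift.
    """
    drifted = set()
    for trip_id, digest in phone_digests.items():
        record = datacenter_trips.get(trip_id)
        if record is None or trip_digest(record) != digest:
            drifted.add(trip_id)
    return sorted(drifted)


if __name__ == "__main__":
    phone = {"t1": {"status": "on_trip", "rider": "r9"},
             "t2": {"status": "dropped_off"}}
    datacenter = {"t1": {"rider": "r9", "status": "on_trip"},
                  "t2": {"status": "on_trip"}}
    digests = {tid: trip_digest(rec) for tid, rec in phone.items()}
    print("repair from phone:", audit(datacenter, digests))
```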

Let's see how they do it...

Motivation for Using Phones as Storage for Datacenter Failure



Stuff The Internet Says On Scalability For September 18th, 2015

Hey, it's HighScalability time:

This is how you blast microprocessors with high-energy beams to test them for space.

  • terabits: Facebook's network capacity; 56.2 Gbps: largest extortion DDoS attack seen by Akamai; 220: minutes spent using apps per day; $33 billion: 2015 in-app purchases; 2334: web servers running in containers on a Raspberry Pi 2; 121: startups valued over $1 billion

  • Quotable Quotes:
    • A Beautiful Question: Finding Nature's Deep Design: Two obsessions are the hallmarks of Nature’s artistic style: Symmetry—a love of harmony, balance, and proportion Economy—satisfaction in producing an abundance of effects from very limited means
    • : ad blocking Apple has done to Google what Google did to MSFT. Added a feature they can't compete with without breaking their biz model
    • @shellen: FWIW - Dreamforce is a localized weather system that strikes downtown SF every year causing widespread panic & bad slacks. 
    • @KentBeck: first you learn the value of abstraction, then you learn the cost of abstraction, then you're ready to engineer
    • @doctorow: Arab-looking man of Syrian descent found in garage building what looks like a bomb 
    • @kixxauth: Idempotency is not something you take a pill for. -- ZeroMQ
    • @sorenmacbeth: Alice in Blockchains
    • Sebastian Thrun: BECAUSE of the increased efficiency of machines, it is getting harder and harder for a human to make a productive contribution to society
    • Coding Horror: Getting the details right is the difference between something that delights, and something customers tolerate.
    • @mamund: "[92% of] all catastrophic failures are the result of incorrect handling of non-fatal errors."
    • Charles Weitz: Almost every cell in our body has a circadian clock. It helps every cell figure out when to use energy, when to rest, when to repair DNA, or to replicate DNA.
    • @kfury: Web development skills are like cells in your body. Every 7 years they're completely replaced by new ones.
    • Alexey Gorshkov: We’re learning how to build complex states of light that, in turn, can be built into more complex objects. 
    • @BenedictEvans: Ad blocking = taking money away from people whose work you read. Everyone has reasons, or excuses. But it remains true
    • Gaffer on Games: I swear you guys are like the f*cking climate change deniers of network programming..not just a rant, also deeply informative.
    • @anoemi: I don't use emojis because when I use smiley faces, I like to stay close to the metal.
    • @neil_conway: in practice, basically no app logic gets retry logic right (esp. for read-only xacts, which can abort under serializable).
    • @xaprb: All roads lead to Rome. All queueing theory studies lead to Agner Erlang. All scalability studies lead to Neil Gunther.

  • Why doesn't Google use git? Here's why. Stats on the Google source code repository: 1 billion files, 9 million source files, 2 billion lines of code, 35 million commits, 86 terabytes, 45 thousand commits per workday, 25,000 Googlers from all over the world, billions of file read requests per day (800K QPS peak). All in one single repository. The rate of change is on an exponential growth curve. Of note: robots commit 30K times per day, humans only 15K. From a talk by Rachel Potvin: The Motivation for a Monolithic Codebase

  • The problem is as soon as Medium becomes everything it also becomes nothing. Medium's Evan Williams To Publishers: Your Website Is Toast

  • If you appreciate the technical aspects of the intricate bot games Ashley Madison is said to have played then you might enjoy Darknet, a book that takes the same idea to chilling extremes. AI driven Distributed Autonomous Corporations use bitcoin and anonymous markets to take the world to the brink. Only a gambit worthy of Captain Kirk saves the day.

  • Points to ponder. Why I wouldn’t use rails for a new company: I worry now that rails is past its zenith, and that starting a new company with rails today might be like starting a company using Java Spring in 2007...Everyone knows that ruby is slow...over time other frameworks simply picked up those innovations [Rails]...If you want to future-proof your web application, you have to make a bet on what engineers will want to use in three years. 
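@neil_conway's point about retry logic is easy to get wrong in code. Below is a minimal sketch of the usual pattern: when the database reports a serialization failure, rerun the whole transaction, read-only ones included, with capped exponential backoff and jitter. `SerializationFailure` and `run_in_transaction` are hypothetical stand-ins for whatever your database driver provides. PostgreSQL, for example, signals this case with SQLSTATE 40001.

```python
import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver's serialization error (SQLSTATE 40001)."""


def run_in_transaction(txn_fn, max_attempts=5, base_delay_s=0.01,
                       max_delay_s=0.5, sleep=time.sleep):
    """Run txn_fn, retrying the *whole* function on serialization failures.

    txn_fn is assumed to begin and commit its own transaction and to have no
    side effects outside it, so it is safe to rerun from the start. Read-only
    transactions are retried too, since under SERIALIZABLE they can abort.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # Capped exponential backoff with full jitter avoids retry storms.
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

The common bugs the sketch avoids are retrying only the failed statement instead of the whole transaction, skipping retries for reads, and retrying forever.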

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


5 Lessons and 8 Industry Changes Over 5 Years as Etsy CTO

Endings are often a time for reflection and from reflection often comes wisdom. That is the case for Kellan Elliott-McCrea, who recently announced he was leaving his job after five successful years as the CTO of Etsy. Kellan wrote a rather remarkable going away post: Five years, building a culture, and handing it off, brimming with both insight and thoughtful commentary.

This post is just a short gloss of the major points. He goes into more depth on each point, so please read his post.

The Five Lessons:

  1. Nothing we “know” about software development should be assumed to be true.
  2. Technology is the product of the culture that builds it.
  3. Software development should be thought of as a cycle of continual learning and improvement rather than a progression from start to finish, or a search for correctness.
  4. You build a culture of learning by optimizing globally not locally.
  5. If you want to build for the long term, the only guarantee is change.

The Eight Industry Changes

  1. Five years ago, continuous deployment was still a heretical idea. 
  2. Five years ago, it was crazy to discuss that monitoring, testing, debugging, QA, staged releases, game days, user research, and prototypes are all tools with the same goal, improving confidence, rather than separate disciplines handled by distinct teams.
  3. Five years ago, focusing on detection and response vs prevention in order to achieve better, more reliable, more scalable, and more secure software was unprofessional.
  4. Five years ago, suggesting that better software is written by a diverse team of kind people who care about each other was antithetical to our self-image as an industry.
  5. Five years ago, it was unheard of to trust not only our designers and product managers to code and deploy to production, but everyone in the company.
  6. Five years ago, rooms of people excitedly talking about their own contribution to a serious outage would have been a prelude to mass firings, rather than a path to profound learning.
  7. And five years ago no one was experimenting in public about how to do this stuff, sharing their findings, and open sourcing code to support this way of working.
  8. Five years ago, it would have seemed ludicrous to think a small team supporting a small site selling crafts could aspire to change how software is built and, in the process, cause us to rethink how the economy works.

While many of these ideas were around more than five years ago, the point still stands: the industry has undergone a lot of changes recently, and sometimes it's worth taking a little time to reflect on that.