Stuff The Internet Says On Scalability For September 18th, 2015

Hey, it's HighScalability time:

This is how you blast microprocessors with high-energy beams to test them for space.

  • terabits: Facebook's network capacity; 56.2 Gbps: largest extortion DDoS attack seen by Akamai; 220: minutes spent usings apps per day; $33 billion: 2015 in-app purchases; 2334: web servers running in containers on a Raspberry Pi 2; 121: startups valued over $1 billion

  • Quotable Quotes:
    • A Beautiful Question: Finding Nature's Deep Design: Two obsessions are the hallmarks of Nature’s artistic style: Symmetry—a love of harmony, balance, and proportion Economy—satisfaction in producing an abundance of effects from very limited means
    • : ad blocking Apple has done to Google what Google did to MSFT. Added a feature they can't compete with without breaking their biz model
    • @shellen: FWIW - Dreamforce is a localized weather system that strikes downtown SF every year causing widespread panic & bad slacks. 
    • @KentBeck: first you learn the value of abstraction, then you learn the cost of abstraction, then you're ready to engineer
    • @doctorow: Arab-looking man of Syrian descent found in garage building what looks like a bomb 
    • @kixxauth: Idempotency is not something you take a pill for. -- ZeroMQ
    • @sorenmacbeth: Alice in Blockchains
    • Sebastian Thrun: BECAUSE of the increased efficiency of machines, it is getting harder and harder for a human to make a productive contribution to society
    • Coding Horror: Getting the details right is the difference between something that delights, and something customers tolerate.
    • @mamund: "[92% of] all catastrophic failures are the result of incorrect handling of non-fatal errors."
    • Charles Weitz: Almost every cell in our body has a circadian clock. It helps every cell figure out when to use energy, when to rest, when to repair DNA, or to replicate DNA.
    • @kfury: Web development skills are like cells in your body. Every 7 years they're completely replaced by new ones.
    • Alexey Gorshkov: We’re learning how to build complex states of light that, in turn, can be built into more complex objects. 
    • @BenedictEvans: Ad blocking = taking money away from people whose work you read. Everyone has reasons, or excuses. But it remains true
    • Gaffer on Games: I swear you guys are like the f*cking climate change deniers of network programming..not just a rant, also deeply informative.
    • @anoemi: I don't use emojis because when I use smiley faces, I like to stay close to the metal.
    • @neil_conway: in practice, basically no app logic gets retry logic right (esp. for read-only xacts, which can abort under serializable).
    • @xaprb: All roads lead to Rome. All queueing theory studies lead to Agner Erlang. All scalability studies lead to Neil Gunther.

  • Why doesn't Google use git? Here's why. Stats on the Google source code repository: 1 billion files, 9 million source files, 2 billion lines of code, 35 million commits, 86 terabytes, 45 thousand commits per workday, 25,000 Googlers from all over the world, billions of file read requests per day (800K QPS peak). All in one single repository. The rate of change is on an exponential growth curve. Of note: robots commit 30K times per day, humans only 15K. From a talk by Rachel Potvin: The Motivation for a Monolithic Codebase

  • The problem is as soon as Medium becomes everything it also becomes nothing. Medium's Evan Williams To Publishers: Your Website Is Toast

  • If you appreciate the technical aspects of the intricate bot games Ashley Madison is said to have played then you might enjoy Darknet, a book that takes the same idea to chilling extremes. AI driven Distributed Autonomous Corporations use bitcoin and anonymous markets to take the world to the brink. Only a gambit worthy of Captain Kirk saves the day.

  • Points to ponder. Why I wouldn’t use rails for a new company: I worry now that rails is past its zenith, and that starting a new company with rails today might be like starting a company using Java Spring in 2007...Everyone knows that ruby is slow...over time other frameworks simply picked up those innovations [Rails]...If you want to future-proof your web application, you have to make a bet on what engineers will want to use in three years. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


5 Lessons and 8 Industry Changes Over 5 Years as Etsy CTO

Endings are often a time for reflection and from reflection often comes wisdom. That is the case for Kellan Elliott-McCrea, who recently announced he was leaving his job after five successful years as the CTO of Etsy. Kellan wrote a rather remarkable going away post: Five years, building a culture, and handing it off, brimming with both insight and thoughtful commentary.

This post is just a short gloss of the major points. He goes into more depth on each point, so please read his post.

The Five Lessons:

  1. Nothing we “know” about software development should be assumed to be true.
  2. Technology is the product of the culture that builds it.
  3. Software development should be thought of as a cycle of continual learning and improvement rather a progression from start to finish, or a search for correctness.
  4. You build a culture of learning by optimizing globally not locally.
  5. If you want to build for the long term, the only guarantee is change.

The Eight Industry Changes

  1. Five years ago, continuous deployment was still a heretical idea. 
  2. Five years ago, it was crazy to discuss that monitoring, testing, debugging, QA, staged releases, game days, user research, and prototypes are all tools with the same goal, improving confidence, rather than separate disciplines handled by distinct teams.
  3. Five years ago, focusing on detection and response vs prevention in order to achieve better, more reliable, more scalable, and more secure software was unprofessional.
  4. Five years ago, suggesting that better software is written by a diverse team of kind people who care about each other was antithetical to our self-image as an industry.
  5. Five years ago, trusting not only our designers and product managers to code and deploy to production, but trusting everyone in the company to deploy to production.
  6. Five years ago, rooms of people excitedly talking about their own contribution to a serious outage would have been a prelude to mass firings, rather than a path to profound learning.
  7. And five years ago no one was experimenting in public about how to do this stuff, sharing their findings, and open sourcing code to support this way of working.
  8. Five years ago, it would have seemed ludicrous to think a small team supporting a small site selling crafts could aspire to change how software is built and, in the process, cause us to rethink how the economy works.

While many of these ideas were happening more than five years ago the point still stands, the industry has undergone a lot of changes recently, and sometimes it's worth taking a little time to reflect on that a bit. 


Sponsored Post: Microsoft, Instrumental, Location Labs, Enova, Librato, Surge, Redis Labs,, VoltDB, Datadog, SignalFx, InMemory.Net, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Microsoft’s Visual Studio Online team is building the next generation of software development tools in the cloud out in Durham, North Carolina. Come help us build innovative workflows around Git and continuous deployment, help solve the Git scale problem or help us build a best-in-class web experience. Learn more and apply.

  • Location Labs is the global pioneer in mobile security for humans. Our services are used by millions of monthly paying subscribers worldwide. We were named one of Entrepreneur magazine’s “most brilliant” companies and TechCrunch said we’ve “cracked the code” for mobile monetization. If you are someone who enjoys the scrappy, get your hands dirty atmosphere of a startup, but has the measured patience and practices to keep things robust, well documented, and repeatable, Location Labs is the place for you. Please apply here.

  • As a Lead Software Engineer at Enova you’ll be one of Enova’s heavy hitters, overseeing technical components of major projects. We’re going to ask you to build a bridge, and you’ll get it built, no matter what. You’ll balance technical requirements with business needs, while advocating for a high quality codebase when working with full business teams. You’re fluent in ‘technical’ language and ‘business’ language, because you’re the engineer everyone counts on to understand how it works now, how it should work, and how it will work. Please apply here.

  • As a UI Architect at Enova, you will be the elite representative of our UI culture. You will be responsible for setting a vision, guiding direction and upholding high standards within our culture. You will collaborate closely with a group of talented UI Engineers, UX Designers, Visual Designers, Marketing Associates and other key business stakeholders to establish and maintain frontend development standards across the company. Please apply here.

  • VoltDB's in-memory SQL database combines streaming analytics with transaction processing in a single, horizontal scale-out platform. Customers use VoltDB to build applications that process streaming data the instant it arrives to make immediate, per-event, context-aware decisions. If you want to join our ground-breaking engineering team and make a real impact, apply here.  

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • Surge 2015. Want to mingle with some of the leading practitioners in the scalability, performance, and web operations space? Looking for a conference that isn't just about pitching you highly polished success stories, but that actually puts an emphasis on learning from real world experiences, including failures? Surge is the conference for you.

  • Your event could be here. How cool is that?

Cool Products and Services

  • Instrumental is a hosted real-time application monitoring platform. In the words of one of our customers: "Instrumental is the first place we look when an issue occurs. Graphite was always the last place we looked."

  • Librato, a SolarWinds Cloud company, is a hosted monitoring platform for real-time operations and performance analytics. Easily add metrics from any source using turnkey solutions such as the AWS Cloudwatch integration, or by leveraging any of over 100 open source collection agents and language bindings. Librato is loved equally by DevOps and data engineers. Start using Librato today. Full-featured and free for 30 days.

  • Real-time correlation across your logs, metrics and events. just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • Datadog is a monitoring service for scaling cloud infrastructures that bridges together data from servers, databases, apps and other tools. Datadog provides Dev and Ops teams with insights from their cloud environments that keep applications running smoothly. Datadog is available for a 14 day free trial at

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Sumo Logic alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here:

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required.

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...


How Uber Scales Their Real-time Market Platform

Reportedly Uber has grown an astonishing 38 times bigger in just four years. Now, for what I think is the first time, Matt Ranney, Chief Systems Architect at Uber, in a very interesting and detailed talk--Scaling Uber's Real-time Market Platform---tells us a lot about how Uber’s software works.

If you are interested in Surge pricing, that’s not covered in the talk. We do learn about Uber’s dispatch system, how they implement geospatial indexing, how they scale their system, how they implement high availability, and how they handle failure, including the surprising way they handle datacenter failures using driver phones as an external distributed storage system for recovery.

The overall impression of the talk is one of very rapid growth. Many of the architectural choices they’ve made are a consequence of growing so fast and trying to empower recently assembled teams to move as quickly as possible. A lot of technology has been used on the backend because their major goal has been for teams to get the engineering velocity as high as possible.

After a understandably chaotic (and very successful) start it seems Uber has learned a lot about their business and what they really need to succeed. Their early dispatch system was a typical just make it work type affair that assumed at a deep level it was moving only people. Now that Uber’s mission has grown to handle boxes and groceries as well as people, their dispatch system has been abstracted and put on very solid and smart architectural foundation.

Though Matt thinks their architecture might be a little crazy, the idea of using a consistent hash ring with a gossip protocol seems spot on for their use case.

It’s hard not to be captivated for Matt’s genuine enthusiasm for what he’s working on. When talking about DISCO, their dispatch system, he says in an excited tone that it’s like the traveling salesman problem from school. It’s a cool Computer Science thing. Even though the solution isn’t optimal, it’s the traveling salesman at an interesting scale, in real-time, in the real-world, built out of fault tolerant scalable components. How cool is that?

So let’s see how Uber works on the inside. Here’s my gloss on Matt’s talk:


Click to read more ...


Stuff The Internet Says On Scalability For September 11th, 2015

Hey, it's HighScalability time:

Need a challenge? Solve the code on this 17.5 feet tall 11,000 year old wooden statue!

  • $100 million: amount Popcorn could have made from criminal business offers; 3.2-gigapixel: World’s Most Powerful Digital Camera; $17.3 trillion: US GDP in 2014;  700 million: Facebook time series database data points added per minute; 300PB: Facebook data stored in Hive; 5,000: Airbnb EC2 instances.

  • Quotable Quotes:
    • @jimmydivvy: NASA: Decade long flight across the solar system. Arrives within 72 seconds of predicted. No errors. Me: undefined is not a function
    • Packet Pushers~ Everyone has IOPS now. We are heading towards invisible consumption being the big deal going forward. 
    • Randy Medlin: Gonna drop $1000+ on a giant iPad, $100 on a stylus, then whine endlessly about $4.99 drawing apps.
    • Anonymous: Circuit Breaker + Real-time Monitoring + Recovery = Resiliency
    • Astrid Atkinson: I used to get paged awake at two in the morning. You go from zero to Google is down. That’s a lot to wake up to.
    • Todd Waters~ In 1979, 200MB weighed 30 lbs and took up the space of a washing machine
    • Todd Waters~ CERN spends more compute power throwing away data than storing and analyzing it
    • Rob Story:  We've clearly reached the point where SSD/RAM bandwidth have completely outpaced CPU compute.
    • Shedding light on the era of 'dark silicon': We will soon live in an era where perhaps more than 80 per cent of computer processors' transistors must be powered off  and 'remain dark' at any time to prevent the chip from overheating.
    • @diogomonica: In a container world, when someone asks about A vs B, the answer is always, A on top of B. #softwarecircus
    • Mike Curtis (Airbnb)~ 70 percent of the people who put space up for rent on Airbnb in New York City say they do so because if they didn’t, they would lose their apartments or homes
    • Mike Curtis (Airbnb)~ it would probably be on the order of 20 percent to 30 percent more expensive to operate its own datacenters than rent capacity on AWS 
    • @cloud_opinion: If John McAfee gets elected as President once, it will be impossible to uninstall him.
    • @bradurani: The greatest trick the ORM ever pulled was convincing the world the DB doesn't exist... and it's a disaster for a generation of devs
    • @coderoshi: The idea that management is the higher rung of a programmer's career ladder is like thinking that every actor wants to become a director.
    • @HiddenBrain: MT @CBinsights: A million guys walk into a Silicon Valley bar. No one buys anything. Bar is declared a massive success.
    • @Carnage4Life: Every time a developer says "temporary workaround" I remember this list. 

  • Some impressive gains by migrating from Python to Go. From Python to Go: migrating our entire API: reducing the mean response time of an API call from 100ms to 10ms...We reduced the number of EC2 instances required by 85%...we can now ship a self-hosted version of Repustate that is identical to the one we host for customers.

  • Rob Story has an awesome summary of the goings on at the Very Large DataBases Conference. His main gloss is at VLDB 2015: Concurrency, DataFlow, E-Store, but he also has a day by day summaries up on github. An amazing job and lots of concentrated insight.

  • Really wonderful article. A Life in Games: The Playful Genius of John Conway. Packed with slices of life that make me feel like I would like a little more John Conway in my life.

  • Need high availability? Here's how eBay uses Netflix Hystrix to implement the Circuit Breaker pattern. An example is given for their Secure Token service. Hystrix: a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable.

  • Rob Pike and Naitik Shah are speaking at the Fall Gopherfest - Silicon Valley on November 18th. It's free and you might find it useful. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


Trade Stimulators and the Very Old Idea of Increasing User Engagement 

Very early in my web career I was introduced to the almost mystical holy grail of web (and now app) properties: increasing user engagement.

The reason is simple. The more time people spend with your property the more stuff you can sell them. The more stuff you can sell the more value you have. Your time is money. So we design for addiction.

Famously Facebook, through the ties that bind, is the engagement leader with U.S. adults spending a stunning average of 42.1 minutes per day on Facebook. Cha-ching.

Immense resources are spent trying to make websites and apps sticky. Psychological tricks and gamification strategies are deployed with abandon to get you not to leave a website or to keep playing an app.

It turns out this is a very old idea. Casinos are designed to keep you gambling, for example. And though I’d never really thought about it before, I shouldn’t have been surprised to learn retail stores of yore used devices called trade stimulators to keep customers hanging around and spending money.

Never heard of trade stimulators? I hadn’t either until, while watching American Pickers, one of my favorite shows, they talked about this whole category of things people collect that I had no idea even existed!

Here’s an explanation of trade stimulators on the For Amusement Only EM and Bingo Pinball podcast. They are small gambling devices used in stores and bars. Usually it was a mini slot machine or game of chance, like a horse racing game or a dice game. It would vend you a small trinket like a particular color gum ball that could be turned into the shop keeper for a free drink or other prize. The idea is you put money in and you keep spending money at the establishment. 

Here’s a beautiful Sun Mfg 2 Wheel Bicycle Trade Stimulator from the late 1800s. The wheels spin and when the wheels stop spinning you add up the numbers by the indicators to learn what prize you've won. It could be a cigar or a drink, for example.


Sun Mfg 2 Wheel Bicycle Trade Stimulator.jpg


Here’s a Stephens Magic Beer Barrel Trade Stimulator from around 1934. Your prize is a beer. Even if you didn’t win a beer you would get a pretzel, which would of course make you thirsty so you want more beer!

Click to read more ...


Want IoT? Here's How a Major US Utility Collects Power Data from Over 5.5 Million Meters

I serendipitously found this fascinating reply by Richard Farley, your friendly neighborhood meter reader, in a local email list giving a rare first-hand account of how the Advanced Metering Infrastructure works in California. This is real Internet of Things territory. So if it doesn't have a typical post structure that is why. He generously allowed it to be reposted with a few redactions. When you see “A Major US Utility”, please replace it with the most likely California power company.

Old mechanical meters had bearings that over time wore out and caused friction that threw off readings. That friction would cause the analog gauge to spin slower than it should, resulting in lower readings than actual usage -- hence "free power". It's like a clock falling behind over time as the gears wear down.

For A Major US Utility "estimated billing" happens when your meter, for whatever reason, was not able to be read. The algorithms approved by the CPUC and are almost always favorable to the consumer. A Major US Utility hates to have to do estimated billing because they almost always have to underestimate based on the algorithms and CPUC rules. Not 100% sure about this, but if they underestimate, they have to eat the cost. In the rare case they overestimate (i.e., you were on vacation during the missed period), you will get "trued up" in the next billing cycle.

A Major US Utility does not see your actual use in "real time". For those interested in the nuts and bolts, here's how A Major US Utility’s AMI system works (AMI is short for Advanced Metering Infrastructure):

Click to read more ...


Stuff The Internet Says On Scalability For September 4th, 2015

Hey, it's HighScalability time:

An astonishing 300 billion stars in our galaxy have planets. Take a look in the Eyes on Exoplanets app.
  • 1 billion: people who used Facebook in a single day; 2.8 million: sq. ft. in new Apple campus (with drone pics);  1.1 trillion: Apache Kafka messages per day; 2,000 years: age of termite mounds in Central Africa; 30: # of times better the human brain is better than the best supercomputers; 4 billion: requests it took to trigger an underflow bug.

  • Quotable Quotes:
    • Sara Seager: If an Earth 2.0 exists, we have the capability to find and identify it by the 2020s.
    • Android Dick: But you’re my friend, and I’ll remember my friends, and I’ll be good to you. So don’t worry, even if I evolve into Terminator, I’ll still be nice to you. I’ll keep you warm and safe in my people zoo, where I can watch you for ol’ times sake.
    • @viktorklang: "If the conversation is typically “scale out” versus “scale up” if we’re coordination-free, we get to choose “scale out” while “scaling up.”
    • Amir Najmi: At Google, data scientists are just too much in demand. Thus, anytime we can replace data scientist thinking with machine thinking, we consider it a win.
    • @solarce: "don’t be be content that the software seems to basically work — you must beat the hell out of it" -- @bcantrill
    • John Ralston Saul: I have enormous confidence in the individual as citizen. I don't think there is any proof in our 2,500 years of history that the elites do a good job without the close involvement of the citizenry.
    • Joshua Strebel: on average Aurora RDS is 3x faster than MySql RDS when used with WordPress.
    • Martin Thompson: I'd argue that "state of the art" in scalable design is to have no contention. It does not matter if you manage contention with locks or CAS techniques. Once you have contention then Universal Scalability Law kicks in as you have to face the contention and coherence penalty that contented access to shared state/resources brings. Multiple writers to shared state is a major limitation to the scalability of any design. Persistent data structures make this problem worse and not better due to path-copy semantics that is amplified by the richness of the domain model.

  • Mike Hearn in a great interview on a16z, Hard Forks, Hard Choices for Bitcoin, had much to say on the future scalability of Bitcoin. One of the key ideas is that many of the things that people love about Bitcoin are based on Bitcoin's decentralized nature. Characteristics like it is permissionless, that it's the new gold, that it doesn't have a centralized policy committee, that it's a global network, and that it's a platform you can innovate on top of. One of the challenges with the keeping the current block size is that decentralization is already under stress. A certain amount of centralization has crept with ever bigger and bigger miners. With the collaboration of three or four companies they could start to apply some policy influence to Bitcoin and that would erode all the interesting properties that people love about Bitcoin. The challenge is to scale Bitcoin in balance with decentralization. Scaling and security as encapsulated by decentralization are tradeoffs. You can scale massively and lose decentralization and which point Bitcoin becomes Paypal. Yet if you keep the block size the same you make it so Bitcoin can't be used by a world wide audience. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


How Agari Uses Airbnb's Airflow as a Smarter Cron

This is a guest repost by Siddharth Anand, Data Architect at Agari, on Airbnb's open source project Airflow, a workflow scheduler for data pipelines. Some think Airflow has a superior approach.

Workflow schedulers are systems that are responsbile for the periodic execution of workflows in a reliable and scalable manner. Workflow schedulers are pervasive - for instance, any company that has a data warehouse, a specialized database typically used for reporting, uses a workflow scheduler to coordinate nightly data loads into the data warehouse. Of more interest to companies like Agari is the use of workflow schedulers to reliably execute complex and business-critical "big" data science workloads! Agari, an email security company that tackles the problem of phishing, is increasingly leveraging data science, machine learning, and big data practices typically seen in data-driven companies like LinkedIn, Google, and Facebook in order to meet the demands of burgeoning data and dynamicism around modeling.

In a previous post, I described how we leverage AWS to build a scalable data pipeline at Agari. In this post, I discuss our need for a workflow scheduler in order to improve the reliablity of our data pipelines, providing the previous post's pipeline as a working example.

Scheduling Workflows @ Agari - A Smarter Cron

Click to read more ...


Building Globally Distributed, Mission Critical Applications: Lessons From the Trenches Part 2

This is Part 2 of a guest post by Kris Beevers, founder and CEO, NSONE, a purveyor of a next-gen intelligent DNS and traffic management platform. Here's Part 1.

Integration and functional testing is crucial

Unit testing is hammered home in every modern software development class.  It’s good practice. Whether you’re doing test-driven development or just banging out code, without unit tests you can’t be sure a piece of code will do what it’s supposed to unless you test it carefully, and ensure those tests keep passing as your code evolves.

In a distributed application, your systems will break even if you have the world’s best unit testing coverage. Unit testing is not enough.

You need to test the interactions between your subsystems. What if a particular piece of configuration data changes – how does that impact Subsystem A’s communication with Subsystem B? What if you changed a message format – do all the subsystems generating and handling those messages continue to talk with each other? Does a particular kind of request that depends on results from four different backend subsystems still result in a correct response after your latest code changes?

Unit tests don’t answer these questions, but integration tests do. Invest time and energy in your integration testing suite, and put a process in place for integration testing at all stages of your development and deployment process. Ideally, run integration tests on your production systems, all the time.

There is no such thing as a service interrupting maintenance

Click to read more ...