Seven of the Nastiest Anti-patterns in Microservices

Daniel Bryant gave an energetic talk at Devoxx UK 2015 on lessons learned from over five years of experience with microservice based projects. The talk: The Seven Deadly Sins of Microservices: Redux (video, slides).

If you don't want to risk your immortal API then be sure to avoid:

  1. Lust - using the latest and greatest tech with the idea it will solve all your problems. It won't. Do you really need microservices at all? If you do go microservices, do you really need new tech in your stack? Choose boring technology. Know why you are choosing something. A monolith can perform better, and because a monolith can be developed faster it may also be the correct choice for proving your business case.
  2. Gluttony - excessive communication protocols. Projects often have a crazy number of protocols for gluing parts together. Standardize on the glue across an organization. Choose one synchronous and one asynchronous protocol. Don't gold-plate.
  3. Greed - all your service are belong to us. Do not underestimate the impact moving to a microservice approach will have on your organization. Your business organization needs to change to take advantage of microservices. Typically orgs will have silos between Dev, QA, and Ops with even more silos inside each silo like front-end, middleware, and database. Use cross functional teams like Spotify, Amazon, and Gilt. Connect rather than divide your company. 
  4. Sloth - creating a distributed monolith. If you can't deploy your services independently then they aren't microservices. Decouple. Transform data at a less central part of the stack. Some options are schema-first design and consumer-driven contracts.
  5. Wrath - blowing up when bad things happen. Bad things happen all the time so you need to test. Microservices are inherently distributed so you have network problems to deal with that weren't a problem in a monolith. The book Release It! has a lot of good fault tolerance patterns. Operationally you need to implement continuous delivery, agile, and devops. Test for failures using real-life disaster scenario testing, live failure injection testing, and something like Netflix's Simian Army.
  6. Envy - the shared single domain fallacy. A lot of time has been spent building and perfecting the model of a single domain. There's one big database with a unified schema. Microservices decompose a system along different lines and that can cause contention in an organization. Reports can be generated using pull by service or data pumps with events. 
  7. Pride - testing in the world of transience. Does your stuff really work? We all make mistakes. Think testing at the developer level, operational level, and business level. Surprisingly little has been written about testing microservices. Invest in your build pipeline testing. Some tools: Serenity BDD, Wiremock/Saboteur, Jenkins Performance Plugin. Testing in production is an emerging idea with companies that deploy many microservices.
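
To make the Wrath item more concrete: one of the fault tolerance patterns described in Release It! is the circuit breaker, which stops calling a failing dependency for a while instead of piling up timeouts. What follows is a minimal Python sketch of that idea, not code from the talk. The class name, the parameters max_failures and reset_after, and the injectable clock are all illustrative assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Fail fast after repeated failures, then retry after a cool-down."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests need not sleep
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, failing fast")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote call, for example breaker.call(fetch_profile, user_id), where fetch_profile is a hypothetical client function, and treat CircuitOpenError as a signal to serve a fallback. A production breaker would also need thread safety and metrics, which this sketch leaves out.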

Click to read more ...


Stuff The Internet Says On Scalability For July 31st, 2015

Hey, it's HighScalability time:

Where does IBM's Watson or Google Translate fit? (SciencePorn)
  • 40Tb/s: Bandwidth for Windows 10 launch; 4.04B: Facebook Q2 revenue; 37M: Americans who don't use the web;
  • Quotable Quotes:
    • @BoredElonMusk: We would have already discovered Earth 6.0 if NASA got the same budget as the DOD.
    • David Blight~ Something I've always believed as a historian and more and more it seems true to me is what really moves history, or brings about change in rather sudden and almost always unpredictable ways, is events. 
    • Quentyn Kennemer: Tom Brady replaces Android with iPhone, gets suspended 4 games
    • @BenedictEvans: Apple Maps has ~300m users to iOS GMaps 100m, of 4-500m iPhones. Spotify has 20m paying & 70m free users. And then there’s YouTube
    • Ben: Some scale problems should go unsolved. No. Most scale problems should go unsolved.
    • @mikedicarlo: 3.5 million Redis ops per/sec across our cluster. Wondering how that compares with other production deployments out there. 
    • @Carnage4Life: $1 billion valuation for a caller ID app with $800K in revenues? Unicorn valuations are officially meaningless 

  • Is shooting a trespasser filming a video of your potentially intimate moments considered a crime? Kentucky man shoots down drone hovering over his backyard

  • Death through premature scaling. Larry Berman determined this was the cause of death of RewardMe, his once scrappy startup. In the next turn of the wheel the dharma is: Be a 1-man growth team; Get customers online as opposed to through a long sales cycle; Don’t hold inventory; Focus on product and support. The new enlightenment: Don’t scale until you’re ready for it. Cash is king, and you need to extend your runway as long as possible until you’ve found product market fit.

  • What about scaling for the rest of us? That's the topic addressed in Scaling Ruby Apps to 1000 Requests per Minute - A Beginner's Guide. A very good resource. It goes into explaining the path of request through Heroku. Dispels some myths like scaling up makes a system faster. Explains queue time. And other good stuff. 

  • Not quite as sexy as Zero Point energy, but 3D XPoint memory sounds pretty cool: Intel and Micron have unveiled what appears to be the holy grail of memory. Called 3D XPoint (pronounced "cross point"), this is an entirely new type of non-volatile memory, with roughly 1,000 times the performance and 1,000 times the endurance of conventional NAND flash, while also being 10 times denser than conventional DRAM.

  • So what is 3D XPoint Memory really? Here's a great analysis at DailyTech by Jason Mick. More than analysis, it's a detective story. Jason puts together clues from history and recently filed patents to deduce that this new wonder RAM is most likely to be PRAM or Phase-change Memory, that stores data "in the form of a phase change to a tiny atomic-level structure." Jason thinks "any usage scenarios, it may be possible to run exclusively off PRAM." Forgetting just got even harder.

  • Damn. I may die after all. The Brain vs Deep Learning Part I: Computational Complexity — Or Why the Singularity Is Nowhere Near: My model shows that it can be estimated that the brain operates at least 10x^21 operations per second. With current rates of growth in computational power we could achieve supercomputers with brain-like capabilities by the year 2037, but estimates after the year 2080 seem more realistic.

  • It has always struck me that telcos who desperately want to get into the cloud business, where they are just an also-ran, control some of the most desired potential colo space in the world: cell towers. Turn those towers into location aware clouds and we can really get some revolutionary edge computing going on. Transiting traffic back to a centralized cloud is such a waste. Could 'Supercomputing at the Edge' provide a scalable platform for new mobile services?
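
Going back to the Scaling Ruby Apps item above: a quick way to get a feel for what 1000 requests per minute means is Little's law, which says the average number of requests in flight equals the arrival rate times the average time each request spends in the system. The Python sketch below is a back-of-the-envelope illustration and is not taken from the linked guide. The 0.25 second response time is an assumed figure, and the function name workers_needed is hypothetical.

```python
import math


def workers_needed(requests_per_minute, avg_response_seconds):
    """Little's law: requests in flight = arrival rate x time in system."""
    arrival_rate = requests_per_minute / 60.0        # requests per second
    in_flight = arrival_rate * avg_response_seconds  # average concurrency
    # Assumes each worker handles one request at a time, so this is the
    # bare minimum worker count with no headroom for bursts.
    return in_flight, math.ceil(in_flight)


# 0.25 s is an assumed average response time, not a measured one.
in_flight, workers = workers_needed(1000, 0.25)
print(f"average in flight: {in_flight:.2f}, minimum workers: {workers}")
```

Running workers at exactly this minimum leaves no slack, so requests would still wait in the queue during bursts. That is one reason queue time is worth measuring separately from response time.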

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...


How Debugging is Like Hunting Serial Killers

Warning: A quote I use in this article is quite graphic. That's the power of the writing, but if you are at all squirmy you may want to turn back now.

Debugging requires a particular sympathy for the machine. You must be able to run the machine and networks of machines in your mind while simulating what-ifs based on mere wisps of insight.

There's another process that is surprisingly similar to debugging: hunting down serial killers.

I ran across this parallel while reading Mindhunter: Inside the FBI's Elite Serial Crime Unit by John E. Douglas, an FBI profiler whose specialty is the dark debugging of twisted human minds.

Here's how John describes profiling:

You have to be able to re-create the crime scene in your head. You need to know as much as you can about the victim so that you can imagine how she might have reacted. You have to be able to put yourself in her place as the attacker threatens her with a gun or a knife, a rock, his fists, or whatever. You have to be able to feel her fear as he approaches her. You have to be able to feel her pain as he rapes her or beats her or cuts her. You have to try to imagine what she was going through when he tortured her for his sexual gratification. You have to understand what it’s like to scream in terror and agony, realizing that it won’t help, that it won’t get him to stop. You have to know what it was like. And that is a heavy burden to have to carry.

Serial killers are like bugs in the societal machine. They hide. They blend in. They can pass for "normal" which makes them tough to find. They attack weakness causing untold damage until caught. And they will keep causing damage until caught. They are always hunting for opportunity.

After reading the book I'm quite grateful that the only bugs I've had to deal with are of the computer variety. The human bugs are very very scary.

Here are some other quotes from the book you may also appreciate:

Click to read more ...


A Well Known But Forgotten Trick: Object Pooling

This is a guest repost by Alex Petrov. Find the original article here.

Most problems are quite straightforward to solve: when something is slow, you can either optimize it or parallelize it. When you hit a throughput barrier, you partition the workload across more workers. But when you face problems that involve garbage collection pauses, or simply hit the limits of the virtual machine you're working with, they get much harder to fix.

When you're working on top of a VM, you may face things that are simply out of your control, namely time drifts and latency. Fortunately, there are enough battle-tested solutions, though they require a bit of understanding of how the JVM works.

If you can serve 10K requests per second while conforming to certain performance parameters (memory and CPU), it doesn't automatically mean that you'll be able to scale linearly up to 20K. If you're allocating too many objects on the heap, or wasting CPU cycles on something that can be avoided, you'll eventually hit the wall.

The simplest (yet underrated) way of saving on memory allocations is object pooling. Even though the concept sounds similar to just pooling objects and socket descriptors, there's a slight difference.

When we're talking about socket descriptors, we have a limited, rather small number (tens, hundreds, or at most thousands) of descriptors to go through. These resources are pooled because of the high initialization cost (establishing a connection, performing a handshake over the network, memory-mapping the file, or whatever else). In this article we'll talk about pooling larger numbers of short-lived objects which are not so expensive to initialize, to save allocation and deallocation costs and avoid memory fragmentation.
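
As a rough illustration of the idea described above, here is a minimal object pool sketch. It is written in Python for brevity rather than for the JVM the article targets, and it is not the author's implementation. The Buffer and ObjectPool classes, the reset method, and the max_idle cap are all hypothetical names chosen for the example.

```python
class Buffer:
    """A small, cheap-to-create object that is allocated very often."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.length = 0  # number of valid bytes currently held

    def reset(self):
        # Clear per-use state so a recycled buffer looks freshly created.
        self.length = 0


class ObjectPool:
    """Hand out recycled objects instead of allocating new ones."""

    def __init__(self, factory, max_idle=1024):
        self._factory = factory    # creates a new object when the pool is empty
        self._max_idle = max_idle  # cap on retained objects to bound memory
        self._idle = []

    def acquire(self):
        # Reuse an idle object if one exists; otherwise allocate.
        return self._idle.pop() if self._idle else self._factory()

    def release(self, obj):
        obj.reset()
        if len(self._idle) < self._max_idle:
            self._idle.append(obj)
        # Beyond the cap, drop the object and let the GC reclaim it.
```

Typical use is pool = ObjectPool(lambda: Buffer(4096)), then buf = pool.acquire(), followed by pool.release(buf) once the caller is done. This sketch is not thread-safe, and the caller must not touch an object after releasing it. Both concerns need care in a real pool.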

Object Pooling

Click to read more ...


Algolia's Fury Road to a Worldwide API Part 3

The most frequent questions we answer for developers and devops are about our architecture and how we achieve such high availability. Some of them are very skeptical about high availability with bare metal servers, while others are skeptical about how we distribute data worldwide. However, the question I prefer is “How is it possible for a startup to build an infrastructure like this”. It is true that our current architecture is impressive for a young company:

  • Our high-end dedicated machines are hosted in 13 worldwide regions with 25 data-centers

  • our master-master setup replicates our search engine on at least 3 different machines

  • we process over 6 billion queries per month

  • we receive and handle over 20 billion write operations per month

Just like Rome wasn't built in a day, our infrastructure wasn't either. This series of posts will explore the 15 instrumental steps we took when building our infrastructure. I will even discuss our outages and bugs in order for you to understand how we used them to improve our architecture.

The first blog post of this series focused on our early days in beta and the second post on the first 18 months of the service, including our first outages. In this last post, I will describe how we transformed our "startup" architecture into something new that was able to meet the expectations of big public companies.

Step 11: February 2015

Launch of our Synchronized Worldwide infrastructure

Click to read more ...


Stuff The Internet Says On Scalability For July 24th, 2015

Hey, it's HighScalability time:

Walt Disney doesn't mouse around. Here's how he makes a goofy business plan.


  • 81%: AWS YOY growth; 400: hours of video uploaded to YouTube EVERY MINUTE; 9,000: # of mineable asteroids near earth; 1,400: light years to Earth's high latency backup node; 10K: in the future hard disks will be this many times faster 
  • Quotable Quotes:
    • @BenedictEvans: Chinese govt: At the end of 2014 China had 112.7 billion static webpages and 77.2 billion dynamic webpages. They used 9,310,312,446,467 KB
    • Michael Franklin (AMPLab): This is always a pendulum where you swing from highly distributed to more centralized and back in. My guess is there’s going to be another swing of the pendulum, where we really need to start thinking about how do you distribute processing throughout a wide area network.
    • Sherlock Holmes: Singularity is almost invariably a clue. 
    • @jpetazzo: OH: "In any team you need a tank, a healer, a damage dealer, someone with crowd control abilities, and another who knows iptables"
    • Jeff Sussna: Ultimately, the impact of containers will reach even beyond IT, and play a part in transforming the entire nature of the enterprise. 
    • @CarlosAlimurung: Impressive.  The number of #youtube channels making six figures grew by 50%. 
    • harlowja: Overall, no the community isn't perfect, yes there are issues, yes it burns some people out, but software isn't rainbows and butterflies after all.
    • werner: BTW nobody wants eventual consistency, it is a fact of life among many trade-offs. I would rather not expose it but it comes with other advantages ...
    • @VideoInkNews: We’re focused on our top three priorities – mobile, mobile and mobile, said @YouTube CEO @SusanWojcicki #VidCon2015 #keynote
    • Ivan Pepelnjak: Use a combination of MPLS/VPN and Internet VPN, or Internet VPN with 3G backup. Use multiple access methods, so the cable-seeking backhoe doesn’t bring down all uplinks.
    • @randybias: Repeat after me: containers do little to enable application portability.  If you want portability use a PaaS.  PaaS != Containers.
    • To see even more quotes please click through to see the rest of the post.

  • Can't we all just get along? And by "we" I mean humans and robots. Maybe. Inside Amazon shows by example how one new utopian community is bridging the categorical divide. Forget all your skepticism and technopanic, humans and robots can really work together in a highly efficient system.

  • A Brief History of Scaling LinkedIn. Not so brief actually. Lots of really good details. They of course started off with a monolith and ended up with a service oriented architecture. One of the most interesting ideas is the super block: groupings of backend services with a single access API. This allows us to have a specific team optimize the block, while keeping our call graph in check for each client.

  • If you want to move at the speed of software doesn't your datacenter infrastructure have to move at the same speed? Network Break 45 from Packet Pushers talks about an open source virtual software router, CloudRouter, running the latest release of OpenDaylight's SDN controller and ONOS. The idea is to make a dead simple router you can just instantiate as needed. Greg Ferro makes the point that if you don't have to care if you are starting 100 or 1000 virtual routers it changes how you go about building infrastructure. Running a Cisco router, an F5 load balancer, and a virtual firewall, how much will it cost to spin up virtual datacenters for 100s of developers? How long will it take? How does it even work?

Click to read more ...


Architecting Backend for a Social Product

This article aims to take you through the key architectural decisions that will make a social application a true next generation social product. The proposed changes address the following attributes: a) availability, b) reliability, c) scalability, and d) performance and flexibility towards extensions (not modifications).


a) Ensuring that user’s content is easily discoverable and is available always.

b) Ensuring that the content pushed is relevant not only semantically but also from user’s device perspective.

c) Ensuring that the real time updates are generated, pushed and analyzed.

d) Eye towards saving user’s resources as much as possible.

e) Irrespective of server load, user’s experience should remain intact.

f) Ensuring overall application security

In summary, we face an amazing challenge: we must deal with a mega sea of ever expanding user generated content, an increasing number of users, and a constant stream of new items, all while ensuring excellent performance. Given this challenge, it is imperative that we study certain key architectural elements which will influence the overall system design. Here are a few key decisions and their analysis.

Data Storage

Click to read more ...


Sponsored Post: Redis Labs,, VoltDB, Datadog, Tumblr, Power Admin, MongoDB, SignalFx, InMemory.Net, Couchbase, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Make Tumblr fast, reliable and available for hundreds of millions of visitors and tens of millions of users.  As a Site Reliability Engineer you are a software developer with a love of highly performant, fault-tolerant, massively distributed systems. Apply here now! 

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI Engineer. AppDynamics, founded in 2008 and led by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big Data. AppDynamics, a leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for Software Engineers (all levels) to design and develop scalable software written in Java and MySQL for the backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • Surge 2015. Want to mingle with some of the leading practitioners in the scalability, performance, and web operations space? Looking for a conference that isn't just about pitching you highly polished success stories, but that actually puts an emphasis on learning from real world experiences, including failures? Surge is the conference for you.

  • Your event could be here. How cool is that?

Cool Products and Services

  • MongoDB Management Made Easy. Gain confidence in your backup strategy. MongoDB Cloud Manager makes protecting your mission critical data easy, without the need for custom backup scripts and storage. Start your 30 day free trial today.

  • In a recent benchmark for NoSQL databases on the AWS cloud, Redis Labs Enterprise Cluster's performance obliterated Couchbase, Cassandra and Aerospike in this real life, write-intensive use case. Full backstage pass and all the juicy details are available in this downloadable report.

  • Real-time correlation across your logs, metrics and events. just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • VoltDB is a full-featured fast data platform that has all of the data processing capabilities of Apache Storm and Spark Streaming, but adds a tightly coupled, blazing fast ACID-relational database, scalable ingestion with backpressure; all with the flexibility and interactivity of SQL queries. Learn more.

  • In a recent benchmark conducted on Google Compute Engine, Couchbase Server 3.0 outperformed Cassandra by 6x in resource efficiency and price/performance. The benchmark sustained over 1 million writes per second using only one-sixth as many nodes and one-third as many cores as Cassandra, resulting in 83% lower cost than Cassandra. Download Now.

  • Datadog is a monitoring service for scaling cloud infrastructures that bridges together data from servers, databases, apps and other tools. Datadog provides Dev and Ops teams with insights from their cloud environments that keep applications running smoothly. Datadog is available for a 14 day free trial at

  • Here's a little quiz for you: What do these companies all have in common? Symantec, RiteAid, CarMax, NASA, Comcast, Chevron, HSBC, Sauder Woodworking, Syracuse University, USDA, and many, many more? Maybe you guessed it? Yep! They are all customers who use and trust our software, PA Server Monitor, as their monitoring solution. Try it out for yourself and see why we’re trusted by so many. Click here for your free, 30-Day instant trial download!

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Loggly alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your MySQL and PostgreSQL servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here:

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required.

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...


Algolia's Fury Road to a Worldwide API Steps Part 2

The most frequent questions we answer for developers and devops are about our architecture and how we achieve such high availability. Some of them are very skeptical about high availability with bare metal servers, while others are skeptical about how we distribute data worldwide. However, the question I prefer is “How is it possible for a startup to build an infrastructure like this”. It is true that our current architecture is impressive for a young company:

  • Our high-end dedicated machines are hosted in 13 worldwide regions with 25 data-centers

  • our master-master setup replicates our search engine on at least 3 different machines

  • we process over 6 billion queries per month

  • we receive and handle over 20 billion write operations per month

Just like Rome wasn't built in a day, our infrastructure wasn't either. This series of posts will explore the 15 instrumental steps we took when building our infrastructure. I will even discuss our outages and bugs in order for you to understand how we used them to improve our architecture.

The first blog post of the series focused on the early days of our beta. This blog post will focus on the first 18 months of our service from September 2013 to December 2014 and even include our first outages!

Step 4: January 2014

Click to read more ...


Stuff The Internet Says On Scalability For July 17th, 2015

Hey, it's HighScalability time:

In case you were wondering, the world is weird. Large Hadron Collider discovers new pentaquark particle.


  • 3x: Uber bigger than taxi market; 250x: traffic in HotSchedules' DDoS attack; 92%: Apple’s share of the smartphone profit pie; 7: Airbnb rejections
  • Quotable Quotes:
    • Netflix: A slow or unhealthy server is worse than a down server 
    • @inconshreveable: ngrok production servers, max GC pause: Go1.4 (top) vs Go1.5. Holy 85% reduction! /cc Go team
    • Nic Fleming: The fungal internet exemplifies one of the great lessons of ecology: seemingly separate organisms are often connected, and may depend on each other.
    • @IBMResearch: With 20+ billion transistors on new chip, that's a 50% scaling improvement over today’s tech #ibmresearch #7nm 

  • Apple and Google Race to See Who Can Kill the App First. Honest question, how are people supposed to make money in this new world? Apps are quickly becoming just an identity that ties together 10 or so components that appear integrated as part of the OS, but don't look like your app at all. Reminds me of laminar flow. We are seeing a rebirth of CORBA, COM and OLE 2, this time the container is an app bound by deep linking and some ad hoc ways to push messages around. Show developers the money.

  • The dark side of Google 10x: One former exec told Business Insider that the gospel of 10x, which is promoted by top execs including CEO Larry Page, has two sides. “It’s enormously energizing on one side, but on the other it can be totally paralyzing,”

  • Wait, are we going all RAM or all flash? So confusing. MIT Develops Cheaper Supercomputer Clusters By Nixing Costly RAM In Favor Of Flash: researchers presented evidence at the International Symposium on Computer Architecture that if servers executing a distributed computation go to disk for data even just 5 percent of the time, performance takes a hit to where it's comparable with flash memory anyway. 40 servers with 10 terabytes of RAM wouldn't chew through a 10.5TB computation any better than 20 servers with 20TB of flash memory. What's involved here is moving a little computational power off of the servers and onto the chips that control the flash drives.

  • Is disruption merely a Silicon Valley fantasy? Corporate America Hasn’t Been Disrupted: the advantage enjoyed by incumbents, always substantial, has been growing in recent years...more Americans worked for big companies...Large companies are becoming more dominant in part by buying up their rivals...Consolidation could explain at least part of the rising failure rate among startups...The startup rate has declined in every major industry, every state and nearly every city, and the failure rate’s rise has been nearly as universal. 

  • What's a unikernel and why should you care? Amir Chaudhry reveals all in his Unikernels talk given at PolyConf 15. And here's the supporting blog post. Why are we still running applications on top of operating systems? Most applications are single purpose so why all the complexity? Why are we building software for the cloud the same way we build it for desktops? We can do better with Unikernels where every application is a single purpose VM with a single address space.
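
On the RAM versus flash item above: the claim that going to disk even 5 percent of the time erases RAM's advantage can be sanity-checked with a weighted average of access times. The Python sketch below is a rough illustration only. The latency constants are order-of-magnitude assumptions chosen for the example, not figures from the MIT work, and the function name is hypothetical.

```python
def effective_access_time(miss_rate, fast_ns, slow_ns):
    """Average access time when a fraction miss_rate of accesses hit the slow tier."""
    return (1.0 - miss_rate) * fast_ns + miss_rate * slow_ns


# Illustrative order-of-magnitude latencies (assumptions, not from the paper).
DRAM_NS = 100          # roughly 100 nanoseconds
FLASH_NS = 100_000     # roughly 100 microseconds
DISK_NS = 10_000_000   # roughly 10 milliseconds

ram_with_misses = effective_access_time(0.05, DRAM_NS, DISK_NS)
print(f"RAM with 5% disk accesses: {ram_with_misses / 1000:.0f} us per access")
print(f"All-flash:                 {FLASH_NS / 1000:.0f} us per access")
```

Under these assumed numbers the occasional disk access dominates the average, so the RAM-based cluster is no faster per access than one that always reads from flash. Real workloads depend on access patterns and parallelism, so this is only a plausibility check.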

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...