Microservices in Production - the Good, the Bad, the it Works

This is a guest repost written by Andrew Harmel-Law on his real-world experiences with Microservices. The original article can be found here.

It’s reached the point where it’s even a cliche to state “there’s a lot written about Microservices these days.” But despite this, here’s another post on the topic. Why does the internet need another? Please bear with me…

We’re doing Microservices. We’re doing it based on a mash-up of some “Netflix Cloud” (as it seems to be becoming known - we just call it “Archaius / Hystrix”), a gloop of Codahale Metrics, a splash of Spring Boot, and a lot of Camel, gluing everything together. We’ve even found time to make a bit of Open Source ourselves - archaius-spring-adapter - and also contribute some stuff back.

Let’s be clear; when I say we’re “doing Microservices”, I mean we’ve got some running; today; under load; in our Production environment. And they’re running nicely. We’ve also got a lot more coming down the dev-pipe.

All the time we’ve been crafting these we’ve been doing our homework. We’ve followed the great debate, some contributions of which came from within Capgemini itself, and other less-high-profile contributions from our very own manager. It’s been clear for a while that, while there is a lot of heat and light generated in this debate, there are also a lot of valid inputs that we should be bearing in mind.

Despite this, the Microservices architectural style is still definitely in the honeymoon period, which translates personally into the following: whenever I see a new post on the topic from a Developer I respect, my heart sinks a little as I open it and read… Have they discovered the fatal flaw in all of this that everyone else has so far missed? Have they put their finger on the unique aspect that means 99% of us will never realise the benefits of this new approach and that we’re all off on a wild goose chase? Have they proven that Netflix really are unicorns and that the rest of us are just dreaming?

Despite all this we’re persisting. Despite always questioning every decision we make in this area far more than we normally would, Microservices still feel right to us for a whole host of reasons. In the rest of this post I hope I’ll be able to point out some of the subtleties which might have eluded you as you’ve researched and fiddled, and also, I’ve aimed to highlight some of the old “givens” which might not be “givens” any more.

The Good

They are the right size. And by “right” I mean “right” for a developer, for source-control, for CI, for documentation, for release/upgrade, for scaling, for resilience, for APIs/consumption/composition into things-larger, and finally for replacement. For all these things they’re all good.
They get things out of the way. With Microservices we’re coding in Java again, well, in Camel-Java-DSL, and this lets us think like software engineers, rather than JEE architects or Spring-Bean-experts. It means we can TDD, and TDD like we meant it, and that means we can refactor, and keep our code and designs looking like we care about them. (Better still, we haven’t had any weird classpath errors to debug from our JEE server, as we don’t have one. And when we’ve had problems on the wire, because we’re using HTTPClient directly, we can get into it and find out what’s going on far more quickly.)
They’re more predictable. Perhaps the biggest gain is that because they’re “micro” they’re easy to comprehend. Now I’m not forgetting the dangers of combinatorial complexity (we’ll get to that later) but because we’re working with small, well-tested cohesive components here, and stateless, idempotent, circuit-broken ones at that, things are a lot more likely to do what we think they will do. For someone ultimately responsible for all this, the spare cognitive load gained from that is immense.
They’ve made us question how we do things. Accepted wisdom isn’t accepted any more. The change in approach has made us question far more than we would on a “regular” project. Because this fundamental part of our job has changed, what else might have changed too? The resulting flowering of creativity in the team has been exceptional, and it’s been exciting to see it unfold. What’s more, I’ve seen properly reusable code coming out of the teams for the first time in my career. It’s almost as if all this component-thinking at the microservice level is infecting everything else. #winning
It’s fun, it’s exciting, and actively doing things that are new (and this does feel new) keeps you on your toes far more than a “standard” (read: JavaEE) approach would. That’s a good thing. A great thing.
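
To make “circuit-broken” a little more concrete, here is a minimal sketch of the idea in plain Java. It is not Hystrix's implementation, and every name in it (SimpleCircuitBreaker, failureThreshold, openMillis, the injected clock) is an assumption made up for illustration. After a set number of consecutive failures it stops calling the remote service for a while and goes straight to a fallback.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical, minimal circuit breaker. Illustrative only; not the Hystrix API. */
final class SimpleCircuitBreaker {
    private final int failureThreshold;   // consecutive failures before the circuit opens
    private final long openMillis;        // how long the circuit stays open
    private final LongSupplier clock;     // injected so the behaviour can be tested
    private int consecutiveFailures = 0;
    private long openedAt = -1L;          // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** Runs remoteCall unless the circuit is open; uses fallback when open or on failure. */
    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get();        // fail fast, do not touch the remote service
        }
        try {
            T result = remoteCall.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get();
        }
    }

    synchronized boolean isOpen() {
        // Once openMillis has elapsed, the next call is allowed through as a trial.
        return openedAt >= 0 && clock.getAsLong() - openedAt < openMillis;
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
        openedAt = -1L;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = clock.getAsLong(); // open (or re-open) the circuit
        }
    }
}
```

A caller would wrap each remote call, for example `breaker.call(() -> client.fetch(id), () -> cachedValue)`, where `client` and `cachedValue` are placeholders. A real library adds much more on top of this, such as timeouts, thread isolation and metrics.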

The Un-Good

And yet it’s not all #winning. Perhaps it’s the lack of balanced opinion in the general chatter that makes me feel the fear I mentioned earlier. Most posts I read are just so sycophantic on the topic, and the world really doesn’t need another one of them. So, deep breaths, let’s dig into that a bit more and present some of the reasons Microservices might not be for you.

First up, some honesty. We’re finding that despite all the noise in this space very few folks out there are actually doing this, and even fewer doing it in a public manner. We however are. Finding that very few others are with you can be a little scary at times. It means you actually have to do research and make your own decisions, which for most of us in the safe world of Java Enterprise Development is a new experience. Consequently, we’ve staked our professional reputations on this, and we’ve got a lot less to hide behind - we’re on the front line of all this challenging of accepted wisdom.
Things won’t “just work” when you glue them together; and things which do work might not have the greatest documentation in the world. Many, many decks on SlideShare refer to all the bits you’ll need to get up and running in seconds - but there is an impedance mismatch between many of them, and as an early adopter you need to fix that. Having said this, the other thing all these projects have in common is a vibrant community. In most things in this area the code is under active development, and so is the documentation. If you want to get involved and help, folks are very pleased to have you along for the ride. That’s great, but it does slow things down a bit. It also means you might be ahead of the curve on the major libraries you rely heavily on - for example we plugged Hystrix into Camel for our own Circuit-Breaking purposes before they added a CB of their own in the latest (2.14.0) version. We also chose CXF for our REST APIs (when the rest of the world was going Jersey) because it was a first-class citizen in Camel, only to find the exposure of Camel Routes as REST wasn’t at that point incredibly mature. (Note: It now is, it’s even got Swagger support.)
Related to the point above, you need to remember you might make wrong bets. I already mentioned that we went CXF when the rest of the world seemed to be going Jersey. To be honest that’s not hurt us too much - we have what we need. We’ve also been right in adopting Hystrix, Archaius, and looking at Eureka, Ribbon and Zuul as they have only just been announced as being supported by the new “Spring Cloud”. It might however have been a mistake to go for Spring Boot - Fabric8 is getting more and more mature by the day, and solves a lot of problems we might face at some point in the future, or have had to code around ourselves (i.e. by building our continuous deployment pipeline with Jenkins, Puppet and Capistrano).
You end up with a lot of moving parts. You’re confronting the Fallacies of Distributed Computing head on, and tackling them each in turn, in everything you build. You end up leaning more heavily on tools like Maven (in our case) or Gradle (we’re evaluating), and thinking about versioning, and about running concurrent versions, from the beginning is key. You also end up needing to be able to “boot-up” a component with a load of stubs for all its dependencies so that you can run individual bits on their own.
This is a new way of thinking. This is INTEGRATION at EVERY LEVEL. In the JavaEE world we never thought of threads because we weren’t allowed to. We’ve found that models like Scala’s Akka are a great mental tool for thinking about these problems, even if we’re not using the frameworks, but we had to get there the hard way.
Following on from the two points above you end up having to make some concessions in order to be able to cope with all this - the biggest one is that you need to embrace immutability (in code and deployables, and environments) and discard state. This makes a lot of things easier, and, if you invest the time in proper Continuous Deployment and elastic scalability enablement
You end up a lot closer to the metal, and the network. We’re working with HTTPClient directly, and are looking at Protobuf / Thrift / Avro for inter-machine comms (Camel lets us do inter-process, intra-machine comms quite nicely). Latency also hits you front and centre. Again, having to deal with this head on is no bad thing, but it’s not the usual state of being for a traditional Java developer.
Inter-team comms - because you easily gain the benefits of small teams (2-3 devs) working on individual “services” hidden behind a clean API, you end up hitting the many patterns finely articulated by Eric Evans in Chapter 14 and beyond of his Domain-Driven Design: Tackling Complexity in the Heart of Software (perhaps the best part of that fine volume). If you know to expect them then this isn’t too bad, but it will happen. This has meant that we’ve been looking at Swagger as a nice way of documenting all our APIs, internal and external, to reduce the interruptions when team A needs to consume a component from team B.
You end up with variation. Yes, we’ve componentized, have created some common components, and had various pull-request submissions back to the community accepted, but you still end up with various ways of doing the same thing. Now this isn’t always a bad thing, but it means that you need to rely much more on keeping a clean code(base). We review every pull request, with approval being required from at least two developers. Typically we aim for these approvers to be from outside the team. We’ve also instituted an internal RFC mechanism (stolen from Carl Quinn who I believe has used it at both Netflix and Riot Games to manage changes required by the dev teams he leads).
You end up flooded by data. Everyone says that you need a lot of monitoring. They’re right. It’s easy to add too. So easy in fact that we ended up submitting a pull request to Hystrix which allows you to filter what is produced because we were flooding the UDP port of our monitoring servers. We’re lucky in that we have a great DevOps team who put all the supporting infrastructure in place for us to take advantage of all this too. But we’ve needed to become expert in setting up just-the-right-amount Grafana dashboards. It’s another skill to learn.
Dev becomes support and Support becomes dev. This one relates back to the fifth point above (“This is a new way of thinking”). Folks who are in Support because they want to support things that look like other things will get a shock. We’ve been lucky - our support guys have been very keen to learn something new. We’re getting them to work on new features with us as a means of teaching them how things work. Additionally, because things are so new, we have to get a lot more involved in support. So much so that we’re trying to get a big TV for our dev area to put our Hystrix / Grafana Dashboards on permanently so we know how things are looking.
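
The point above about booting a component with stubs for its dependencies can be sketched roughly as follows. This is only an illustration under assumed names: PricingClient, StubPricingClient and BasketService are hypothetical and do not come from the project described here. The idea is that the component only knows its dependency through an interface, so a canned in-memory stub can stand in for the real remote service when you want to run the component on its own.

```java
import java.util.Map;

/** Hypothetical dependency, normally backed by a remote HTTP service. */
interface PricingClient {
    long priceInPence(String sku);
}

/** Stub with canned answers, used to run the component with no network at all. */
final class StubPricingClient implements PricingClient {
    private final Map<String, Long> cannedPrices;

    StubPricingClient(Map<String, Long> cannedPrices) {
        this.cannedPrices = cannedPrices;
    }

    @Override
    public long priceInPence(String sku) {
        Long price = cannedPrices.get(sku);
        if (price == null) {
            throw new IllegalArgumentException("No canned price for " + sku);
        }
        return price;
    }
}

/** Hypothetical component under development; it never knows whether it talks to a stub. */
final class BasketService {
    private final PricingClient pricing;

    BasketService(PricingClient pricing) {
        this.pricing = pricing;
    }

    long totalInPence(Map<String, Integer> quantitiesBySku) {
        long total = 0L;
        for (Map.Entry<String, Integer> line : quantitiesBySku.entrySet()) {
            total += pricing.priceInPence(line.getKey()) * line.getValue();
        }
        return total;
    }
}
```

Wiring `new BasketService(new StubPricingClient(...))` at start-up gives a component that runs in isolation, while a production build would pass in an HTTP-backed implementation of the same interface.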

And that’s it. I’d like to point out again that so far, we’ve been very pleased with our decision to adopt this architectural approach. But we’re still keeping our eyes open. Remember, there’s No Silver Bullet.

A Reading List

Before we close, here’s a reading list of the things we’ve found most useful in our journey to here. Please add a comment if you have any other suggestions of items to add:
