Typesafe Interview: Scala + Akka is an IaaS for Your Process Architecture

This is an email interview with Viktor Klang, Director of Engineering at Typesafe, on the Scala Futures model & Akka, both topics about which he is immensely passionate and knowledgeable.

How do you structure your application? That’s the question I explored in the article Beyond Threads And Callbacks. An option I did not talk about, mostly because of my own ignorance, is a powerful stack you may not be all that familiar with: Scala and Akka.

To remedy my oversight is our acting tour guide, Typesafe’s Viktor Klang, long time Scala hacker and Java enterprise systems architect. Viktor was very patient in answering my questions and was enthusiastic about sharing his knowledge. He’s a guy who definitely knows what he is talking about.

I’ve implemented several Actor systems along with the messaging infrastructure, threading, async IO, service orchestration, failover, etc, so I’m innately skeptical about frameworks that remove control from the programmer at the cost of latency.

So at the end of the interview am I ready to drink the koolaid? Not quite, but I’ll have a cup of coffee with the idea.

I came to think of Scala + Akka as a kind of IaaS for your process architecture. Toss in Play for the web framework and you have a slick stack, with far more out-of-the-box power than Go, Node, or plaino jaino Java.

The build-or-buy decision is surprisingly similar to every other infrastructure decision you make. Should you use a cloud or build your own? It’s the same sort of calculation you need to go through when deciding on your process architecture. At the extremes you lose some functionality and flexibility, but since they’ve already thought of most everything you would need to think about, with examples and support, you gain a tremendous amount too. Traditionally, however, process architecture has been entirely ad hoc. That may be changing.

Now, let’s start the interview with Viktor...

HS:  What is an Actor?

So let’s start from the very beginning! An Actor in the Actor Model is composed of 3 distinct pieces:

  • A behavior
  • An address
  • A mailbox

The Address is the thing you send messages to, they are then put into the Mailbox and the Behavior is applied to the messages in the mailbox—one at a time. Since only one message is processed at a time, you can view an Actor as an island of consistency, connected to other actors via their Addresses and by sending and receiving messages from them.

There are 3 core operations that an Actor needs to support in order for it to qualify as an Actor.

  1. CREATE—an Actor has to be able to create new Actors
  2. SEND—an Actor needs to be able to send messages to Actors
  3. BECOME—an Actor needs to be able to change its behavior for the next message

Since what you send messages to is an Address, there is an indirection which allows the Mailbox and Behavior to live essentially anywhere, as long as the message can get routed there. This is also referred to as Location Transparency.
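For concreteness, here is a minimal sketch of those three operations using Akka's classic Scala API (the framework discussed throughout the rest of this interview); the Counter actor and its string messages are invented purely for illustration:

import akka.actor.{ Actor, ActorSystem, Props }

// A hypothetical actor that illustrates CREATE, SEND and BECOME
class Counter extends Actor {
  def receive = counting(0)

  def counting(total: Int): Receive = {
    case "inc" =>
      context.become(counting(total + 1))         // BECOME: new behavior for the next message
    case "delegate" =>
      val child = context.actorOf(Props[Counter]) // CREATE: spawn a new actor
      child ! "inc"                               // SEND: message an Address (ActorRef)
  }
}

val system = ActorSystem("example")
val counter = system.actorOf(Props[Counter], "counter")
counter ! "inc" // fire-and-forget send to the actor's address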


HS: How does Akka implement the Actor model?

Akka follows the Actor Model, but messages are processed by a designated thread pool configured on a per-actor basis. This allows for fine-grained control over execution provisioning and a means of bulkheading parts of your application from other parts of the application. Akka also lets you configure the mailbox implementation on a per-actor basis: some actors might need a bounded mailbox, some might want a priority-based one, some might want a deduplicating one, and you can fine-tune things like overflow protection with head-dropping vs. tail-dropping, etc.
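As a rough sketch of what per-actor mailbox selection can look like (the mailbox name, capacity, and Worker actor below are made up; the bounded mailbox type is Akka's classic one):

import akka.actor.{ Actor, ActorSystem, Props }
import com.typesafe.config.ConfigFactory

// Hypothetical mailbox definition, normally placed in application.conf
val config = ConfigFactory.parseString("""
  bounded-mailbox {
    mailbox-type = "akka.dispatch.BoundedMailbox"
    mailbox-capacity = 1000
    mailbox-push-timeout-time = 100ms
  }
""")

class Worker extends Actor {
  def receive = { case job => /* process the job */ }
}

val system = ActorSystem("example", config)
// Only this actor gets the bounded mailbox; its siblings keep the default unbounded one
val worker = system.actorOf(Props[Worker].withMailbox("bounded-mailbox"), "worker")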

Compared with Threads, Akka Actors are extremely lightweight, clocking in at around 500 bytes per instance, which allows for running many millions of actors on a commodity machine. Like Erlang Processes, Akka Actors are location transparent, which means that it is possible to scale out to multiple machines without changing the way the code is written.

Akka Actors do not block a thread when they have nothing to process, which allows for high throughput at low latency since the wake-up lag for threads can be avoided. It is also possible to configure the number of messages an actor processes before handing the thread back to the pool, as well as to specify a time slice that lets the actor keep processing new messages until the slice runs out, at which point the thread is handed back to the pool.

This lets you tune for fairness or for throughput. Akka Actors will not be preempted when a higher-priority message arrives, but it is possible to have multiple actors share the same mailbox, which can mitigate this if required.
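For reference, a sketch of the dispatcher settings being described (the dispatcher name, numbers and Cruncher actor are arbitrary; throughput is the message batch size per scheduling, throughput-deadline-time the time slice):

import akka.actor.{ Actor, ActorSystem, Props }
import com.typesafe.config.ConfigFactory

// Hypothetical dispatcher tuned towards throughput rather than fairness
val config = ConfigFactory.parseString("""
  batching-dispatcher {
    type = Dispatcher
    executor = "fork-join-executor"
    throughput = 100               # messages processed before the thread is handed back
    throughput-deadline-time = 2ms # time slice; 0ms would mean no limit
  }
""")

class Cruncher extends Actor {
  def receive = { case work => /* heavy computation */ }
}

val system = ActorSystem("tuned", config)
val cruncher = system.actorOf(Props[Cruncher].withDispatcher("batching-dispatcher"), "cruncher")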

Inspired by Erlang's Process Linking, Akka Actors form a strict hierarchy: actors created by an actor form a parent-child relationship, where the parent is responsible for handling the failure of its children, either by issuing directives on how to deal with the different types of failure that can occur or by escalating the problem to its own parent. This has the benefit of creating the same kind of self-healing capabilities exhibited by Erlang. It is also possible for an Akka Actor to observe when another Actor is no longer available, and handle that accordingly.

HS:  Can you give an example of how Process Linking works in practice?

Actor A receives message B, which entails a potentially risky operation C (it could mean contacting an external server or doing a computation that might blow up). Instead of doing that work itself, A may spawn a new actor and let that actor do the risky operation. If the operation fails, the exception is propagated to A (being the "parent"), who can decide to restart the failed actor to retry, or perhaps just log that it failed. Whether it fails or not, A has never been at risk, as the dangerous operation was delegated and managed. In the case of a more serious error that A cannot manage, A would escalate that error to its own parent, who might then act upon it instead.
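A sketch of what that might look like with Akka's supervision API (the message type, worker and exception choices here are hypothetical):

import akka.actor.{ Actor, OneForOneStrategy, Props }
import akka.actor.SupervisorStrategy.{ Escalate, Restart }
import scala.concurrent.duration._

case class RiskyJob(payload: String) // hypothetical message B

class RiskyWorker extends Actor {
  def receive = {
    case RiskyJob(payload) => // contact the external server / do the risky computation C
  }
}

class ActorA extends Actor {
  // The parent decides how to deal with each kind of child failure
  override val supervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: java.io.IOException => Restart  // retry by restarting the failed child
      case _: Exception           => Escalate // anything worse goes up to A's own parent
    }

  def receive = {
    case job: RiskyJob =>
      context.actorOf(Props[RiskyWorker]) ! job // delegate the dangerous work to a child
  }
}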

HS: Can you go into some more detail about bulkheading? Why is it important and how is it accomplished in Akka?

Bulkheading is a stability pattern from Release It! by Michael Nygard. It's about gaining stability through compartmentalization, just like the bulkheads of a boat.

Bulkheading of Threads in Akka is accomplished by assigning different thread pools to different segments of your actor hierarchy, which means that if one thread pool is overloaded, whether by high load, a DoS attempt, or a logic error creating an infinite loop, other parts of the application can proceed, since their Threads cannot be "infected" by the failing thread pool.
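A sketch of that compartmentalization (dispatcher and actor names are invented): two parts of the hierarchy get their own thread pools, so exhausting one pool leaves the other untouched.

import akka.actor.{ Actor, ActorSystem, Props }
import com.typesafe.config.ConfigFactory

// Two hypothetical pools: one for untrusted, potentially flooding traffic, one for internal work
val config = ConfigFactory.parseString("""
  edge-dispatcher     { type = Dispatcher, executor = "fork-join-executor" }
  internal-dispatcher { type = Dispatcher, executor = "fork-join-executor" }
""")

class EdgeHandler extends Actor { def receive = { case request => /* ... */ } }
class InternalService extends Actor { def receive = { case command => /* ... */ } }

val system = ActorSystem("bulkheaded", config)
val edge = system.actorOf(Props[EdgeHandler].withDispatcher("edge-dispatcher"), "edge")
val internal = system.actorOf(Props[InternalService].withDispatcher("internal-dispatcher"), "internal")
// If "edge" is overloaded or spinning, "internal" keeps running on its own threads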

HS: Tail-dropping?

When it comes to dealing with asynchronous message-passing systems, one needs to decide which contention management policies to use. Back-pressure is one policy, dropping messages is another; and if you decide to drop messages, you have to choose which ones to drop. Usually this is something that needs to be decided on a "per service" basis: either you drop the oldest message (the one at the front of the queue, i.e. front- or head-dropping) or the newest (tail-dropping). Sometimes one wants a priority queue so that the important messages end up at the front of the queue.

HS: What about these abilities helps programmers develop better/faster/robuster systems?

In any system, when load grows to surpass the processing capability, one must decide how to deal with the situation. With configurable mailbox implementations you as the developer can decide how to deal with this problem on a case-by-case basis, exploiting business knowledge and constraints to make sure that performance and scalability are not compromised in order to get the robustness (which is more than likely the case for a one-size-fits-all solution like backpressure).

HS: How does the location transparency work?

Each Akka Actor is identified by an ActorRef, which is similar to an Erlang PID: a level of indirection between the Actor instance and its senders. Senders only ever interact with ActorRefs, which allows the underlying Actor instance to live anywhere (potentially anywhere in the world).

HS: Is there latency involved in scheduling an Akka actor to execute?

When an Actor doesn't have any messages it is not scheduled for execution, and when it gets a message it will attempt to schedule itself with the thread pool if it hasn't already done so. The latency is completely up to the implementation of the Thread Pool used, and this is also configurable and extensible/user replaceable. By default Akka uses a state-of-the-art implementation of a thread pool without any single point of contention.

HS: Given you can configure the number of messages to process before handing back the thread to the pool, that makes it a sort of run-to-completion model where the CPU time isn't bounded?

Exactly.

HS:  Can it be interrupted?

No, but as soon as one message is done, it will check if it still has time left, and if so it will pick the next message.

HS: Can you ensure some sort of fair scheduling so some work items can make some progress?

That is up to the ThreadPool implementation and the OS Scheduler, fortunately the user can affect both.

HS: When multiple Actors share the same mailbox, if some actor has the CPU, it won't give up the CPU for the higher priority message to be executed? How does this work on multiple CPUs?

If you have 10 Actors sharing a single priority mailbox and a thread pool of 10 Threads, there is more opportunity for an actor that has finished its current message to pick up the high-priority work than if a single actor is currently processing a slow, low-priority message. So it's not a watertight solution, but it improves the processing of high-priority messages under those circumstances.

Placing priority requirements on messages increases lock contention and sacrifices throughput for latency.

HS: How do Actors know where to start in a distributed fabric?

That is done by configuration so that one can change the production infrastructure without having to rebuild the application, or run the same application on multiple, different infrastructures without building customized distributions.

HS: How do Actors know how to replicate and handle failover?

Also in configuration.

HS: How do you name Actors?

When you create an Akka Actor you specify its name, and the address of the actor is a URI of its place in the hierarchy.

Example: "akka.tcp://applicationName@host:port/user/yourActorsParentsName/yourActorsName"

HS: How do you find Actors?

There are a couple of different ways depending on the use case/situation. Either you get the ActorRef (every Akka Actor is referred to by its ActorRef; this is the equivalent of the Address in the Actor Model) injected via the constructor of the Actor, or you get it in a message or as the sender of a message. If you need to look up Actors, there are 2 different ways: one is to create an ActorSelection, which can be described as a query over the hierarchy, to which you can send messages so that all actors matching the query will get them. Or you can use "actorFor", which lets you look up a specific actor using its full URI.
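As a sketch (the system name, paths and message are invented; actorFor was the lookup API at the time of this interview and was later deprecated in favor of actorSelection):

import akka.actor.{ ActorRef, ActorSelection, ActorSystem }

val system = ActorSystem("applicationName")

// A query over the hierarchy: every actor whose path matches the pattern gets the message
val selection: ActorSelection = system.actorSelection("/user/workers/*")
selection ! "ping"

// A lookup of one specific actor by its full URI
val single: ActorRef = system.actorFor("akka.tcp://applicationName@host:2552/user/workers/worker1")
single ! "ping"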

HS: How do you know what an Actor can do?

You don't. Well, unless you define such a protocol, which is trivial.

HS: Why is indirection an important capability?

The indirection is important because it clearly separates the location of the behavior from the location of the sender. It is an indirection that can even be rebound at runtime, migrating actors from one physical node to another without impacting the Address itself.

HS: How do you avoid contention on your thread pools?

Every Thread in that pool has its own task queue, and there is no shared queue. Tasks are randomly distributed over the work queues, and when a Thread doesn't have any tasks it will randomly work-steal from other Threads. Having no single point of contention allows for much greater scalability.

HS: Could you please give a brief intro into Scala and why it's so wonderful?

Sure!

I come from a C, then C++, then Java background and discovered Scala back in 2007.

For me Scala is about focusing on the business end of programming and removing repetition & "ritual" code.

Scala is a unifier of object orientation and functional programming, and it tries to minimize specialized constructs in the language, instead giving library authors powerful & flexible constructs to add functionality with.

I personally enjoy that Scala is expression-oriented rather than statement-oriented, which simplifies code by avoiding a lot of mutable state that tends to easily turn into an Italian pasta dish.

A statement doesn't "return"/"produce" a result (you could say that it returns void), but instead it "side-effects" by writing to memory locations that it knows about, whereas an expression is a piece of code that "returns"/"produces" a value.

So all in all Scala lets me write less code, with less moving parts making it cheaper to maintain and a joy to write. A great combination in my book!

And let's not forget that it allows me to use all the good Java libraries out there, and even be consumed from Java (Akka can be used from both Scala and Java, as an example).

HS: How do Scala futures fit into the scheme of things?

Alright. So I was a co-author of the SIP-14 proposal that was included in Scala 2.10. So the following explanations and discussions will center around that.

A Future is a read-handle for a single value that may be available at some point in time. Once the value is available it cannot and will not be changed.

A Promise is a write-handle for a single value that should be set at some point in time. Once the value is available it cannot and will not be changed.

The value of a Future/Promise may either be a result or an exception.

(You can get the corresponding Future from a Promise (by calling the future method on the Promise), but not vice versa.)
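In code, the relationship looks roughly like this:

import scala.concurrent.{ Future, Promise }

val promise = Promise[Int]()                   // the write-handle
val future: Future[Int] = promise.future       // the corresponding read-handle

promise.success(42)                            // complete it exactly once...
// promise.failure(new RuntimeException("no")) // ...with either a result or an exception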

The strength of this model is that it allows you to program as if you already have the result, and the logic is applied when the result is available, effectively creating a data-flow style of programming, a model which easily can take advantage of concurrent evaluation.

When you program with Futures you need to have an ExecutionContext, which will be responsible for executing the logic asynchronously; for all intents and purposes this is equivalent to a thread pool.

As an example in Scala:

import scala.concurrent.{ Future, ExecutionContext }
import ExecutionContext.Implicits.global // brings the global default execution context into scope

// Let's first define a method that adds two Future[Int]s.
// This method uses a Scala for-expression, but that is only sugar for:
//   f1.flatMap(left => f2.map(right => left + right))
// It asynchronously and non-blockingly adds the result of f1 to the result of f2.
def add(f1: Future[Int], f2: Future[Int]): Future[Int] =
  for (result1 <- f1; result2 <- f2) yield result1 + result2

// Then let's define a method that produces random integers
def randomInteger() = 4 // Determined by fair dice roll

// Internally creates a Promise[Int], returns its Future[Int] immediately, calls
// "randomInteger()" asynchronously and completes the promise with the result,
// which is then accessible from its Future.
val future1 = Future(randomInteger())
val future2 = Future(randomInteger()) // same as above
val future3 = add(future1, future2)

None of the code above is blocking any thread, and the code is declarative and doesn't prescribe _how_ the code will be executed. The ExecutionContext can be switched without changing any of the logic.
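For example, assuming the randomInteger() method from above, the same Future-creating code can be pointed at a dedicated pool simply by bringing a different ExecutionContext into implicit scope (the pool size here is arbitrary):

import java.util.concurrent.Executors
import scala.concurrent.{ ExecutionContext, Future }

// A hypothetical dedicated pool instead of the global default
implicit val dedicatedContext: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(8))

def randomInteger() = 4 // as before

val future1 = Future(randomInteger()) // now scheduled on the dedicated pool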

So what happens if the value is exceptional?

val future3 = add(Future(throw new BadThingsHappenedException), Future(randomInteger()))

Then the exceptional completion of future1 will be propagated to future3.

So let's say we know a way to recover from BadThingsHappenedExceptions; let's use the recover method:

val future1a = Future(throw new BadThingsHappenedException)
val future1b = future1a recover { case e: BadThingsHappenedException => randomInteger() }
val future2 = Future(randomInteger())
val future3 = add(future1b, future2)

So here we first create future1a, which will be completed exceptionally with a BadThingsHappenedException. Then we call the "recover" method on future1a and provide a (partial) function literal that can convert BadThingsHappenedExceptions to an Int by calling our amazing randomInteger() method; the result of "recover" is a new future, which we call future1b.

So here we can observe that futures are only completed once, and the way to transform the results or exceptions of a future is to create a new Future which will hold the result of the transformation.

So, for a less contrived example, we can do things like:

val future1 = Future(callSomeWebService()) recover { case _: ConnectException => callSomeBackupWebService() }
val future2 = Future(callSomeOtherWebService()) recover { case _: ConnectException => callSomeOtherBackupWebService() }
val future3 = for (firstResult <- future1; secondResult <- future2) yield combineResults(firstResult, secondResult)

future3 map { result =>
  convertToHttpResponse(result)
} recover {
  case _ => HttpResponse(400) // underscore means "anything"
} foreach { response =>
  sendResponseToClient(response)
}

So what we do here is asynchronously call a couple of web services; if any of them fails with a ConnectException we try to call some backup web service instead. Then we combine those web-service responses into some intermediate result and convert that into an HttpResponse. If anything exceptional has happened along the way, we recover to an HttpResponse with a 400 status, and as the very last step we send our HttpResponse to the client that requested it.

So in our code we never wait for anything; what we do is declare what we want to happen when/if we have a result, and there is a clear flow of data.

HS: Is a Future a scalar or can it have structure (arrays, maps, structs, etc)?

It is a single memory slot that can only be written once. So what you write to it should be a value (i.e. immutable), but it can be a struct, a Map, or what have you.
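For instance (the User case class is made up), a Future can hold any immutable value, simple or structured:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

case class User(id: Long, name: String) // a hypothetical immutable "struct"

val futureUser: Future[User] = Future(User(1L, "Viktor"))
val futureScores: Future[Map[String, Int]] = Future(Map("scala" -> 10, "akka" -> 10))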

HS: How do you implement more interesting state machines where results from one state are used in another? I think that's what I have a problem with a lot of the time. I would prefer to go to a clear error state where errors are handled, for example. In the LinkedIn example they parallelize three separate calls and have a bit of error handling code somewhere that doesn't seem to know where the error came from or why, which makes crafting a specific error response difficult.

I understand what you mean, but I view it differently. With Futures you deal with the failure where you can, just as you deal with exceptions in Java where you can. This may or may not be in the method that produces the exception, or in the caller, or in the caller's caller, or elsewhere.

You could view Futures (with exceptional results) as an on-heap version of exception handling (in contrast to plain exception handling, which is on-stack), meaning that any thread can choose to deal with the exception, not only the thread that caused it.

HS: A lot of the never-wait-for-anything style seems normal to me in C++. Send a message. All IO is async. Replies come back and get dropped into the right actor queue.

I hear you! A lot of the good things we learned from C/C++ still apply, i.e. async IO is more resource-efficient than blocking IO, etc.

HS: The actor state machine makes sense of what to do. Thread contexts are correct. In your example there's no shared state, which is the simplest situation, but when shared state is involved it's not so clean, especially when many of these bits of code are executing simultaneously.

Of course, but it depends on what one means by shared state. Something that I find useful is "what would I do if the actors were people and they'd be in different locations?"

Sharing state (immutable values) via message passing is perfectly natural and in reality mimics how we as humans share knowledge (we don't flip each other's neurons directly :) )