Case Study on Scaling PaaS Infrastructure

In his blog post, Scaling WSO2 Stratos, Srinath Perera explains the scaling architecture of the WSO2 Stratos Platform as a Service (PaaS) infrastructure. He presents it as a series of solutions, where each solution adds a new concept to solve a specific problem found in the previous one.

Overall, WSO2 Stratos uses a combination of intelligent load balancing and lazy loading to scale the architecture. More details about Stratos can be found in the paper WSO2 Stratos: An Industrial Stack to Support Cloud Computing.

Problem

Stratos is multi-tenanted; in other words, it hosts many tenants. Each tenant generally represents an organization and is isolated from the other tenants: each tenant has its own users, resources, and permissions. Stratos supports multiple PaaS services, where each PaaS service is a WSO2 product (e.g. AS, BPS, ESB) offered as a service. Using those services, tenants may deploy their own web services, mediation logic, workflows, gadgets, and so on.

  1. The WSO2 Stratos runtime provides servers, each of which supports multiple tenants and provides a PaaS service. For example, there are multi-tenanted AS, BPS, and ESB servers.
  2. Stratos can provision (add/remove) resources (computing nodes) on demand.
  3. The problem is to build a system that scales up and down without end users realizing it.

Answers

The following describes a series of solutions, where each solution adds a new feature to solve a specific problem. Together they explain the rationale and thought process behind the final design.

Solution 1
WSO2 Stratos consists of one multi-tenanted server of each type (e.g. ESB, BPS). Users first talk to an Identity Server (IS), get an SSO (single sign-on) token, and can then log in to any server. All tenant data is stored in a registry. Each server loads all tenants at startup and can therefore support any tenant when it receives a request.
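
As a rough illustration of Solution 1's eager loading, here is a minimal Java sketch. Registry and TenantContext are hypothetical stand-ins for the central registry and per-tenant state, not actual Stratos APIs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the central registry and per-tenant state.
interface Registry {
    List<String> listTenants();
    TenantContext loadTenant(String tenantId);
}

class TenantContext { /* users, resources, permissions, artifacts */ }

class EagerTenantServer {
    private final Map<String, TenantContext> tenants = new HashMap<>();

    // Solution 1: load every tenant at startup, so this server can answer
    // a request for any tenant without any further loading.
    void start(Registry registry) {
        for (String tenantId : registry.listTenants()) {
            tenants.put(tenantId, registry.loadTenant(tenantId));
        }
    }

    TenantContext lookup(String tenantId) {
        return tenants.get(tenantId);
    }
}
```

The startup cost and memory footprint of this approach grow with the total number of tenants, which is exactly what the later solutions address.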

Solution 2
Solution 1 does not scale at all. So we run multiple instances of each server and put a load balancer (LB) in front, which distributes requests across the server instances. When the load on the server instances is high, the LB starts new instances, and when the load is low, it shuts some instances down. We call this auto-scaling.
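
A threshold-based check is one simple way such auto-scaling could work. The sketch below is illustrative: the thresholds and the Cluster interface are assumptions, not Stratos's actual policy.

```java
// Hypothetical threshold-based auto-scaler; not Stratos's actual policy.
class AutoScaler {
    private static final double SCALE_UP_LOAD = 0.8;   // avg load above which we add a node
    private static final double SCALE_DOWN_LOAD = 0.3; // avg load below which we remove one
    private static final int MIN_INSTANCES = 1;

    // Called periodically by the LB with current cluster metrics.
    void check(Cluster cluster) {
        double avgLoad = cluster.averageLoad(); // e.g. in-flight requests / capacity
        if (avgLoad > SCALE_UP_LOAD) {
            cluster.startInstance();
        } else if (avgLoad < SCALE_DOWN_LOAD && cluster.size() > MIN_INSTANCES) {
            cluster.stopInstance();
        }
    }
}

// Abstraction over the pool of server instances behind the LB.
interface Cluster {
    double averageLoad();
    int size();
    void startInstance();
    void stopInstance();
}
```

A production auto-scaler would also smooth the load signal and give new instances time to warm up before re-evaluating, to avoid oscillation.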

Solution 3
Once Stratos had several hundred tenants, many of them with tens of services, loading all tenants at startup took a long time: 15-30 minutes. Furthermore, most tenants stay inactive most of the time, yet since each node has to hold all tenants, Stratos spends resources on inactive tenants as well.

To avoid these problems, Solution 3 added lazy loading. All information about tenants is stored in a central registry, and a tenant is loaded into memory only when it is needed. A tenant is unloaded again once it has been idle for longer than a given timeout. You can find more information about lazy loading in Azeez's blog entry "Lazy Loading Deployment Artifacts in a PaaS Deployment".
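
The lazy-loading pattern might look roughly like the sketch below, reusing the hypothetical Registry and TenantContext stand-ins from the earlier sketch: a tenant is loaded on first access, and a periodic sweep evicts tenants that have been idle past the timeout.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative lazy-loading cache; not the actual Stratos implementation.
class LazyTenantCache {
    private final Registry registry;
    private final long idleTimeoutMs;
    private final Map<String, Entry> loaded = new ConcurrentHashMap<>();

    LazyTenantCache(Registry registry, long idleTimeoutMs) {
        this.registry = registry;
        this.idleTimeoutMs = idleTimeoutMs;
    }

    // Load the tenant only on first access, then refresh its last-used time.
    TenantContext get(String tenantId) {
        Entry e = loaded.computeIfAbsent(tenantId,
                id -> new Entry(registry.loadTenant(id)));
        e.lastUsed = System.currentTimeMillis();
        return e.context;
    }

    // Called periodically: unload tenants idle for longer than the timeout.
    void evictIdle() {
        long now = System.currentTimeMillis();
        loaded.entrySet().removeIf(en -> now - en.getValue().lastUsed > idleTimeoutMs);
    }

    private static final class Entry {
        final TenantContext context;
        volatile long lastUsed = System.currentTimeMillis();
        Entry(TenantContext context) { this.context = context; }
    }
}
```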

Solution 4
When a tenant has several artifacts, loading them takes time. So in Solution 3, if a tenant is accessed before it has been loaded, the first request or two to that tenant will time out.

Solution 4 added the ghost deployer to solve this problem. The ghost deployer does not load all of a tenant's information; it loads just the metadata, and the actual artifacts are loaded on demand. As a result, loading a tenant becomes a much lighter operation in Solution 4, which prevents requests from timing out while the tenant is being loaded. Azeez's blog entry mentioned above covers this as well.
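
The ghost pattern can be sketched as a lightweight placeholder that holds only metadata and materializes the real artifact on first use. GhostArtifact and its fields below are illustrative names, not the actual Stratos classes.

```java
// Illustrative ghost artifact: cheap metadata up front, real artifact on demand.
class GhostArtifact {
    private final String name;          // lightweight metadata, loaded at tenant load
    private final String artifactPath;  // where the heavy artifact lives
    private volatile Object deployed;   // the real artifact, loaded lazily

    GhostArtifact(String name, String artifactPath) {
        this.name = name;
        this.artifactPath = artifactPath;
    }

    // The first request pays the (now much smaller) loading cost; later
    // requests use the already-deployed artifact.
    Object resolve() {
        Object d = deployed;
        if (d == null) {
            synchronized (this) {
                if (deployed == null) {
                    deployed = loadFromDisk(artifactPath);
                }
                d = deployed;
            }
        }
        return d;
    }

    private Object loadFromDisk(String path) {
        // placeholder for actually parsing and deploying the artifact
        return new Object();
    }
}
```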

Solution 5
However, in Solution 4 the single LB does not scale to handle a large number of requests. So we replace it with multiple LBs that hold the same metadata on every node; therefore, every LB makes the same decision when it receives a request. With this model, we can scale Stratos by placing a hardware load balancer in front of the LBs, or by setting up DNS round robin to distribute requests among them.

To synchronize the metadata across all LBs, we could use group communication, but that is M×M communication (every node talking to every other node), which is heavy. Instead, LBs in Stratos are designed to send updates as batches to a single decision service, and the decision service makes the auto-scaling decisions. We ensure high availability by running a replica of the decision service and keeping it up to date via state replication through group communication.
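
The batching side of this design could look roughly like the following; DecisionServiceClient, the string-typed updates, and the five-second flush interval are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative batching reporter: each LB queues load observations locally
// and ships them to the decision service as one periodic batch, instead of
// every LB exchanging state with every other LB.
class LoadReporter {
    private final BlockingQueue<String> updates = new LinkedBlockingQueue<>();
    private final DecisionServiceClient decisionService;

    LoadReporter(DecisionServiceClient decisionService) {
        this.decisionService = decisionService;
        Executors.newSingleThreadScheduledExecutor()
                 .scheduleAtFixedRate(this::flush, 5, 5, TimeUnit.SECONDS);
    }

    void record(String loadUpdate) {
        updates.offer(loadUpdate);
    }

    private void flush() {
        List<String> batch = new ArrayList<>();
        updates.drainTo(batch);
        if (!batch.isEmpty()) {
            decisionService.sendBatch(batch); // one message instead of many
        }
    }
}

// Hypothetical client for the (replicated) decision service.
interface DecisionServiceClient {
    void sendBatch(List<String> batch);
}
```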

Solution 6
However, in Solution 5 the LB instances are not aware of tenants. Thanks to lazy loading, all requests will still work even though the LBs route messages arbitrarily. But this can lead to a scenario where a single node has to load too many tenants.

To avoid this, in Solution 6 the LBs are made tenant-aware, and only a subset of tenants is allocated to each LB. You can find more information in Sajeewa's blog entry, "WSO2 Tenant Aware Load balancer".
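
One way to picture tenant-aware routing is a router that pins each tenant to a fixed cluster. The sketch below hashes the tenant ID for brevity; a real deployment might instead use explicitly configured tenant-to-cluster assignments.

```java
import java.util.List;

// Illustrative tenant-aware router: the same tenant always lands on the
// same cluster, so no single node ends up loading too many tenants.
class TenantAwareRouter {
    private final List<String> clusterUrls; // one entry per server cluster

    TenantAwareRouter(List<String> clusterUrls) {
        this.clusterUrls = clusterUrls;
    }

    String route(String tenantId) {
        int idx = Math.floorMod(tenantId.hashCode(), clusterUrls.size());
        return clusterUrls.get(idx);
    }
}
```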

Solution 7
The upcoming Stratos release will follow Solution 6. However, the next potential problem is that the registry, which holds the configurations and resources of all tenants, cannot scale to handle a very large number of tenants. Hence, the registry needs to be partitioned across multiple nodes.
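
Such partitioning might be sketched as below, sharding tenants across several registry databases (reusing the hypothetical Registry and TenantContext stand-ins from the earlier sketches); the hash-based shard choice is an assumption, not a described Stratos design.

```java
import java.util.List;

// Illustrative partitioned registry: each tenant's configuration and
// resources live on exactly one registry shard.
class PartitionedRegistry {
    private final List<Registry> shards; // one Registry per database node

    PartitionedRegistry(List<Registry> shards) {
        this.shards = shards;
    }

    private Registry shardFor(String tenantId) {
        return shards.get(Math.floorMod(tenantId.hashCode(), shards.size()));
    }

    TenantContext loadTenant(String tenantId) {
        return shardFor(tenantId).loadTenant(tenantId);
    }
}
```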