In Memoriam: Lavabit Architecture - Creating a Scalable Email Service

With Lavabit shutting down under murky circumstances, it seems fitting to repost an old (2009), yet still very good post by Ladar Levison on Lavabit's architecture. I don't know how much of this information is still current, but it should give you a general idea of what Lavabit was all about.

Getting to Know You

What is the name of your system and where can we find out more about it?

Note: these links are no longer valid...

Lavabit
http://lavabit.com

http://lavabit.com/network.html
http://lavabit.com/about.html

What is your system for?

Lavabit is a mid-sized email service provider. We currently have about 140,000 registered users with more than 260,000 email addresses. While most of our accounts belong to individual users, we also provide corporate email services to approximately 70 companies.

Why did you decide to build this system?

We built the system to compete against the other large free email providers, with an emphasis on serving the privacy conscious and technically savvy user. Lavabit was one of the first free email companies to provide free access via POP and later via IMAP. To this day over 90 percent of our users access the system using POP or IMAP.

How is your project financed?

The project was initially financed by the founders, but now lives off money collected via advertising and paid users. Ongoing development efforts are subsidized by our consulting business; quite simply we work on the code base for Lavabit when we have slowdowns in our consulting business.

What is your revenue model?

Offer a superior product and hope that its increasing use leads to advertising revenues, and paid account upgrades.

How do you market your product?

We rely on word of mouth to grow the service. Since most of what we provide is free, we can't justify the cost of advertising (at least right now).

How long have you been working on it?

The service has been running since the summer of 2004. Originally we called the service Nerdshack, but changed the name to Lavabit at the request of our users in December of 2005.

How big is your system? Try to give a feel for how much work your system does.

Every day the system handles approximately 200,000 email messages, while rejecting another 400,000 messages as spam. Lavabit currently averages about 12,000 daily logins, of which 80 percent are via POP, 10 percent are via IMAP and 10 percent are via the webmail system. The website itself sees about 2,500 unique visitors per day, resulting in approximately 170,000 page and file requests.

Number of unique visitors?

Approximately 12,000 unique visitors per day and approximately 28,000 unique visitors per month.

Those are the old numbers as of 2009. When Lavabit shut down, the updated numbers were 40,000 people logging in every day and 1.4 million messages sent per week.

Number of monthly page views?

3,728,686 for Jan 2009
3,929,292 for Feb 2009
These numbers only consider HTTP requests.

What is your in/out bandwidth usage?

We currently send about 70 gigabytes per day through our upstream Internet connection. See this page for a graph:

http://lavabit.com/grahps.html

How many documents do you serve? How many images? How much data?

Our system currently handles approximately 180,000 inbound emails, and another 20,000 outbound emails per day. This translates into about 70 gigabytes of traffic.

How fast are you growing?

We see about 150 new user registrations per day.

What is your ratio of free to paying users?

We currently have approximately 1,500 actively paying customers.

What is your user churn?

Our daily login average has recently been growing by about 250 per month. We hope to grow a lot faster when our new website and webmail system launch later this year.

How many accounts have been active in the past month?

34,247 between 2/10/2009 and 3/10/2009.

How is your system architected?

What is the architecture of your system? Talk about how your system works in as much detail as you feel comfortable with.

For SMTP, POP and IMAP Connections
We use a 2-tier architecture. There is an application tier that runs our custom mail daemon and a support tier made up of NFS and MySQL servers. A hardware-based load balancer (Alteon AD4) is used to split incoming SMTP, POP and IMAP connections across the 8 application servers (Dell 1650s with 4 GB of RAM). The application servers also run memcached instances.

The application servers are used to handle the bulk of the processing load. The daemon itself is a single process, multi-threaded application written in C. Currently each daemon is configured to pre-spawn 512 threads for handling incoming connections. Another 8 threads are used to asynchronously pull ads from our advertising partner's HTTP API, and perform maintenance functions. Maintenance functions involve updating in-memory tables, expiring stale sessions, log file rotation and keeping the ClamAV signatures up to date.

From an architecture standpoint, each incoming connection gets its own thread. This allows us to use blocking IO. We currently rely on the Linux kernel to evenly split the processor among the connections.
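
To make the model concrete, here is a minimal sketch (not Lavabit's actual code) of a daemon that pre-spawns a fixed pool of worker threads, each blocking in accept() on a shared listening socket and then handling its connection with blocking IO. The port, thread count and handle_connection() stub are placeholders.

    /* Thread-per-connection sketch: N pre-spawned workers share one
       listening socket and block in accept(); the kernel wakes one
       worker per incoming connection. */
    #include <pthread.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <string.h>
    #include <unistd.h>

    #define WORKER_THREADS 512          /* illustrative; Lavabit pre-spawned 512 */

    static int listen_fd;               /* bound and listening before the spawn  */

    static void handle_connection(int fd) {
        /* placeholder for the real SMTP/POP/IMAP state machine;
           blocking reads and writes happen here */
        const char *banner = "220 mail.example.com ESMTP\r\n";
        write(fd, banner, strlen(banner));
        close(fd);
    }

    static void *worker(void *arg) {
        (void)arg;
        for (;;) {
            int fd = accept(listen_fd, NULL, NULL);   /* blocks until a client arrives */
            if (fd >= 0)
                handle_connection(fd);
        }
        return NULL;
    }

    int main(void) {
        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_port = htons(2525),
                                    .sin_addr.s_addr = htonl(INADDR_ANY) };
        listen_fd = socket(AF_INET, SOCK_STREAM, 0);
        bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
        listen(listen_fd, 128);

        pthread_t tids[WORKER_THREADS];
        for (int i = 0; i < WORKER_THREADS; i++)
            pthread_create(&tids[i], NULL, worker, NULL);

        pause();    /* the real daemon runs its maintenance threads here */
        return 0;
    }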

We currently call our mail daemon "lavad" and it fluently speaks SMTP, POP and IMAP. The daemon is also responsible for applying all of our business logic and interfacing with the different open source libraries we use.

When accepting messages from the outside world via SMTP, the daemon will perform the following checks (see the sketch after this list):

  • If the recipient is valid
  • Whether the incoming IP is listed on an RBL
  • If the return path can be validated using SPF (libspf2)
  • Against any size or rate limits for the account
  • Against the user’s gray list
  • For viruses (libclamav)
  • For a valid domain key signature (libdomainkeys)
  • Whether the message looks like spam based on statistical token data (libdspam)
  • And finally against any filters used to sort or delete messages matching a regular expression
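
The sketch below is a hypothetical rendering of that kind of pipeline: an ordered list of check functions, each of which can veto the message before it is accepted. The msg_t struct and the two stub checks are invented for illustration; the real checks are the library-backed ones listed above.

    /* Hypothetical acceptance pipeline: run each check in order and
       reject as soon as one fails. */
    #include <stdio.h>
    #include <stdbool.h>

    typedef struct { const char *recipient; long size; } msg_t;
    typedef bool (*check_fn)(const msg_t *);

    static bool check_recipient(const msg_t *m) { return m->recipient != NULL; }
    static bool check_size(const msg_t *m)      { return m->size < 32L * 1024 * 1024; }
    /* ... RBL, SPF, limits, gray list, ClamAV, DomainKeys, DSPAM, user filters ... */

    int main(void) {
        const check_fn pipeline[] = { check_recipient, check_size };
        msg_t msg = { "user@lavabit.com", 4096 };

        for (size_t i = 0; i < sizeof(pipeline) / sizeof(pipeline[0]); i++)
            if (!pipeline[i](&msg)) { puts("554 rejected"); return 1; }

        puts("250 accepted");
        return 0;
    }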

Whether a specific check is used depends on the user’s preferences, and the account plan they have. For example, the spam filter is limited to paid users (because of the load it places on the database).

Depending on the outcome of the different checks, the user can choose to label the message, reject it, or in some cases delete it silently. If the message needs to be bounced, a bounce message is only sent if (a) the return path can be verified using SPF, or (b) the sender is verified using domain keys and the sender matches the return path.

As a final step, the message is encrypted using ECC (if applicable), compressed using LZO and then stored on the NFS server.

For POP connections, the process is relatively simple. The user authenticates and requests a message. The daemon loads the message, checks the hash to make sure the data hasn’t been corrupted, decompresses the data, and then decrypts it (if applicable) before sending it along to the client.
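
Here is a rough sketch of that store-and-retrieve integrity path, with zlib standing in for LZO and zlib's crc32() standing in for whatever hash the real on-disk format used (both substitutions are assumptions, not Lavabit's format). Compile with -lz.

    /* Store path: checksum then compress.  Retrieve path: decompress,
       then verify the checksum before handing the message to the client. */
    #include <zlib.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        const char *msg = "Subject: hi\r\n\r\nHello world\r\n";
        uLong msg_len = (uLong)strlen(msg) + 1;

        /* store path */
        uLong stored_crc = crc32(0L, (const Bytef *)msg, msg_len);
        uLongf comp_len = compressBound(msg_len);
        Bytef *comp = malloc(comp_len);
        compress(comp, &comp_len, (const Bytef *)msg, msg_len);

        /* retrieve path */
        Bytef *plain = malloc(msg_len);
        uLongf plain_len = msg_len;
        if (uncompress(plain, &plain_len, comp, comp_len) != Z_OK ||
            crc32(0L, plain, plain_len) != stored_crc) {
            puts("message corrupted");
            return 1;
        }
        printf("ok: %s", (char *)plain);
        free(comp);
        free(plain);
        return 0;
    }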

Because we need the plain text password to decrypt a user’s private key, we don’t support secure password authentication. We decided to support SSL instead (which encrypts everything; not just the password). We handle the SSL encryption at the application tier rather than on the load balancer because we feel the application tier is easier to scale.
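
As a minimal sketch, assuming OpenSSL (the post does not name the SSL library actually used), terminating SSL on the application tier looks roughly like the following; the certificate paths, port and POP banner are placeholders, and in the real daemon the handshake and protocol handling would run inside the per-connection worker threads described earlier. Link with -lssl -lcrypto.

    /* Accept a TCP connection, wrap it in a server-side TLS session,
       then speak the mail protocol over SSL_read()/SSL_write(). */
    #include <openssl/ssl.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        SSL_CTX *ctx = SSL_CTX_new(TLS_server_method());
        SSL_CTX_use_certificate_file(ctx, "/etc/pki/mail.pem", SSL_FILETYPE_PEM);
        SSL_CTX_use_PrivateKey_file(ctx, "/etc/pki/mail.key", SSL_FILETYPE_PEM);

        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_port = htons(995),   /* POP3 over SSL */
                                    .sin_addr.s_addr = htonl(INADDR_ANY) };
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
        listen(lfd, 128);

        for (;;) {
            int cfd = accept(lfd, NULL, NULL);
            if (cfd < 0) continue;

            SSL *ssl = SSL_new(ctx);      /* in the real daemon this happens */
            SSL_set_fd(ssl, cfd);         /* inside the worker thread        */
            if (SSL_accept(ssl) == 1) {
                const char *banner = "+OK POP3 ready\r\n";
                SSL_write(ssl, banner, (int)strlen(banner));
            }
            SSL_shutdown(ssl);
            SSL_free(ssl);
            close(cfd);
        }
    }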

On a side note, our failure to support secure password authentication hasn’t stopped people from clicking the Secure Password Authentication checkbox in Outlook, creating a support nightmare for us. Outlook doesn’t enable SMTP authentication by default either, which creates another support nightmare for us. If any mail client developers read this: please make port 587 the default instead of port 25, and auto-detect SMTP authentication.

When retrieving messages for users that have the statistical spam filter enabled or users who have selected a plan with advertising, the daemon will also insert a small text signature. The signature will have a link for training the server-side spam filter and/or a small text advertisement.

For IMAP connections, the daemon also presents messages in folders and allows server-side searches of messages. Currently searches are handled by reading in all of the message data from disk, which results in a large performance hit if the folder is large. If the user connects multiple times using the same credentials, the connections will share a centralized copy of their mailbox state, which also creates lock contention issues. Search is certainly one area where we need to improve.

For outbound messages, the daemon will authenticate the user's credentials against the database, apply any sending limits for the account (to prevent abuse), check whether the From address matches an email address associated with the credentials provided (to prevent spoofing), and finally the daemon checks whether the message contains a virus. Assuming all of the checks are passed, the message is cleaned up, signed using domain keys, and then relayed via our internal network to a Postfix server that handles relaying it to the final destination.

The daemon uses pools for sharing anything it can, including MySQL connections, ClamAV instances, cURL instances (for pulling ads), Memcached instances, libspf2 instances, etc. To keep deployments simple, we compile all of the open source libraries we rely upon into a single archive that is then dynamically loaded at runtime. We don’t compile the libraries directly into our application because doing so would require us to release the daemon under the GPL, and we don’t rely on dynamic linking since we don’t want any of the key libraries to be automatically updated by the operating system without us knowing.
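
A minimal sketch of the kind of fixed-size resource pool described above, guarded by a mutex and condition variable. The pool hands out opaque slots, so the same pattern covers MySQL handles, ClamAV engines or cURL handles; all names here are illustrative, not Lavabit's API.

    #include <pthread.h>
    #include <stdlib.h>

    #define POOL_SIZE 16

    typedef struct {
        void *items[POOL_SIZE];
        int free_count;
        pthread_mutex_t lock;
        pthread_cond_t available;
    } pool_t;

    void pool_init(pool_t *p, void *(*create)(void)) {
        pthread_mutex_init(&p->lock, NULL);
        pthread_cond_init(&p->available, NULL);
        for (int i = 0; i < POOL_SIZE; i++)
            p->items[i] = create();            /* e.g. a connected MySQL handle */
        p->free_count = POOL_SIZE;
    }

    void *pool_acquire(pool_t *p) {
        pthread_mutex_lock(&p->lock);
        while (p->free_count == 0)             /* block until a slot is returned */
            pthread_cond_wait(&p->available, &p->lock);
        void *item = p->items[--p->free_count];
        pthread_mutex_unlock(&p->lock);
        return item;
    }

    void pool_release(pool_t *p, void *item) {
        pthread_mutex_lock(&p->lock);
        p->items[p->free_count++] = item;
        pthread_cond_signal(&p->available);
        pthread_mutex_unlock(&p->lock);
    }

    /* trivial usage */
    static void *make_slot(void) { return malloc(1); }   /* stand-in for a real handle */

    int main(void) {
        pool_t p;
        pool_init(&p, make_slot);
        void *conn = pool_acquire(&p);         /* ... use the handle ... */
        pool_release(&p, conn);
        return 0;
    }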

For HTTP Connections
Like inbound mail connections, HTTP connections are split between two servers using the load balancer. Apache is currently used to handle the requests. While most of the website is static XHTML files, our registration engine is written in C (with libgd for the CAPTCHA images, and libcurl for processing credit cards using the PayFlow Pro HTTP API). All of our C applications rely on the Apache CGI interface.

The preferences portal is currently written in Perl and the webmail system is based on a popular open source client and is written in PHP. We modified the webmail system to fit more smoothly into our site. The webmail client currently connects to the mail system using IMAP, with each web server getting a dedicated IMAP server.

What particular design/architecture/implementation challenges does your system have?

The Big Problem
While it is very easy to set up a mail system that reliably handles email for a few thousand users, it is incredibly difficult to scale that same system beyond a single server. This is because most email servers were originally designed for use on a single server. If you grow these same systems beyond a single server you typically need to use a database and/or an NFS server to keep everything synchronized between the different nodes. And while it is possible to build large database and NFS instances, it is also very expensive, and depending on the setup it can be very inefficient.

If you want to avoid the single database or NFS server problem, you can do so by adding a lot of complexity. For example, if you wanted to implement a very large Cyrus system the typical solution is to use LDAP for authentication, and then use an IMAP/POP reverse proxy to intercept incoming connections and forward them to the specific Cyrus server for that user. The problem with managing a system like this is the relatively high number of critical pieces that can fail. The following image visualizes what a system like this might look like:

Embedded Image


For a full write up on this design, see http://www.linuxjournal.com/article/9804

The problem with systems designed this way is the large number of critical services. If a Cyrus server goes down, then all of the users hosted on that system are offline. You can mitigate this risk with a failover system that is periodically rsync’ed with the master, or by using a SAN, but these options are either inefficient or expensive. (There is at least one medium sized free/paid email company that uses a system like this, and presumably this is why the limits on their free plan are so low.)

At one point Yahoo Mail relied on a very large NetApp device to centrally store mail. And while the NetApp devices scale well, they are also very expensive. It was this high cost that kept Yahoo Mail from matching Gmail’s 1GB quota for almost a year. When they finally completed their move to a distributed architecture in 2007, they began offering unlimited storage.

It is precisely because small mail systems are so easy to implement, and large ones so difficult, that the Internet sees hundreds if not thousands of free email companies start and then fail each year. If the system becomes popular they have no easy way to scale it, so they are forced to either stop accepting free accounts, or shut down the system. Only a small number of companies have the financial and technical resources to build systems from the ground up to support a large user base. It is also why Lavabit followed the example of the large providers and implemented a completely custom platform; it was the only way we could support 100k+ users with a cost basis low enough to make the business profitable.

The key to keeping any large system manageable and cost effective is to keep it simple. The fewer critical failure points you have in the system, the better things will run.

Current Problems with Our Implementation
We have locking issues when multiple IMAP connections try to access the same mailbox; only one thread is currently allowed access to the mailbox at a time, which can present a problem if the user makes a request that takes a long time to process (i.e., search or bulk fetches via IMAP).

Reading in the full message when only the header is needed has caused a performance problem. We also need to implement code for indexing messages and processing searches using an index instead of reading in all of the data for each search request. Unfortunately we don’t have dedicated search gurus on hand to help with this like our competitors.

The statistical spam filter currently stores all of its token and signature data in the database. Checking the tokens in a message against a database using SQL is very inefficient. And because of how difficult it is to scale MySQL databases, doing this for 200,000+ messages a day would be very expensive. At some point we will change the way token data is stored so that the load is more evenly distributed across the cluster, and then we will be able to offer the filter to everyone.

The webmail system does not keep IMAP connections open. As a result data is freed and then reloaded frequently. Also because the whole message is loaded when the webmail system is only requesting the header, a lot of unnecessary data is often pulled from the NFS servers.

Naturally we’re working to fix all of these issues.

What did you do to meet these challenges?

We implemented a custom mail system, which was designed from the ground up to efficiently handle a large number of users. A custom platform has also allowed us to implement a lot of custom business logic that would have been difficult, if not impossible, using an off-the-shelf system.

How did your system evolve to meet new scaling challenges?

We started with a single server that used Postfix and Qpopper. We used amavisd for virus and spam filtering and a custom policy daemon to make sure users didn’t send too much mail, or spoof someone else’s address. This system worked well for about 4 months; but with a few thousand users the system started to choke. We had to turn off new user registrations until we could transition onto a custom multi-server platform.

Originally there was an application for handling SMTP connections and a separate one for POP connections. Each application would spawn multiple processes with multiple threads so that if one died from a segmentation fault another could take its place. (This is how Apache and Postfix are designed for both reliability and security.) Over time we combined the protocols into a single process which spawns a larger number of threads. We could do this because over time we worked out the memory bugs. In the last year we’ve only had 4 nodes die from segmentation faults, all of which were triggered by bugs in the libraries we use. (But we also haven’t released any major changes into production in the last year.) This single daemon design has made our use of things like database connections much more efficient.

Do you use any particularly cool technologies or algorithms?

The way we encrypt messages before storing them is fairly unusual. We only know of one commercial service, and one commercial product, that will secure user data using asymmetric encryption before writing it to disk. Basically we generate public and private keys for the user and then encrypt the private key using a derivative of the plain text password. We then encrypt user messages using their public key before writing them to disk. (Alas, right now this is only available to paid users.)
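
The sketch below shows the same encrypt-at-rest pattern using libsodium's Curve25519 sealed boxes as a stand-in for Lavabit's ECC implementation (the original used a different library, key type and storage format): generate a keypair for the user, wrap the private key under a key derived from the plain text password, and encrypt each stored message to the public key so the server never needs the password at delivery time. Link with -lsodium.

    #include <sodium.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        if (sodium_init() < 0) return 1;

        /* 1. Generate the user's keypair at account creation. */
        unsigned char pk[crypto_box_PUBLICKEYBYTES], sk[crypto_box_SECRETKEYBYTES];
        crypto_box_keypair(pk, sk);

        /* 2. Derive a key from the plain text password and use it to
              encrypt the private key before storing it. */
        const char *password = "correct horse battery staple";
        unsigned char salt[crypto_pwhash_SALTBYTES];
        randombytes_buf(salt, sizeof salt);

        unsigned char kek[crypto_secretbox_KEYBYTES];
        crypto_pwhash(kek, sizeof kek, password, strlen(password), salt,
                      crypto_pwhash_OPSLIMIT_INTERACTIVE,
                      crypto_pwhash_MEMLIMIT_INTERACTIVE,
                      crypto_pwhash_ALG_DEFAULT);

        unsigned char nonce[crypto_secretbox_NONCEBYTES];
        randombytes_buf(nonce, sizeof nonce);
        unsigned char wrapped_sk[crypto_secretbox_MACBYTES + sizeof sk];
        crypto_secretbox_easy(wrapped_sk, sk, sizeof sk, nonce, kek);

        /* 3. Encrypt each incoming message with the public key only; no
              password is needed at delivery time. */
        const unsigned char msg[] = "Subject: hello\r\n\r\nbody";
        unsigned char sealed[crypto_box_SEALBYTES + sizeof msg];
        crypto_box_seal(sealed, msg, sizeof msg, pk);

        /* 4. At retrieval, re-derive the key from the password, unwrap
              the private key, and open the stored message. */
        unsigned char sk2[sizeof sk], plain[sizeof msg];
        if (crypto_secretbox_open_easy(sk2, wrapped_sk, sizeof wrapped_sk, nonce, kek) != 0 ||
            crypto_box_seal_open(plain, sealed, sizeof sealed, pk, sk2) != 0) {
            puts("decryption failed");
            return 1;
        }
        printf("recovered: %s\n", (const char *)plain);
        return 0;
    }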

We also think the way our system is architected, with an emphasis on being used in a cluster, is rather unusual. We would like to someday release our code as free software. We haven’t yet because a) we don’t want anyone else building a competing system using our code, b) while we’ve moved more settings and logic into a configuration file over the last couple of years, there is still a lot of logic hard coded, and c) we’ve created the code specifically for Cent OS, and don’t have the resources to test and support it on other operating systems right now. We’ve spent some time looking for a company to sponsor open sourcing the code, but haven’t found one yet.

What did you do that is unique and different that people could best learn from?

One of the ways to gain an advantage over your competition is to invest the time and money needed to build systems that are better than what is easily available to your competition. It is the custom platform we developed that has allowed us to thrive while many other free email companies either stopped offering their service for free, or shut down altogether.

That said, you should always start by improving the components that will make the most difference to your users, and move on from there. For Lavabit that meant starting out with a custom mail platform, but continuing to use Postfix for outbound mail, MySQL for synchronization, and NFS for file storage.

This may be a good place to note that in 2004 the only major database system with production ready cluster support was Oracle, and it remains a very expensive option (way beyond the budget of a free service like ours). Since then SQL Server and MySQL have both improved/added support for clustering (replication and failover is _not_ the same as cluster support). And while the MySQL cluster implementation still needs work before developers can stop worrying about the scalability of the database, the world is getting closer to that point. Distributed caching (memcached) and distributed file systems (Lustre/GFS) have also matured since we started in 2004. Throw cloud services like S3 into the equation and it is almost easy to implement a highly scalable website or service.

What lessons have you learned?

The only way to guarantee success is through hard work.

Why have you succeeded?

We are committed to providing a superior service and offering it on terms we think all users should be demanding. We are also committed to continually improving the service we offer.

What do you wish you would have done differently?

There are a number of areas in our platform we wish were implemented differently. In most cases we made the decisions we did because implementing them the "right" way would have taken longer.

A good example is the IO model we’re using. The asynchronous IO model used by lighttpd and memcached is more efficient than our current model, but we felt doing things this way would have taken longer while giving us little initial benefit. See this quite famous web page for a full write up on the issue:
http://www.kegel.com/c10k.html

We also wish we had finished the IMAP server earlier than October of 2007, and finished our custom webmail system by now.

What wouldn't you change?

We are happy with the decision to enter the email service business. Overcoming the challenges involved in building a reliable and scalable mail platform has been rewarding.

Personally I also enjoy knowing that the system I helped create is being used by 12,000 people each day. Sometimes I find myself thinking "there are 1,000 people connected to this system right now." I like those numbers.

How much up front design should you do?

Collectively, the engineering team has spent thousands of hours doing research to help make our mail system better. This knowledge has been invaluable not only in improving the mail system, but also in helping our professional services clients.

The bottom line is that it is easier to make changes to a design document than it is to the code. What this means is that if you don’t clearly understand how something should be implemented, it pays to write design documents first. The hours you save in the end will far outweigh the hours you spend writing the documentation.

How are you thinking of changing your architecture in the future?

We’ve had a major update to our website and our application tier in the works for almost a year already. The details involved in this update (outside of what has already been mentioned here) are still a secret. I will definitely need to update this write up when we are ready to push the development tree into production.

What infrastructure do you use?

Which programming languages does your system use?

The main application daemon is written in C. Web pages generally use XHTML, CSS and Javascript.

We still have a number of legacy web applications and maintenance scripts written in Perl, and the webmail system is currently in PHP, but these are all slated for conversion to C as time allows.

Our consulting projects typically involve development in C#; and we spent a lot of time thinking about whether to implement the system in C# back in 2004. In the end, we felt that Windows and .NET would not be a good choice for a scalable mail platform. We felt that the increased performance, the lack of licensing costs, and the availability of so many open source libraries for handling mail meant that the best choice for us was to go with Linux and C.

If we had to make a similar decision today, there is a chance we would not have chosen to go with C. Given the stability and efficiency of Windows 2008, the growing amount of open source C# code on the Internet and the availability of Mono as an alternative to .NET, we may have opted for C# instead.

In our experience, the decision on what platform to choose for a project can often be broken down into simple math. Building applications in C# is typically faster than C. For us that typically means 3 to 4 times faster than using C, and about 1.5 times faster than using PHP. If you can figure out how much additional development time it will take to use one platform over another, it becomes easy to calculate whether the performance and license savings of one platform will offset the increased cost of development. In general hardware and software are cheap compared to development time, so the number of applications which can justify being built in C or C++ is getting very small.
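
To make that math concrete with purely illustrative numbers (not Lavabit's): if a feature takes 2 developer-months in C# and therefore roughly 6 to 8 in C, and a developer-month costs $10,000, the C version carries $40,000 to $60,000 in extra development cost; choosing C only pays off if the hardware and licensing savings over the life of the system exceed that amount.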

On a side note, we think the productivity gap between IIS/.NET/SQL Server/Visual Studio and Apache/PHP/MySQL/Eclipse is largely a result of how well the Microsoft tools have been integrated with each other.

How many servers do you have?

We have 14 servers dedicated to the mail platform. We have 1 server dedicated to monitoring, and another 11 servers used for website hosting. Most of the websites we host were developed by the Lavabit team, so we don’t consider it part of our core business.

How is functionality allocated to the servers?

We move services from one server to another as necessary. Our long term goal is to create a global pool of servers that can handle everything, so fewer clusters are needed, and the load is distributed more evenly.

How are the servers provisioned?

We typically buy servers on eBay, and then install and configure them ourselves.

What operating systems do you use?

The mail system is currently using Cent OS 4. The application servers use the 32 bit version, and the database and storage servers use the 64 bit version.

We also use Windows 2003 for hosting other things not related to the mail service.

Which web server do you use?

Apache 2.0

Which database do you use?

MySQL 4.1

Do you use a reverse proxy?

Our load balancer will route connections from the same IP to the same node, which tends to make caching easier, but we do not use a "true" reverse proxy to route connections based on user credentials.

Do you collocate, use a grid service, use a hosting service, etc?

We’ve created our own platform. We lease space inside a colo facility to host all of our equipment.

What is your storage strategy?

The storage servers have 3ware RAID cards, and use SATA drives. We are currently using the ext3 file system, and share files via NFS. The database servers have PERC 4 cards and use SCSI.

How much capacity do you have?

Overall, we are only using about 10 percent of our total processing power, and about 25 percent of our currently subscribed bandwidth. The two areas where we currently have issues are the IMAP servers dedicated to handling webmail connections, and disk throughput on the NFS servers. Throughput on the NFS servers is limited by the IO controller cards, and the file system in use (ext3). Ultimately we feel there is a lot of room for growth available to us by making our code more efficient.

The servers in the "earth" cluster are only averaging about 8 percent utilization. This 8 node cluster is used to handle inbound SMTP, POP and IMAP connections, and run memcached instances. A typical CPU graph for a server in that cluster looks like:

Embedded Image

As you can see the servers allocated to the "earth" cluster still have plenty of room for growth. In contrast, if you compare this to a CPU graph from one of the servers in the "mars" cluster you will notice one of our current capacity problems. The "mars" cluster is made up of the two nodes dedicated to handling IMAP connections from the webmail system, and the CPU graphs for both nodes look like:

Embedded Image

As you can see from the above graph, this server is having issues. Both servers dedicated to handling webmail traffic will get memory upgrades sometime next week, which should help; we also plan to push a new build with some minor improvements that we hope will help with the problem. This utilization issue has only shown up in the last month or so.

Some of you may be wondering why we don’t use servers from our "earth" cluster to handle more of the webmail load. The source of this utilization issue is the way our code is written. Adding more hardware to the application layer would actually make the problem worse, as it would further stress our NFS servers, and hurt the performance seen by users not connected via the webmail system. The root problem is that when a request comes in via IMAP for a message header, our code loads the entire message off the NFS server. As a result, instead of making a 4 KB read per email, we are pulling what in some cases might be a 128-megabyte message off the NFS server. The result is that about 100 times more data is transferred than is actually needed to satisfy the IMAP request. We’ve tried using memcached to cache messages, but unfortunately too many of the messages are larger than 1 megabyte and can’t be stored in memcached.
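
One fix along the lines described above is to stop reading past the header: stream the file in small chunks and stop at the blank line that terminates the header block. The sketch below assumes messages stored as plain files; the real store is compressed and sometimes encrypted, so the actual fix is more involved.

    /* Read only the RFC 2822 header block of a message file instead of
       pulling the whole (possibly huge) file off NFS. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    char *read_header_only(const char *path) {
        FILE *f = fopen(path, "rb");
        if (!f) return NULL;

        size_t cap = 4096, len = 0;
        char *buf = malloc(cap), line[1024];

        while (fgets(line, sizeof line, f)) {
            if (strcmp(line, "\r\n") == 0 || strcmp(line, "\n") == 0)
                break;                          /* blank line ends the header */
            size_t n = strlen(line);
            while (len + n + 1 > cap) { cap *= 2; buf = realloc(buf, cap); }
            memcpy(buf + len, line, n);
            len += n;
        }
        buf[len] = '\0';
        fclose(f);
        return buf;                             /* caller frees */
    }

    int main(int argc, char **argv) {
        if (argc < 2) return 1;
        char *hdr = read_header_only(argv[1]);
        if (hdr) { fputs(hdr, stdout); free(hdr); }
        return 0;
    }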

Why didn’t this problem show up sooner? Most IMAP clients store a copy of the message header locally, so they only need to download new message headers when they connect. However because the webmail client is stateless, it needs to download the message headers every time a user logs in. Because our current webmail system has only been available publicly for about a year, it took some time for webmail traffic to grow large enough to make this problem critical.

From a database standpoint we still have plenty of capacity. Here is a CPU graph of our master database:

Embedded Image

We’ve spent a lot of time optimizing our database usage because of how difficult it is to scale out a database.

Unlike our database servers, the NFS servers are currently disk bound. While I don’t have a graph of disk utilization (it averages 95 percent), I do have a graph of the network utilization:

Embedded Image

As you can see, these servers are being seriously taxed. I should note that 85 percent of this traffic is for the two IMAP servers handling webmail traffic. We are hoping to push a new build that will fix this problem soon.

And finally, our two HTTP servers running Apache are quite happy as you can see from this utilization graph:

Embedded Image

The spike is from the nightly slocate and rsync cron jobs.

How do you grow capacity?

We added 8 application servers about 1 year ago, and we still have plenty of capacity. We ordered more memory for the nodes in the "mars" cluster, and will install it next week. We also have plans to add more memory to our database server and hard drives to our NFS server this summer.

If our growth rate remains close to what it has been historically, we will keep using our current hardware until the summer of 2010. At that point we are hoping to make the jump from Cent OS 4 to Cent OS 6, and move the application tier from 32 bit hardware to 64 bit hardware.

How do you handle session management?

We use sticky sessions. The load balancer routes incoming connections to nodes based on the incoming IP. Session state is then maintained internally by our application using custom code. In a handful of specific scenarios we also serialize a portion of the session state data and then store it in memcached so it is available to the other nodes.

How is your database/datatier architected?

We have a master database which replicates critical tables to a slave server for backup purposes. All of our application traffic goes to the master database, since the slave server uses significantly older hardware. We pull the offline backups from the slave database server using mysqldump. We can afford to have the slave tables locked for extended periods because no production traffic goes to it.

Our overall strategy is to use highly optimized SQL with no views, stored procedures or triggers. We only use joins when absolutely necessary. As a result, we have gotten very good throughput from our database server. We also place an emphasis on minimizing database use by caching anything we can, either at the application level or with memcached. We try to save CPU cycles on the database server by performing as much work as we can on the application servers (sort operations are a good example of that). Our theory is that even if it takes ten times the amount of work to sort a result set at the application layer, it is a hundred times easier to scale out the application cluster than it is to scale out the database servers.
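
To illustrate the sort-on-the-application-tier rule, here is a sketch using the MySQL C API: the query deliberately carries no ORDER BY, and the rows are sorted with qsort() on the application server instead. The table and column names, credentials and usernum value are invented; build with `mysql_config --cflags --libs`.

    #include <mysql.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { char subject[256]; long date; } row_t;

    static int by_date_desc(const void *a, const void *b) {
        long d = ((const row_t *)b)->date - ((const row_t *)a)->date;
        return (d > 0) - (d < 0);
    }

    int main(void) {
        MYSQL *db = mysql_init(NULL);
        if (!mysql_real_connect(db, "127.0.0.1", "user", "pass", "mail", 0, NULL, 0))
            return 1;

        /* no ORDER BY: keep the database doing as little work as possible */
        mysql_query(db, "SELECT subject, unix_date FROM messages WHERE usernum = 42");
        MYSQL_RES *res = mysql_store_result(db);

        unsigned long count = (unsigned long)mysql_num_rows(res);
        row_t *rows = calloc(count ? count : 1, sizeof(row_t));

        MYSQL_ROW r;
        unsigned long i = 0;
        while ((r = mysql_fetch_row(res)) != NULL) {
            snprintf(rows[i].subject, sizeof rows[i].subject, "%s", r[0] ? r[0] : "");
            rows[i].date = r[1] ? atol(r[1]) : 0;
            i++;
        }

        qsort(rows, i, sizeof(row_t), by_date_desc);   /* CPU spent here, not on the DB */

        for (unsigned long j = 0; j < i; j++)
            printf("%ld  %s\n", rows[j].date, rows[j].subject);

        mysql_free_result(res);
        mysql_close(db);
        free(rows);
        return 0;
    }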

Stats from our primary database server (as of March 11th, 2009):

Uptime: 495 days 2 hours 37 min 17 sec

Questions: 4896924841

Queries per second avg: 114.474

And some typical performance numbers:

% user    % nice    % system    % iowait    % idle
  1.35      0.00        0.60        0.48     97.57

The hardware for the database server was acquired in 2005 and is a dual Opteron, with 8 GB of RAM, and 6 SCSI drives in a RAID 5 configuration.

Which web framework/AJAX Library do you use?

jQuery on the client side. We have our own C framework on the server side.

How do you handle ad serving?

We download the ads from our current ad network via their HTTP interface and then insert them into messages as they leave the server via POP or IMAP.

While there are a lot of ad networks on the Internet today, we have only found a couple that provide an API which allows us to download ads onto the server and then insert them into emails (or web pages). It seems the more widely used strategy is to use Javascript on the client, which in our opinion is an inferior delivery method.

What is your object and content caching strategy?

We currently use memcached for storing objects, and the libmemcached library for interfacing with memcached servers. We’ve also implemented custom C code for handling serialization.

One major drawback we’ve run into is that memcached is not designed for caching objects larger than 1 megabyte. This is a problem for us, as we like to store the compressed (and in some cases encrypted) version of emails in memcached to reduce the load on our NFS servers. Since a significant percentage of emails are larger than 1 megabyte, we can only partially implement this strategy.

We have looked into increasing the size limit for memcached but found that the current memcached codebase doesn’t handle small objects as efficiently when the slab size is increased. In the future we may start breaking large messages up into chunks and spreading them across servers.
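
A sketch of that chunking idea using libmemcached: split any cached message that would exceed the item limit into pieces stored under derived keys. The key format, chunk size and TTL are assumptions for illustration; link with -lmemcached.

    #include <libmemcached/memcached.h>
    #include <stdio.h>
    #include <string.h>

    #define CHUNK_SIZE (512 * 1024)     /* stay well under the ~1 MB slab limit */

    static int cache_message(memcached_st *mc, const char *msgid,
                             const char *data, size_t len) {
        char key[64];
        for (size_t off = 0, n = 0; off < len; off += CHUNK_SIZE, n++) {
            size_t piece = (len - off < CHUNK_SIZE) ? len - off : CHUNK_SIZE;
            int klen = snprintf(key, sizeof key, "msg:%s:%zu", msgid, n);
            memcached_return_t rc = memcached_set(mc, key, (size_t)klen,
                                                  data + off, piece,
                                                  3600 /* TTL */, 0);
            if (rc != MEMCACHED_SUCCESS) return -1;
        }
        return 0;
    }

    int main(void) {
        memcached_st *mc = memcached_create(NULL);
        memcached_server_add(mc, "127.0.0.1", 11211);

        static char big[3 * 1024 * 1024];           /* pretend 3 MB message */
        memset(big, 'x', sizeof big);
        if (cache_message(mc, "12345", big, sizeof big) == 0)
            puts("cached in 6 chunks");

        memcached_free(mc);
        return 0;
    }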

Which third party services did you use to help build your system?

We’ve used contract programmers in the past to develop tools for managing the platform; but these days we prefer to keep things in house. We currently don’t rely on any 3rd parties for anything critical (other than our colo provider for space, power, cooling and bandwidth). We don’t like the idea of making excuses if another company makes a mistake. We also believe that, given the nature of the service we’re providing, keeping things in house results in a competitive advantage.

How do you health check your server and networks?

We have Nagios set up with just over 500 different checks (across 24 total hosts). A number of the checks are custom Perl scripts that can check specific things inside our database, or at the application layer. We also have a third party service checking for sanity on our critical load balanced ports just in case something breaks between our network and the outside world.

We use Cacti. And while Cacti is not the easiest tool to use, it is both powerful and free. One of our engineers spent a week or so and learned enough about Cacti to create the 120 or so different graphs we currently have setup.

How do you test your system?

We’ve written several command line programs that emulate the various protocols for testing during development. When we’re ready to deploy, we will also test the build with several of the more popular email clients.

When we’re ready, we will deploy a new build onto a single node and let it run for a week or more to make sure it is stable. During that time we monitor support emails and log files for any hidden problems.

How do you analyze performance?

We pull statistics into Cacti via SNMP and custom scripts, and then monitor the resulting graphs. We then focus our development efforts on the areas we think will make the most difference. During development we will also use gprof to profile the code and optimize the functions we spend the most time executing.

How do you handle security?

We keep all of our servers on a private subnet, and use the load balancer to provide connectivity to the ports we care about (DNS, HTTP, SMTP, POP, IMAP and the SSL alternatives). We use OpenVPN to get access to other ports (SSH, MySQL, NFS, SNMP) from outside the facility.

We have an Intrusion Detection System in place along with some custom scripts to make sure things stay secure (the details of which are secret). We also use a least privilege model (chroot and suid) to minimize the amount of damage someone could do if they did compromise a service.

How do you handle customer support?

Currently the engineering team takes turns handling support requests. We feel this keeps the engineers aware of what could be improved. Support requests are primarily received via our contact form and email.

How do you decide what features to add/keep?

We listen to our users and try to focus on what they ask for. We also use the service ourselves, so we have a good feel for what improvements would make the biggest impact. Our list of desired features has grown very long...

Do you implement web analytics?

Yes, we use Awstats for simple log file analytics.

Do you do A/B testing?

Not specifically, but we have written a number of small programs that test specific functionality, and use them to verify builds. We’ve also written a number of unit tests that are executed when the server starts to make sure the environment is sane and everything is working correctly.

How many data centers do you run in?

So far we’ve kept the system inside one 42U rack. At least in the short term, we plan to grow the system through hardware upgrades (if the need arises).

How do you handle fail over and load balancing?

We use a hardware load balancer (Alteon AD4). The load balancer presents a single virtual IP to the internet and then splits incoming connection requests among the different application servers. The load balancer modifies the IP packets in realtime to maintain the illusion. The load balancer also monitors the different application server nodes and automatically removes any that fail.

Which DNS service do you use?

Good old fashioned BIND for the domains we care about. Requests are split among the different DNS servers using the load balancer. We let our registrar handle DNS for the domains that we don’t consider critical.

Which routers do you use?

Our colo provider has Cisco 6509 routers.

Which switches do you use?

Linksys gigabit switches. Since all of our app servers have two gigabit interfaces we take advantage of this with a physical/logical network for public traffic, and a separate physical/logical network for database, NFS, and memcached traffic.

Technically the Alteon AD4 is considered a Layer 7 switch.

Which email system do you use?

We use our own custom platform, although Postfix is still used to relay outbound mail.

How do you handle spam?

We rely on third-party RBLs, along with several of our own additions. Paid users also have access to a statistical filter based on DSPAM.

How do you handle virus checking of email and uploads?

ClamAV is used to scan inbound and outbound messages. In 2004 we looked at using the Sophos library, but it was too expensive for us to license. Since we don’t want to involve another process, our options remain limited to products that have library APIs.

How do you backup and restore your system?

All critical servers use RAID 5. The MySQL database is replicated to a slave server, which is then used to create offline backups.

As for the mail data, we feel it changes too frequently to make an offline backup solution viable (not to mention the security implications of keeping messages after a user deletes them). For a handful of our corporate users, mail data is stored on at least two NFS servers, and then deleted by the client app from both servers when the time comes.

How are software and hardware upgrades rolled out?

Individual nodes are taken offline, the software upgraded, and then brought back online. Typically we test new versions on one of the nodes for a week or more before rolling it out across the entire cluster. As a result we spend a lot of time making sure we have version N and version N – 1 compatibility.

How do you handle major changes in database schemas on upgrades?

We test, we pray, and we monitor.

We’ve only done two non-compatible rollouts since coming online in 2004. The first was when we switched to our custom platform in February/March of 2005 (we had to fail back to the old system twice before our daemon was stable enough to use full time). The second upgrade was in October of 2007 (I think) when we changed the file format used to store messages and had to convert all of the existing data over. This cutover went smoothly, because we minimized the code that was changed between the two versions to just that used for decrypting and compressing messages. During the upgrade, unconverted messages were hidden from users via the database.

We may need to make one more leap sometime in 2009/2010 as we have several major database and file system changes planned that won’t be backwards compatible. Stay tuned...

What is your fault tolerance and business continuity plan?

We keep offsite backups of application code, configuration information, and key database tables.

Do you have a separate operations team managing your website?

Yes. Only two people have access to the production equipment and databases.

Do you use a content delivery network? If so, which one and what for?

The mail system data changes far too frequently for a CDN to make sense.

In contrast, the consulting side of our company has deployed three websites that used Akamai or Internap to deliver video. We recommend their use if you’re hosting static content that is particularly sensitive to latency (primarily audio and video) or you have a web application that must load quickly all over the globe.

How much do you pay monthly for your setup?

We feel our current pricing for space, power and bandwidth with Colo4Dallas is a very competitive 4 figure amount; but we’ll keep the exact amount to ourselves for now.

Miscellaneous

Who do you admire?

Google, Amazon, Live Journal and Facebook on a technical level. I feel these companies succeeded because their founders built better websites than the competition and then coupled their initial success with good technical and business decisions as their sites grew. The jury is still out on Google on an ethical level.

I also admire Fog Creek software for how it (supposedly) treats both customers and employees.

In the same vein, I still wonder how MySpace survived after making so many poor technical decisions early on. Although I will admit they have improved tremendously since their early days.

Have you patterned your company/approach on someone else?

Our corporate philosophy is a combination of many different approaches. The biggest influences have probably been the book Founders at Work by Jessica Livingston and the Joel Spolsky books Joel on Software and The Best of Software Writing.

Are there any questions you would add/remove/change in this list?

You might want to consider creating a short form of this questionnaire. It took awhile to write all of this. :)