Tech Dev Stages explains the basic steps involved for the product development given business problems. A must read for newbie or starters for architecture development.
ThePort is an excellent example of a real world in-the-trenches product offering real value to customers. One of the most interesting problems they have to solve is multi-tenancy. How do you provide good performance, complete customization, support, develop new features, and provide individual search indexes for each customer? It's not an easy problem to solve.
How did they solve their problems and build a successful system?
* 6 x Dell blade servers running windows 2008 / IIS 7
* 1 r/w SQL Cluster – dell 6850s (6 single core processors, 32 GB RAM)
* 2 read-only dell 2950 (2 quad core processors, 16 GB RAM)
* 1 distribution server – dell 2950 (2 quad core processors, 16 GB RAM)
* 2 Dell blade servers 8 GB RAM each to total 16 GB of available RAM
* Running SharedCache (Basically an open source .NET port of MemCacheD. We initially looked at MemCacheD but our internal benchmarking indicated SharedCache had better performance – at least w/in a Microsoft environment. We may still investigate Microsoft's Velocity cache platform when it goes live)
* 2 Dell 2950s with 725 GB Storage
* Running Lucene + SOLR
* We chose Lucene over Lucene.NET because Lucene.NET's wildcard search was a little buggy in our initial beta testing. SQL Full Text wasn't a viable option because there was no clear and easy way to split indexes between customers. SOLR cores make this part easy. Above and beyond that, Lucene is lightning fast and is available with features we couldn't turn down (proximity search, searching w/in documents, and built-in RESTful APIs to name a few)
How do you handle multi-tenancy?A multi-tenant platform has two primary hurdles to overcome:
1. Preventing a single, large customer from overwhelming the system?The primary bottleneck for this is in the data layer. Our current DB architecture has helped mitigate this problem. The read-only servers help offset most of this by absorbing the bulk of the data calls. We did have to beef up the distribution server because latency between the r/w server and the read only servers had crept too high. Getting a new machine (2 quad cores with 16 GB of RAM) helped reduce the latency to less than a second.
However robust the cluster is, we've concluded that we will eventually have to move to a sharded architecture with MySQL. MS SQL licensing fees makes both continuing to enhance the cluster and scaling out to multiple machines prohibitive. Additionally, sharding allows us to scale either by customer (because some may be more active than others) or by functional area (photos, comments, etc).
2. Allowing clients to have total control over the look, feel, and user experience of their sites.
Allowing CSS control isn't enough; we needed a templating system that allows total control over the site. We looked at using .NET master pages and user controls to accomplish this. But that assumes a level of knowledge in .NET for outside developers. We built a proprietary templating system that unfortunately became too limiting and would one day lead to a drag on performance.
So we settled on using XML / XSLT. All of our business / entity objects are serializable to XML. This made XSLT a natural choice from the templating angle. We've seen a considerable boost in performance from this upgrade and an even greater increase in flexibility in terms of what our designers can do. Once the learning curve is overcome, the web designers love the amount of control they get.
What did you do that was especially cool that people could learn from?
XSLT as Custom Templating SystemBuilding a templating system in XSLT that actually allows the template author to make a web service call to our internal web service layer (or external web services) straight from the templating system. This allowed the development team to build a flexible, powerful system that allows a web designer to embed real-time calls into a given template. We accomplish this using XSLT Extension Objects. What we've found in our internal testing is that these extension objects scale way better than our previous templating system (a homegrown proprietary system). We've used ANTS profiler to compare the two and the difference is in orders of magnitude.
Obviously we have to cache the hell out of this or the performance of the pages the calls are embedded in would suffer. For now, we make the internal web services calls via HTTP, but we will soon be moving this to a TCP call to take advantage of the better connection pooling offered by TCP. We're most likely to use WCF because of it's native support of TCP bindings. However, we haven't yet benchmarked that so it's possible it could change.
Not Using the Database to Build CollectionsAnother cool thing we've done is to move strongly away from using the database to retrieve collections of 'things'. For instance, if we needed a collection of comments, previously we'd hit the database for the 5, 10, 100, etc comments we wanted, do the sorting / filtering in the DB, return a single dataset, cache that, and then display.
However, this is a database intensive operation, especially if you're going to join against user data (which you inevitably will). What we've started doing recently is caching the recent comment objects, and using our cache providers MultiGet ability to simultaneously retrieve all comments at the same time. We then sort / filter in memory in the application tier, discard whatever comments we don't need, and then display. We found that doing it this way, we save lots of hits to our database and in fact, saw a considerable performance gain from it.
Our tests (on a developer laptop) fetched 10,000 objects from cache in about 1 second, then sorted them by date time in about .015 second.
What prompted you to move to a SOA architecture?
To better compartmentalize our code.Given the growth of our templating system mentioned above, we realized it was best to truly separate the tiers into discrete areas. Since our application is easily accessed via a set of REST APIs and our own internal skinning system (and who knows what in the future), dividing the application like this gives us a lot of leeway in being able to swap out components. Additionally, we're doing more and more queuing which lines up nicely with SOA.
PerformanceSince modern web apps deal with complex data, breaking the work into more discrete operations handled by offline processes on their own infrastructure makes a lot of sense from a performance point of view.
How do you handle consistency between the database and the search engine?We have a multi-threaded windows service that scans our database once every 5 minutes looking for new data. The service then adds the new items to the Lucene index. We keep audit columns on all our database tables so capturing new data is pretty simple. Once a night, we purge the Lucene index and run a full rescan of the database. We think this system will work for the near to mid term but long term, we'll take advantage of a queuing system to keep the index in sync.
How you handle your release, support, bug fixing, development, etc.We have a decent sized dev team. 1 platform architect responsible for overall system architecture (selecting which systems to use, tuning them), 1 lead software architect, and 3 senior – mid level developers. Since we're a start-up in a fast evolving market (social media) we find that we're constantly having to adjust to market demands and the latest in social functionality. So we have a 2 month build cycle which is pretty aggressive.
In terms of actual development, we've found the following to be keys to success:
1. Daily stand-ups: it's absolutely necessary for everyone on the team to know what the other is doing. A code base as large as ours, it's very likely I'm writing a function someone has already written or solving a problem someone has solved previously. Daily stand-ups help with that
2. Iterate: Build the core functionality, get it into QA and / or beta, beat the bugs out of it, move to the next piece. We've found this to be easier said than done. Market pressures sometimes dictate you roll with something more feature rich than you'd like. Sticking to an iterative cycle creates better code and more market ready products.
3. Beta test: This goes hand-in-hand w/ #2 above. Get something done and get it in the hands of actual users. This is the best way to find where your app falls down
With regard to support / bug fixing, we're moving to a forums based support model for many of our customers. We've found the same problems, especially in an app as configurable as ours, occur over and over. Getting those answers into an open, searchable format should hopefully cut down on confusion and get developers talking directly to developers.
Internally we use Trac for bug tracking and devote roughly 20% of our week maintaining, supporting, and fixing issues. That may seem like a lot but given how configurable our system is, we're essentially running 50 heavily data driven websites.
WCF sounds like a buggy underpeforming mess. How is it working?So far we have no complaints with WCF. I think baking it directly into .NET 3.5 helped iron a lot of the big kinks out. It does come with it's quirks, no doubt. We built our REST libraries on top of it and found that posting XML is not exactly the easiest thing in the world. But it was more than made up for with the ease in deploying all our GET operations with REST. Our next step will be to set up TCP and MSMQ bindings with WCF to handle our internal service requests and queuing, respectively. Since WCF exposes all of these bindings natively, we think we will see a lot of effective code re-use out of this.
I'd like to thank TJ for taking the time and making the effort to write up their architecture for people to learn from. I'm sure it will help others when they are trying to build their own systems.
You too can share the architecture for your amazing system. Come on, you've learned a lot from others, it's time to return the favor and give back. It's not that hard, really. If interested please contact me and we can get started.
One of the top recommendations from the collective wisdom contained in Real Life Architectures is to add monitoring to your system. Now! Loud is the lament for not adding monitoring early and often. The reason is easy to understand. Without monitoring you don't know what your system is doing which means you can't fix it and you can't improve it. Feedback loops require data.
Some popular monitor options are Munin, Nagios, Cacti and Hyperic. A relatively new entrant is a product called Reconnoiter from Theo Schlossnagle, President and CEO of OmniTI, leading consultants on solving problems of scalability, performance, architecture, infrastructure, and data management. Theo's name might sound familiar. He gives lots of talks and is the author of the very influential Scalable Internet Architectures book.
So right away you know Reconnoiter has a good pedigree. As Theo says, their products are born of pain, from the fire of solving real-life problems and that's always a harbinger of good things to come.
The problem Reconnoiter is trying to solve is monitoring thousands of nodes across many datacenters where the nodes can vary widely in power, architecture, and software configuration. With that kind of problem what they really want is the ability to:
If you've ever used or written a distributed stats collection system the architecture of Reconnoiter will look somewhat familiar:
Some of the more interesting bits of the architecture are:
Reconnoiter isn't completely pain free. Lua for an extension language is an interesting choice. The installation and configuration process is very complex. There are a lot of separate steps and bits to configure. Another potential problem is monitoring produces a lot of real-time data. I have to wonder if PostgresSQL can handle that flow for very large systems. The data is partitioned by month, but a large number of machines and a large number of events can be crushing. And I wasn't sure if graph data could be correlated with released features or other system changes. In the video Theo mentions seeing in the graphs that using deflate improved performance, but I'm not sure just looking at the graph how you would be able correlate system data with system changes.
It's droolingly clear where Reconnoiter shines is on creating complex graphs, charts, and other visualizations. The graphs look useful and quick to render. The real time visualizations are spectacular and extremely are difficult to do in other systems.
AFK Partners has release what they feel are the Best Practices for Scalability:
- Asynchronous - Use asynchronous communication when possible.
- Swim Lanes – Create fault isolated “swim lanes” of hardware by customer segmentation.
- Cache - Make use of cache at multiple layers.
- Monitoring - Understand your application’s performance from a customer’s perspective.
- Replication - Replicate databases for recovery as well as to off load reads to multiple instances.
- Sharding - Split the application and databases by service and / or by customer using a modulus.
- Use Few RDBMS Features – Use the OLTP database as a persistent storage device as much as possible.
- Slow Roll – Roll out new code versions slowly, to a small subset of your servers without bringing the entire site down.
- Load & Performance Testing – Test the performance of the application version before it goes into production.
- Capacity Planning / Scalability Summits – Know how much capacity you have on all tiers and services in your system.
- Rollback – Always have the ability to rollback a code release.
- Root Cause Analysis - Ensure you have a learning culture that is evident by utilizing Root Cause Analysis to find and fix the real cause of issues.
- Quality From The Beginning – Quality can’t be tested into a product, it must be designed in from the beginning.
At one of my jobs I have to administer a CISCO ACE (application control engine) hardware load-balancer.
I don't particularly love this beast, but it's very very powerful.
There appears to be little real-world info out there, so it could be interesting writing an article on that.
But I don't have other HW LB's to compare it to and I don't want to rehash the product page.
What would interest you in a 'product review' of a loadbalancer?
No replies means it's not an interesting topic, so no article then ;-)
In this blog post BJ Clark overviews the available key-value stores that aren't SQL based.
On some projects he yields interesting experiences. Make sure to check out the comments as well.
The post has spread throughout social tiny text services so there is a chance you've already read it, but still worthwhile to put here.
Me and my partner are making a blueprint for an online webshop service. The purpose of this project is to make webshops available for small company's/ individuals automatically just by creating an account with us. Our webapp can be used to add products/pages/... to the store and we'll handle secure checkout by paypal.
Our app should be scalable and manageable. Because we also want to offer free webshops, the amount of webshops could be +10.000 within a few years. We are building on the Zend framework and are using mysql for database.
From the start we want to build our application for optimal and easy scalability in the future, to avoid a lot changes to our app/database in the future.
Now our questions are:
Should we use?:
* one database for all shops (or limited to X shops );
* one database for each new shop (each having products, orders... tables);
I think both approaches have PRO/CONS. What do you think ? Does anyone has experience with this kind of structure ?
one database: easier to make changes to database layout
multiple databases:more scalable, easier to backup/restore
one database: harder to code because of extra keyfields in tables , slower or more difficult to backup/restore.
multiple databases: Takes longer to push changes to database layout
We are totally not clear on what hosting we should use. Would it be a solution to use a cloud service such as mosso/amazon/gogrid? Or is it better to just start with one dedicated server and expand 'manually' later?
Thanks in advance for your help!
So far every massively scalable database is a bundle of compromises. For some the weak guarantees of Amazon's eventual consistency model are too cold. For many the strong guarantees of standard RDBMS distributed transactions are too hot. Google App Engine tries to get it just right with entity groups. Yahoo! is also trying to get is just right by offering per-record timeline consistency, which hopes to serve up a heaping bowl of rich database functionality and low latency at massive scale:
We describe PNUTS [Platform for Nimble Universal Table Storage], a massively parallel and geographically distributed database system for Yahoo!’s web applications. PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of con-current requests including updates and queries, and novel per-record consistency guarantees. It is a hosted, centrally managed, and geographically distributed service, and utilizes automated load-balancing and failover to reduce operational complexity. The first version of the system is currently serving in production. We describe the motivation for PNUTS and the design and implementation of its table storage and replication layers, and then present experimental results.
Some of the cool things about PNUTS are:
From a system perspective PNUTS offers a lot of the good things: hosted, reliability, lowish latency, automation, scalability, supports many application models, and there's a lot of room to improvement that all applications will be able to take advantage of when available.
From a programmer perspective the are also a lot of good things: it's hosted so fewer worries, notifications, flexible schemas, ordered records, secondary indexes, lowish latency, strong consistency on a single record, scalability, high write rates, reliability, and range queries over a small set of records.
Unfortunately Goldilocks still needs to keep searching for just right, though she may be getting closer. From a system perspective Yahoo!'s ideas are good, but they don't help you as the system isn't available for you to use. From a programmer perspective the programmer's job is still way too hard. To be just right programmer's need low latency aggregate operators, complex transactions, scalable counters, automatic relationship management, and all the other features that will help them just buy instant porridge and be done with it.
Update: Asynchronous HTTP cache validations. A proposed HTTP caching extension: if your application can afford to show slightly out of date content, then stale-while-revalidate can guarantee that the user will always be served directly from the cache, hence guaranteeing a consistent response-time user-experience.
Caching is like aspirin for headaches. Head hurts: pop a 'sprin. Slow site: add caching. Facebook must have a lot of headaches because they popped 805 memcached servers between 10,000 web servers and 1,800 MySQL servers and they reportedly have a 99% cache hit rate. But what's the best way for you to cache for your application? It's a remarkably complex and rich topic. Alexey Kovyrin talks about one common caching problem called the Dog Pile Effect in Dog-pile Effect and How to Avoid it with Ruby on Rails. Glenn Franxman also has a Django solution in MintCache.
Data is usually cached because it's too expensive to calculate for every hit. Maybe it's a gnarly SQL query you want to avoid and a little stale data is OK. Or maybe the amount of data you have is simply larger than physical memory on any one machine. Or maybe you have the temerity to write to your database and cause its cache to flush so database caching isn't sufficient at a certain level of scale.
Typical examples are for caching article vote counts, comment threads, and event streams. One familiar example that bit me hard is displaying the the top N blog articles. Do you want to scan through your entire access log table for every page display? Absolutely not. Especially when the nightly backups are going on and the network is very slow. Not good :-) Yet you still want to update the results every X minutes so the stats stay fresh.
Data freshness requires a refrigeration truck or an expiry time on your cache entry that causes stats to be periodically recalculated. Now, what happens when your cached data expires and a 1000 requests simultaneously try to recalculate the expensive to calculate data? Database load spikes and the world nearly ends. And since memcached operations are not atomic it's possible stale data could be cached and you'll serve stale data. Which kind of defeats of the purpose of taking load off the data while providing accurate data. So, how do you unpile the dogs?
No Expire Solution
If cache items never expire then there can never be a recalculation storm. Then how do you update the data? Use cron to periodically run the calculation and populate the cache. Take the responsibility for cache maintenance out of the application space. This approach can also be used to pre-warm the the cache so a newly brought up system doesn't peg the database.
The problem is the solution doesn't always work. Memcached can still evict your cache item when it starts running out of memory. It uses a LRU (least recently used) policy so your cache item may not be around when a program needs it which means it will have to go without, use a local cache, or recalculate. And if we recalculate we still have the same piling on issues.
This approach also doesn't work well for item specific caching. It works for globally calculated items like top N posts, but it doesn't really make sense to periodically cache items for user data when the user isn't even active. I suppose you could keep an active list to get around this limitation though.
Stale Date Solution
This solution introduces a stale date in addition to the expiration date. Glen describes it as:
The first client to request data past the stale date is asked to refresh the data,
while subsequent requests are given the stale but not-yet-expired data as if it
were fresh, with the understanding that it will get refreshed in a 'reasonable'
amount of time by that initial request
In the memcached FAQ a one key approach is described:
Alexey describes a different two key approach:
I dislike embedding meta data with data so I like Alexey's approach a bit better, even though it doubles the key space.
None of these options prevent the problem for ever happening, but they do greatly reduce the failure window for relatively little cost.
Update 2: Elastic Load Balancer and EC2 instance bandwidth. It turns out we are limited by bandwidth and not by CPU. Solution: use DNS Round Robin for two to three HighCPU medium instances.
Update: The Skinny Straw: Cloud Computing's Bottleneck and How to Address It. For cloud computing, bandwidth to and from the cloud provider is a bottleneck. Solution: Evaluate application architecture and consider application partitioning.
I'm writing this post as a sort of penance. My sin was getting involved in another mutli-threaded mess of a program that was rife with strange pauses and unexpected errors. I really should have known better. But when APIs choose to make callbacks from some mystery thread pool it's hard to keep things straight. I eventually sobered up and posted all events to a queue so I could make sure the program would work correctly. Doh. I may never know why the .Net console output stopped working, but I'll live with it.
And that reminded me that I've been meaning to write a post on the standard Cloud Architecture. I've tried to hit all the common architectures at one time or another, but there have been some excellent sources lately on structuring programs in a cloud that people may "know" in the same way I knew what not to do, but when the code hits the editor those thoughts may have hidden like a kid next to a broken cookie jar.
The easiest way to create a scalable service is to compose the service from other scalable services. This is how Google AppEngine works and is largely how AWS works as well (EC2, S3, SQS, SimpleDB, etc), though AWS also functions as a blank canvas on which you can draw your own designs.
The canonical cloud architecture that has evolved revolves around dynamically scalable CPUs consuming asynchronous, persistently queued events. We talked about this idea already in Flickr - Do the Essential Work Up-front and Queue the Rest. The cloud is just another way of implementing the same idea.
Amazon suggests a few applications of the Cloud Architecture as:
- Document processing pipelines – convert hundreds of thousands of documents from Microsoft Word to PDF, OCR millions of pages/images into raw searchable text
- Image processing pipelines – create thumbnails or low resolution variants of an image, resize millions of images
- Video transcoding pipelines – transcode AVI to MPEG movies
- Indexing – create an index of web crawl data
- Data mining – perform search over millions of records
- Back-office applications (in financial, insurance or retail sectors)
- Log analysis – analyze and generate daily/weekly reports
- Nightly builds – perform nightly automated builds of source code repository every night in parallel
- Automated Unit Testing and Deployment Testing – Test and deploy and perform automated unit testing (functional, load, quality) on different deployment configurations every night
- Websites that “sleep” at night and auto-scale during the day
- Instant Websites – websites for conferences or events (Super Bowl, sports tournaments)
- Promotion websites
- “Seasonal Websites” - websites that only run during the tax season or the holiday season (“Black Friday” or Christmas)
A good list, but after having worked on a seasonal website for taxes AWS is a horrible match. AWS only works on the instance level, so you need a whole instance turned on all the time even when there's no demand. This is a complete waste of money. An AWS model truly based on use combined with an SLA driven dashboard would be very convenient. But on to cases where AWS is a good fit.
SmugMug's Cloud ArchitectureAWS pioneer Don MacAskill of SmugMug details how they process high-resolution photos and high-definition video use a cloud hosted queuing architecture in SkyNet Lives! (aka EC2 @ SmugMug).
SkyNet, as you might expect, operates completely without human minders and automatically scales up and down in relation to the work load. Their system has several components:
mechanisms for automatically configuring and running VMs.
Don shares a lot of practical detailia on how to efficiently use AWS, how their queue service works, and how their controller manages to balances minimizing cost while still being responsive to users. Achieving fairness and balance in a queue system can be difficult, but SmugMug appears to have done a good job of that.
What rocks about queuing architectures is that they are just so damn robust. Work is safe in the queues. A random reboot won't cause a loss. If one component is producing events too fast the queue will buffer up events until they can be processed. New components can be cleanly added and removed from the system at any time. Timing isn't critical. Work is processed when someone gets around to it. Timeouts and retries are unnecessary. Programs are simple loops that block on the queue, do something, persist results, and feed back more parallelizable work requests back into the queue. Very hard to screw up. Compare and contrast to complex multi-threaded system with shared-state.
Building GrepTheWeb in the CloudAmazon has published a great couple of articles on building a canonical Cloud Architecture: Building GrepTheWeb in the Cloud, Part 1: Cloud Architectures and Building GrepTheWeb in the Cloud, Part 2: Best Practices.
These are really tight and well written articles so I'll just hit certain high points. The example used is an application called GrepTheWeb. GrepTheWeb searches using a regular expression across millions of web documents. So it's a grep for the web, ah got it now. The idea is to take an unpredictable but possibly large number of search requests, apply the search expression to hundreds of terabytes of documents, and return the results in a reasonable period of time.
How exactly would you do such a thing? Here's how you do it in the cloud:
Clearly these are all (except for Hadoop) built on Amazon services, but the general ideas apply anywhere. For storing large amounts of data and accessing it efficiently in parallel you need a distributed file system like S3. To coordinate and dispatch work you need a queuing service like SQS. For keeping intermediate state you need a scalable database store like SimpleDB, though you could also imagine using S3. For dynamically scaling processing nodes something like EC2 is necessary. And for actually carrying out the document search a framework like Hadoop provides a lot of features, though you can imagine using other compute grid products.
Here's their fabulous picture of what the system looks like:
All the parts and linkages are described in the paper. What's important to note is that even though there are a lot of independently moving parts all the boundaries are clear and well described. In your typical program few will have any idea how it works. Using Cloud Architecture principles it's possible to create a system which both scales and easy to understand and explain.
The paper makes several key architectural recommendations:
All good stuff which is why I like this paper so much. There's a big conceptual shift here, especially of you are used to relatively simple client-server and N-tier systems. It's like simulating in your mind how to keep an army of ants all working independently while still communicating, coordinating, and making progress on a goal. We implemented similar architecture in datacenters long before the cloud, it was just a lot harder as everything was roll your own. The cloud makes all the necessary components standard, featureful, and relatively inexpensive. This opens any application to completley different ways of structuring their backends than they did in the past.