GIS Application Hosting

Share your experience of hosting a highly scalable, reliable GIS-based application that involves a map server, a spatially enabled database, J2EE, routing applications, etc.

Reader Comments (1)

IMS (I'm assuming that's what we're talking about) can be difficult to scale, depending on your implementation. The no-brainer is using a tiled map (e.g. Google Maps). This is pretty much the norm, and any IMS worth anything supports tile-based maps. It gives you the ability to cache these tiles and serve them from cache on request. Using tiled maps also gives you some natural parallelism, as the browser is able to request multiple tile images simultaneously. Rendering is still extremely processor-intensive, and because of the nature of the beast, the cache hit rate is not so great.
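
To make the caching idea concrete, here is a minimal sketch of a read-through disk cache for tiles. The directory layout (`layer/z/x/y.png`), the function names, and the `render` callback are all assumptions for illustration, not the API of any particular IMS.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical renderer signature: (layer, z, x, y) -> PNG bytes.
Renderer = Callable[[str, int, int, int], bytes]


def tile_path(cache_dir: Path, layer: str, z: int, x: int, y: int) -> Path:
    """Map a tile address to a file path (assumed layout: layer/z/x/y.png)."""
    return cache_dir / layer / str(z) / str(x) / f"{y}.png"


def get_tile(cache_dir: Path, layer: str, z: int, x: int, y: int,
             render: Renderer) -> bytes:
    """Return the tile from disk if present, otherwise render and store it."""
    path = tile_path(cache_dir, layer, z, x, y)
    try:
        return path.read_bytes()  # cache hit
    except FileNotFoundError:
        pass  # cache miss: fall through to the expensive render

    data = render(layer, z, x, y)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file and rename it, so concurrent readers never
    # see a half-written tile.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.replace(tmp_name, path)
    return data
```

Because the tiles are plain files, the OS page cache keeps the hot ones in memory without any extra work.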

If you need to change the layer setup of your map on-the-fly, render the layers into groups and then combine them together on the server-side. While you can do this on the client, the user experience will degrade due to the latency of making all of these tile requests over the Internet. Create a front-end that will request the required tiles from rendering backends and cache these, then composite the final product together and present one image to the client. Log all of these pieces with delivery times so you can easily spot bottlenecks. Keep logs in production and run reports to find slow requests. Of course, if you have the right combination of factors, you can pre-render all the tiles. This will scale like mad, but this can be billions of tiles if you're talking about high-detail data. This can be premature optimization if you only ever render a few million of them before updating your data. You should find your slowest tiles and cache those though. These will be low-scale, high-density areas and high-scale tiles that require loading hundreds of megabytes of data to render.
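
A rough sketch of that front-end, assuming a hypothetical `fetch` callable that asks a rendering backend for one layer group and a hypothetical `composite` callable that stacks the results into one image. It fetches the groups in parallel and logs how long each piece took, so slow backends show up in the logs.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger("tilefront")  # logger name is arbitrary


def timed_fetch(fetch, group, z, x, y):
    """Fetch one layer group and log how long the backend took."""
    start = time.perf_counter()
    data = fetch(group, z, x, y)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    log.info("group=%s z=%d x=%d y=%d ms=%.1f", group, z, x, y, elapsed_ms)
    return data, elapsed_ms


def render_composite(groups, z, x, y, fetch, composite):
    """Fetch all layer groups in parallel, then composite them in order.

    `groups` is ordered bottom to top; `composite` receives the fetched
    images in that same order and returns the single image for the client.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        futures = [pool.submit(timed_fetch, fetch, g, z, x, y) for g in groups]
        results = [future.result() for future in futures]
    layers = [data for data, _ in results]
    timings = {g: ms for g, (_, ms) in zip(groups, results)}
    return composite(layers), timings
```

In a real setup `fetch` would check the tile cache first and `composite` would do alpha blending with an imaging library; both are left abstract here.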

For scalable caching you're almost on your own. There are systems like TileCache, but TileCache won't scale out and doesn't have an efficient mechanism for flushing dirty tiles in a complex layer scenario. When it comes to caching, IMS is unusual because each tile is so computationally expensive and hit rates are relatively low. This means large, disk-based caches are definitely your best bet. The OS disk cache itself will suffice for tiles with high hit rates. You'll want to find a way to flush dirty tiles (for data updates) efficiently without flushing the entire cache. If you have to flush the entire cache for a single data source update, it will cost you a lot of compute time. Deleting that much data on disk can be crushing I/O-wise too.
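
One way to flush selectively is to compute which tiles intersect the bounding box of the updated data and delete only those. The sketch below assumes the common Web Mercator XYZ tile scheme (as used by Google Maps and OpenStreetMap) and a `layer/z/x/y.png` cache layout; both are my assumptions, and your IMS may address tiles differently. It does not handle boxes that cross the antimeridian.

```python
import math
from pathlib import Path

MAX_LAT = 85.05112878  # latitude limit of the Web Mercator projection


def lonlat_to_tile(lon, lat, z):
    """Return the (x, y) index of the XYZ tile containing a lon/lat point."""
    n = 2 ** z
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def dirty_tiles(west, south, east, north, zooms):
    """Yield every (z, x, y) tile that intersects the bounding box."""
    for z in zooms:
        x0, y0 = lonlat_to_tile(west, north, z)  # top-left tile
        x1, y1 = lonlat_to_tile(east, south, z)  # bottom-right tile
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                yield z, x, y


def flush_bbox(cache_dir: Path, layer, bbox, zooms):
    """Delete cached tiles of one layer inside bbox; return how many were removed."""
    west, south, east, north = bbox
    removed = 0
    for z, x, y in dirty_tiles(west, south, east, north, zooms):
        try:
            (cache_dir / layer / str(z) / str(x) / f"{y}.png").unlink()
            removed += 1
        except FileNotFoundError:
            pass  # tile was never rendered, nothing to flush
    return removed
```

For a complex layer setup you would also need a map from each data source to the layers (and composites) that draw it, and call `flush_bbox` once per affected layer.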

Other than that, traditional GIS optimization such as keeping data sizes minimized and layers as simple as feasible applies here. Whatever mapping system you use will have its own performance quirks. Some have poor disk access patterns that you'll have to optimize around. UMN MapServer supports tile indexing for chunking data and dispersing it among more files for partitioned access. Some IMS systems benefit from "metatiling", where the server renders, say, a 9x9 tile mosaic at once instead of a single tile, saving a bunch of precious I/Os. Monitor disk activity as granularly as possible and see which data files are getting hit the most. Decompose your spatial data and/or slice it up into pieces. This will give more granular index hits. It's generally best to use local disk files for small to medium data sources that must be scanned and shown in larger scales. For large and very large data sources that are only shown at smaller scale (e.g. high-detail roads), use a spatial database.
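
The bookkeeping behind metatiling looks roughly like this. It is a sketch only: `render_metatile` is a hypothetical callback that draws the whole block in one pass and returns the sliced tiles, and the cache is a plain dict standing in for whatever tile store you use.

```python
def metatile_origin(x, y, size):
    """Top-left tile index of the size x size metatile containing (x, y)."""
    return (x // size) * size, (y // size) * size


def metatile_members(x, y, z, size):
    """All tile indexes in the metatile of (x, y), clipped to the 2**z grid."""
    n = 2 ** z
    ox, oy = metatile_origin(x, y, size)
    return [(tx, ty)
            for ty in range(oy, min(oy + size, n))
            for tx in range(ox, min(ox + size, n))]


def get_tile(cache, z, x, y, size, render_metatile):
    """Serve one tile, rendering and caching its whole metatile on a miss.

    `render_metatile(z, ox, oy, members)` is assumed to return a dict
    mapping each (tx, ty) in `members` to that tile's image bytes.
    """
    key = (z, x, y)
    if key not in cache:
        ox, oy = metatile_origin(x, y, size)
        members = metatile_members(x, y, z, size)
        for (tx, ty), data in render_metatile(z, ox, oy, members).items():
            cache[(z, tx, ty)] = data
    return cache[key]
```

With `size=9`, one render pass fills up to 81 cache entries, so neighbouring requests from the same browser viewport are mostly cache hits.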

For your spatial database, the indexes can get very large. Watch your index sizes as these can eclipse the size of the table itself. Strip out all the non-essential data so you can get as much packed into memory as possible. Realistically, if you can't keep the entire index in low-latency storage (memory or SSD), it's going to be very slow. For read-only data, use MySQL MyISAM and compress the tables. If you can, order the data so that spatially close rows are stored near each other. Going further, specifically with road data, distill the attributes you use for styling into a single style column and eliminate all the other attribute data you don't need. Delete what you aren't going to show. Join together lines that share a style to reduce your index size. Make sure you aren't creating extremely long lines (an Interstate can run together for a very large area, for instance) and sending too much between the DB and IMS at request time. Unmodified, each tile of Manhattan NY had 10-20k rows of data coming back from the DB. Not good. Often a single spatial database server that's able to pack all of the needed data in RAM at once can feed a dozen or more map servers. IMS really needs disk caching so don't put your spatial DB and map server on the same box unless you've got RAM coming out the wazoo.
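
As an illustration of the line-joining step, here is a greedy sketch that chains road segments end to start when they share a style, with a cap on vertices so you don't end up with one enormous Interstate row. The `(style, coords)` tuple format and the `max_points` cap are assumptions for the example; it only joins segments digitized in the same direction, and the result depends on input order.

```python
from collections import defaultdict


def merge_lines(lines, max_points=200):
    """Join same-style segments where one ends exactly where the next starts.

    `lines` is a list of (style, coords) with coords as a list of (x, y)
    tuples. A merged line never exceeds `max_points` vertices.
    """
    # Index segments by (style, first vertex) so successors are quick to find.
    by_start = defaultdict(list)
    for i, (style, coords) in enumerate(lines):
        by_start[(style, coords[0])].append(i)

    used = set()
    merged = []
    for i, (style, coords) in enumerate(lines):
        if i in used:
            continue
        used.add(i)
        chain = list(coords)
        while True:
            candidates = [j for j in by_start.get((style, chain[-1]), [])
                          if j not in used]
            if not candidates:
                break
            nxt = lines[candidates[0]][1]
            # The shared vertex is counted once, hence the -1.
            if len(chain) + len(nxt) - 1 > max_points:
                break
            used.add(candidates[0])
            chain.extend(nxt[1:])
        merged.append((style, chain))
    return merged
```

Fewer, longer rows mean a smaller spatial index, while the vertex cap keeps any single row from dragging a huge geometry across the wire for a small tile.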

You'll find that IMS will always be a battle between quality and performance. You could have great performance with low-quality geometries and fast, ugly rendering. Just find that sweet spot and you'll be golden. I highly recommend using a cloud environment where you can scale up and down at will. As processor-intensive as IMS is, you'll need all the juice you can get. Two dozen simultaneous users can beat the snot out of even the most well-engineered single-server IMS setup, so you need to be able to scale very dynamically.

November 29, 1990 | Unregistered Commenter Rick Branson
