SLA monitoring


We're running a enterprise SaaS solution that currently holds about 700 customers with up to 50.000 users per customer (growing quickly). Our customers have SLA agreements with us that contains guaranteed uptimes, response times and other performance counters. With an increasing number of customers and traffic we find it difficult to provide our customer with actual SLA data. We could set up external probes that monitors certain parts of the application, but this is time consuming with 700 customers (we do it today for our biggest clients). We can also extract data from web logs but they are now approaching about 30-40 GB a day.

What we really need is monitoring software that not only focuses on the internal performance counters but also lets us see the application from the customers viewpoint and allows us to aggregate data in different ways. Would the best approach be to develop a custom solution (for instance a distributed app that aggregates data from different logs every night and store them in a data warehouse) or are there products out there that are suitable for a high scalability environment?

Any input is greatly appreciated!