Colmux - Finding Memory Leaks, High I/O Wait Times, and Hotness on 3000 Node Clusters
Todd originally posted an entry on collectl at Collectl - Performance Data Collector. Collectl collects real-time data from a large number of subsystems, including buddyinfo, cpu, disk, inodes, infiniband, lustre, memory, network, nfs, processes, quadrics, slabs, sockets, and tcp, all with one tool and in one consistent format.