Paper: Revisiting Network I/O APIs: The netmap Framework

Here's a really good article in Communications of the ACM on reducing network packet-processing overhead by redesigning the network stack: Revisiting Network I/O APIs: The netmap Framework by Luigi Rizzo. As commodity networking performance increases, operating systems need to keep up or all those CPU cycles will go to waste. How does netmap make that happen?

Abstract:

Today, 10-gigabit interfaces are used more and more in datacenters and servers. On these links, packets flow as fast as one every 67.2 nanoseconds, yet modern operating systems can take 10-20 times longer just to move one packet between the wire and the application. We can do much better, not with more powerful hardware but by revising architectural decisions made long ago regarding the design of device drivers and network stacks.
The netmap framework is a promising step in this direction. Thanks to a careful design and the engineering of a new packet I/O API, netmap eliminates much unnecessary overhead and moves traffic up to 40 times faster than existing operating systems. Most importantly, netmap is largely compatible with existing applications, so it can be incrementally deployed.
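For a flavor of what the new API looks like from user space, here's a minimal receive-loop sketch built on the nm_open()/nm_nextpkt() convenience wrappers that ship with netmap in net/netmap_user.h; the interface name is just an example and error handling is mostly omitted:

```c
#define NETMAP_WITH_LIBS        /* enable the nm_open()/nm_nextpkt() helpers */
#include <net/netmap_user.h>
#include <poll.h>
#include <stdio.h>

int main(void)
{
    /* Attach to the NIC in netmap mode; "netmap:eth0" is an example name. */
    struct nm_desc *d = nm_open("netmap:eth0", NULL, 0, NULL);
    if (d == NULL)
        return 1;

    struct pollfd pfd = { .fd = NETMAP_FD(d), .events = POLLIN };
    struct nm_pkthdr h;
    unsigned char *buf;

    /* Wait for traffic, then drain whatever is sitting in the receive rings. */
    poll(&pfd, 1, -1);
    while ((buf = nm_nextpkt(d, &h)) != NULL)
        printf("got %u-byte packet at %p\n", h.len, (void *)buf);

    nm_close(d);
    return 0;
}
```

Note that the buffers returned by nm_nextpkt() live in memory shared with the kernel, which is where much of the speedup comes from: no per-packet allocation or copy on the receive path.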
Network I/O has two main cost components. The per-byte cost comes from data manipulation (copying, checksum computation, encryption) and is proportional to the amount of traffic processed. The per-packet cost comes from the manipulation of descriptors (allocation and destruction, metadata management) and the execution of system calls, interrupts, and device-driver functions. The per-packet cost depends on how the data stream is split into packets: the larger the packets, the smaller this component becomes per byte of traffic.
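To see how the per-packet component amortizes, here's a toy back-of-the-envelope calculation; the two cost constants are made-up illustrative numbers, not measurements from the paper:

```c
#include <stdio.h>

/* Toy cost model: time per packet = fixed per-packet overhead plus a
 * per-byte term. The constants are illustrative assumptions only. */
int main(void)
{
    const double per_packet_ns = 500.0;  /* syscalls, descriptors, interrupts */
    const double per_byte_ns   = 0.5;    /* copies, checksums */
    const int sizes[] = { 64, 512, 1518 };

    for (int i = 0; i < 3; i++) {
        double total = per_packet_ns + per_byte_ns * sizes[i];
        printf("%5d-byte packets: %7.1f ns each, per-packet share %5.1f%%\n",
               sizes[i], total, 100.0 * per_packet_ns / total);
    }
    return 0;
}
```

With these (invented) numbers the fixed overhead dominates for minimum-size packets and shrinks to a minority of the cost at full-size frames, which is why small-packet workloads are the hard case.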
To get an idea of the speed constraints, consider a 10-Gbit/s Ethernet interface, which will be the point of reference throughout this article. The minimum packet size is 64 bytes or 512 bits, surrounded by an additional 160 bits of inter-packet gap and preamble. At 10 Gbit/s, this translates into one packet every 67.2 nanoseconds, for a worst-case rate of 14.88 Mpps (million packets per second). At the maximum Ethernet frame size (1,518 bytes plus framing), the transmission time becomes 1.23 microseconds, for a frame rate of about 812 Kpps. This is about 20 times lower than the peak rate, but still quite challenging, and it is a regime that needs to be sustained if TCP is to saturate a 10-Gbit/s link.
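Those line-rate figures are easy to reproduce from the framing overhead (8 bytes of preamble plus a 12-byte inter-packet gap per frame); here's a short sketch that derives them:

```c
#include <stdio.h>

/* Reproduce the 10-Gbit/s figures quoted above: each frame carries
 * 20 extra bytes of preamble + inter-packet gap on the wire. */
int main(void)
{
    const double link_bps = 10e9;      /* 10 Gbit/s */
    const int overhead = 20;           /* 8-byte preamble + 12-byte gap */
    const int sizes[] = { 64, 1518 };  /* min and max Ethernet frame */

    for (int i = 0; i < 2; i++) {
        double bits = (sizes[i] + overhead) * 8.0;
        double ns_per_pkt = bits / link_bps * 1e9;   /* wire time per frame */
        double mpps = 1e3 / ns_per_pkt;              /* (1e9 ns/s) / ns_per_pkt / 1e6 */
        printf("%4d-byte frames: %7.2f ns/frame, %6.3f Mpps\n",
               sizes[i], ns_per_pkt, mpps);
    }
    return 0;
}
```

Running this gives 67.20 ns/frame (14.881 Mpps) for minimum-size packets and about 1,230 ns/frame (0.813 Mpps, i.e. ~812 Kpps) for full-size frames, matching the numbers in the paragraph above.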