Paper: MegaPipe: A New Programming Interface for Scalable Network I/O
The paper MegaPipe: A New Programming Interface for Scalable Network I/O (video, slides) hits a common theme: if you want to go faster, you need a better car design, not just a better driver. That's why the authors started with a clean slate and designed a network API from the ground up with support for concurrent I/O, a requirement for achieving high performance while scaling to large numbers of connections per thread, multiple cores, and so on. What they created is MegaPipe, "a new network programming API for message-oriented workloads to avoid the performance issues of BSD Socket API."
The result: MegaPipe outperforms baseline Linux by between 29% (for long connections) and 582% (for short connections). It improves the performance of a modified version of memcached by 15% to 320%, and for a workload based on real-world HTTP traces, it boosts the throughput of nginx by 75%.
What's this most excellent and interesting paper about?
partitioning
lightweight socket (lwsocket)
batching
Performance with Small Messages:
Small messages incur greater relative network I/O overhead than larger messages. In fact, the per-message overhead remains roughly constant and is thus independent of message size; compared with a 64 B message, a 1 KiB message adds only about 2% overhead, due to the copying between user and kernel space on our system, despite the large size difference.
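To see why a roughly constant per-message cost makes small messages expensive, here is a toy cost model, assuming each message costs a fixed overhead plus a per-byte copy cost. The constants `FIXED_NS` and `COPY_NS_PER_BYTE` are hypothetical placeholders chosen for illustration, not measurements from the paper.

```python
# Toy cost model: per-message cost = fixed overhead + per-byte copy cost.
# Both constants are hypothetical, chosen only for illustration.
FIXED_NS = 2000.0          # assumed fixed cost per message (syscall, VFS, TCP/IP)
COPY_NS_PER_BYTE = 0.05    # assumed user/kernel copy cost per byte


def message_cost_ns(size_bytes: int) -> float:
    """Total modeled cost of sending one message of the given size."""
    return FIXED_NS + COPY_NS_PER_BYTE * size_bytes


def extra_cost_vs(small: int, large: int) -> float:
    """Fractional extra cost of a large message relative to a small one."""
    return message_cost_ns(large) / message_cost_ns(small) - 1.0


def overhead_share(size_bytes: int) -> float:
    """Fraction of the modeled cost spent on the fixed per-message overhead."""
    return FIXED_NS / message_cost_ns(size_bytes)


if __name__ == "__main__":
    # With these made-up constants, a 1 KiB message costs about 2.4% more
    # than a 64 B message, even though it carries 16x the payload.
    print(f"{extra_cost_vs(64, 1024):.1%}")
    for size in (64, 1024, 65536):
        print(size, f"{overhead_share(size):.1%}")
```

Under this model, the fixed share dominates for small messages, so cutting the fixed per-message cost (rather than the copy cost) is what pays off for message-oriented workloads.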
Partitioned listening sockets:
Instead of a single listening socket shared across cores, MegaPipe allows applications to clone a listening socket and partition its associated queue across cores. Such partitioning improves performance with multiple cores while giving applications control over their use of parallelism.
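The difference between a shared accept queue and partitioned per-core queues can be sketched as a small model. `SharedListener` and `PartitionedListener` are hypothetical names, and choosing a partition by hashing the connection tuple is an assumption made for illustration, not a description of how MegaPipe's kernel code picks a queue.

```python
import threading
from collections import deque


class SharedListener:
    """One accept queue shared by all cores; every operation takes one global lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.queue = deque()

    def incoming(self, conn):
        with self.lock:
            self.queue.append(conn)

    def accept(self, core):
        # Any core may dequeue any connection, so all cores contend on one lock.
        with self.lock:
            return self.queue.popleft() if self.queue else None


class PartitionedListener:
    """A listening socket cloned per core, each clone with its own queue and lock."""

    def __init__(self, num_cores):
        self.locks = [threading.Lock() for _ in range(num_cores)]
        self.queues = [deque() for _ in range(num_cores)]

    def incoming(self, conn):
        # Assumption: pick the partition by hashing the connection tuple.
        core = hash(conn) % len(self.queues)
        with self.locks[core]:
            self.queues[core].append(conn)
        return core

    def accept(self, core):
        # A core only touches its own partition, so cores do not contend
        # with each other when accepting.
        with self.locks[core]:
            q = self.queues[core]
            return q.popleft() if q else None
```

For example, `core = PartitionedListener(4).incoming(("10.0.0.1", 40000, "10.0.0.2", 80))` places the connection on one core's queue, and only that core's `accept` will return it. The application decides how many clones to create, which is how it keeps control over its parallelism.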
Lightweight sockets:
Sockets are represented by file descriptors and hence inherit some unnecessary file-related overheads. MegaPipe instead introduces lwsocket, a lightweight socket abstraction that is not wrapped in file-related data structures and thus is free from system-wide synchronization.
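One concrete source of file-related overhead is descriptor allocation: POSIX requires a newly created descriptor to be the lowest-numbered one not currently open, so allocation must search and update a table shared by every thread in the process. The sketch below contrasts that with a per-channel handle counter in the spirit of lwsocket. Both classes (`FdTable`, `LwSocketTable`) are hypothetical models, not kernel or MegaPipe code, and the per-channel table assumes each channel is driven by a single thread.

```python
import itertools
import threading


class FdTable:
    """Process-wide descriptor table: every allocation takes one shared lock
    and returns the lowest unused number, as POSIX requires for new fds."""

    def __init__(self):
        self.lock = threading.Lock()
        self.in_use = set()

    def alloc(self):
        with self.lock:
            fd = 0
            while fd in self.in_use:  # linear scan for the lowest free slot
                fd += 1
            self.in_use.add(fd)
            return fd

    def close(self, fd):
        with self.lock:
            self.in_use.discard(fd)


class LwSocketTable:
    """Hypothetical per-channel handle space: handles are only meaningful
    within one channel, so there is no lock shared with other channels and
    no lowest-number rule. Assumes one thread drives the channel."""

    def __init__(self):
        self._next = itertools.count()
        self.sockets = {}

    def alloc(self, sock):
        handle = next(self._next)
        self.sockets[handle] = sock
        return handle

    def close(self, handle):
        self.sockets.pop(handle, None)
```

After `t = FdTable()` allocates 0, 1, 2 and closes 1, the next `t.alloc()` must return 1 again, which forces the search under the shared lock; `LwSocketTable` simply hands out the next counter value.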
System Call Batching:
MegaPipe amortizes system call overheads by batching asynchronous I/O requests and completion notifications within a channel, a per-core, bidirectional pipe between the kernel and user space.
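The effect of batching can be modeled by counting user/kernel boundary crossings. In this sketch, `Channel`, `submit`, `dispatch`, and `run` are hypothetical names: requests are queued in user space, and a single `dispatch` call stands in for one system call that hands the whole batch to the kernel and returns a completion for each request. The batch size is an assumed parameter, not a value from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Toy model of a channel that batches asynchronous I/O requests."""
    batch_size: int
    pending: list = field(default_factory=list)
    syscalls: int = 0  # number of simulated user/kernel crossings

    def submit(self, request):
        """Queue a request in user space; cross into the kernel only
        when a full batch has accumulated."""
        self.pending.append(request)
        if len(self.pending) >= self.batch_size:
            return self.dispatch()
        return []

    def dispatch(self):
        """One simulated system call: submit all pending requests and
        collect one completion notification per request."""
        if not self.pending:
            return []
        self.syscalls += 1
        completions = [("done", req) for req in self.pending]
        self.pending = []
        return completions


def run(num_requests, batch_size):
    """Return (simulated system calls, completions) for a request stream."""
    ch = Channel(batch_size)
    done = []
    for i in range(num_requests):
        done.extend(ch.submit(("write", i)))
    done.extend(ch.dispatch())  # flush the final partial batch
    return ch.syscalls, len(done)
```

With 100 requests, `run(100, 1)` needs 100 crossings while `run(100, 32)` needs 4. The trade-off is latency: a request may sit in `pending` until its batch fills or is flushed.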