Paper: Memory Barriers: a Hardware View for Software Hackers

It's not often you get so enthusiastic a recommendation for a paper as the one Sergio Bossa gives "Memory Barriers: a Hardware View for Software Hackers": If you only want to read one piece about CPU architecture, cache coherency and memory barriers, make it this one.

It is a clear and well-written article. It even has a quiz. What's it about?

So what possessed CPU designers to cause them to inflict memory barriers on poor unsuspecting SMP software designers?
In short, because reordering memory references allows much better performance, and so memory barriers are needed to force ordering in things like synchronization primitives whose correct operation depends on ordered memory references.
Getting a more detailed answer to this question requires a good understanding of how CPU caches work, and especially what is required to make caches really work well. The following sections:
  1. present the structure of a cache,
  2. describe how cache-coherency protocols ensure that CPUs agree on the value of each location in memory, and, finally,
  3. outline how store buffers and invalidate queues help caches and cache-coherency protocols achieve high performance.
We will see that memory barriers are a necessary evil that is required to enable good performance and scalability, an evil that stems from the fact that CPUs are orders of magnitude faster than are both the interconnects between them and the memory they are attempting to access.
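
To make the store-buffer point concrete, here is a minimal toy simulation. It is not taken from the paper: the names ToyCPU and run_litmus, the dict standing in for shared memory, and the fixed interleaving are all assumptions made for illustration, and invalidate queues are not modeled at all. Each simulated CPU parks its writes in a private store buffer, so another CPU can read a stale value until a barrier drains that buffer.

```python
class ToyCPU:
    """Hypothetical model of one CPU with a private store buffer."""

    def __init__(self, memory):
        self.memory = memory      # shared dict: variable name -> value
        self.store_buffer = []    # pending (name, value) writes, oldest first

    def store(self, name, value):
        # The write is parked in the store buffer; other CPUs cannot see it yet.
        self.store_buffer.append((name, value))

    def load(self, name):
        # Store forwarding: a CPU sees its own most recent pending write.
        for pending_name, pending_value in reversed(self.store_buffer):
            if pending_name == name:
                return pending_value
        return self.memory[name]

    def barrier(self):
        # Toy full barrier: drain every pending write to shared memory.
        for name, value in self.store_buffer:
            self.memory[name] = value
        self.store_buffer.clear()


def run_litmus(use_barrier):
    """Run one fixed interleaving: both stores, then both loads."""
    memory = {"x": 0, "y": 0}
    cpu0, cpu1 = ToyCPU(memory), ToyCPU(memory)

    cpu0.store("x", 1)
    cpu1.store("y", 1)
    if use_barrier:
        cpu0.barrier()
        cpu1.barrier()
    r0 = cpu0.load("y")   # CPU 0 checks whether CPU 1's write is visible
    r1 = cpu1.load("x")   # CPU 1 checks whether CPU 0's write is visible
    return r0, r1


print(run_litmus(use_barrier=False))  # (0, 0): each CPU misses the other's write
print(run_litmus(use_barrier=True))   # (1, 1): barriers made both writes visible
```

In this sketch the (0, 0) result is the outcome a synchronization primitive cannot tolerate: each CPU has written its flag, yet each concludes the other has not. With the barriers in place that outcome is ruled out for this interleaving, at the cost of waiting for the store buffer to drain, which is the performance trade-off the excerpt describes.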