Yesterday, someone in the newsgroups asked a question about synchronization problems he was having. I explained that, if properly programmed, a producer-consumer queue does not need synchronization, even on a multiprocessor machine.
A Microsoft employee then asked me to explain exactly how this is done, i.e. to put my money where my mouth is.
A queue is a chunk of memory that can be divided into equal-sized pieces. To keep things simple, you can think of the queue as a traditional one-dimensional C-style array.
You should also note that there is one reader and one writer, each working in its own thread. This means that reads can happen concurrently with writes, but there will be no multiple concurrent reads or multiple concurrent writes. It is possible to provide that level of concurrency without locks (I did, once), but that is significantly more complex and not the purpose of this example.
To manage the producer and consumer sides, we need two additional variables: a read index and a write index.
The read index is the index of the element right after the element that was last read by the consumer.
The write index is the index of the element right after the element that was last written by the producer.
A writer can always add elements to the queue as long as it can advance the write pointer without overtaking the read pointer. An overtake would mean that the queue had overflowed.
There are 3 possible cases:
- The read and write pointers are equal. The queue is empty; this is its initialized state, and it recurs whenever the reader catches up with the writer. Any read will fail, any write will succeed.
- The read pointer points to the element right after the write pointer. The queue is full: if one more element were added, the read and write pointers would become equal, and the reader would assume the queue was empty. Any read will succeed, any write will fail.
- The read and write pointers are not equal, and neither of the previous cases applies. Any read and any write will succeed.
bool PushElement(T &Element)
{
    int nextElement = (m_Write + 1) % Size;
    if (nextElement == m_Read)
        return false;          // queue full: the write would overtake the reader
    m_Data[m_Write] = Element; // store the element first...
    m_Write = nextElement;     // ...then publish it by advancing the index
    return true;
}
As long as the element right after the write pointer is not the one the read pointer refers to, it is not occupied: the element can be stored and the write pointer advanced.
Otherwise the queue is full and the write operation fails.
bool PopElement(T &Element)
{
    if (m_Read == m_Write)
        return false;          // queue empty: nothing to read
    int nextElement = (m_Read + 1) % Size;
    Element = m_Data[m_Read];  // copy the element first...
    m_Read = nextElement;      // ...then release the slot by advancing the index
    return true;
}
If the read pointer is equal to the write pointer, there is nothing to read. In any other case we can read the element at the read pointer and then advance it.
Why this is safe
The advantage of this scheme is that no locking is needed. Each of the pointers (indices really) is updated only by one thread.
This means there is no write concurrency as far as the pointers are concerned.
Furthermore, and this is important, the read and write pointers are updated only after the element concerned has been read or written.
This means that if the read or write pointer changes, the elements are guaranteed to reflect that situation. The gap between the element change and the pointer change is invisible to the other threads.
So logically, this code is perfectly safe. However, one final issue must be highlighted, and that is the declaration of the pointers and the data array.
volatile int m_Read;
volatile int m_Write;
static const int Size = 10;
volatile T m_Data[Size];
If we take no precautions, the optimizer can reorder the actual memory operations on the read and write pointers.
Or it could assume that the variables do not change between calls and use cached values instead of what is actually in memory. This could be especially problematic on multi-CPU machines where each CPU has its own cache.
See the documentation for ‘volatile’ in MSDN for a more detailed explanation.
How this could be unsafe
There is one thing that is critical in this code, and that is that memory updates are not re-ordered. You can easily understand that if the write pointer gets updated before the actual data element is updated, there is a gap that creates a race condition.
Therefore it is critical that the platform this code runs on (CPU, chipset, memory controller) does not reorder the memory write operations. Otherwise you'd have code that is logically thread-safe, but thread-unsafe in practice due to CPU peculiarities, and all sorts of impossible-to-debug problems.
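On compilers where volatile does not imply ordering guarantees (standard C++ says nothing about it; the guarantee described above is Visual C++ specific), the same requirement can today be expressed portably with C++11 atomics. This is an alternative formulation, not the article's original code; the names mirror the article:

```cpp
#include <atomic>

// Sketch: the ordering requirement expressed with C++11 acquire/release
// semantics instead of volatile. The release store guarantees the element
// write above it is visible before the index update, on any CPU.
template <typename T>
class AtomicRingQueue
{
public:
    AtomicRingQueue() : m_Read(0), m_Write(0) {}

    bool PushElement(const T &Element)      // producer thread only
    {
        int write = m_Write.load(std::memory_order_relaxed);
        int nextElement = (write + 1) % Size;
        if (nextElement == m_Read.load(std::memory_order_acquire))
            return false;                   // full
        m_Data[write] = Element;
        // release: the element store cannot be reordered past this
        m_Write.store(nextElement, std::memory_order_release);
        return true;
    }

    bool PopElement(T &Element)             // consumer thread only
    {
        int read = m_Read.load(std::memory_order_relaxed);
        if (read == m_Write.load(std::memory_order_acquire))
            return false;                   // empty
        Element = m_Data[read];
        // release: the element copy cannot be reordered past this
        m_Read.store((read + 1) % Size, std::memory_order_release);
        return true;
    }

private:
    static const int Size = 10;
    std::atomic<int> m_Read;
    std::atomic<int> m_Write;
    T m_Data[Size];
};
```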
The design as it is works reasonably well, but here are a couple of tips to turn this into more powerful code:
- Provide an array of read pointers instead of only one. That way you get a broadcast queue in which every consumer has the opportunity to receive each element. Before writing, you have to check all read pointers to make sure there is enough room in the queue.
- Allow the constructor to take a variable-sized piece of memory that was allocated beforehand, and make the read and write pointers the first items in that memory. This lets you place the queue in shared memory and communicate with other processes.
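For the broadcast variant in the first tip, only the producer's full check changes: it becomes a loop over all read indices, so the slowest reader limits the writer. A hypothetical sketch (NumReaders and CanWrite are my own names):

```cpp
// Full check for a broadcast queue with several read pointers.
// A write is allowed only if advancing the write index would not
// overtake ANY reader -- the slowest reader limits the writer.
static const int Size = 10;
static const int NumReaders = 3;

volatile int m_Read[NumReaders];  // one read index per consumer
volatile int m_Write;

bool CanWrite()
{
    int nextElement = (m_Write + 1) % Size;
    for (int i = 0; i < NumReaders; ++i)
        if (nextElement == m_Read[i])   // would overtake this reader
            return false;
    return true;
}
```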
Making a thread-safe producer-consumer queue without synchronization primitives is perfectly possible. If you want to play with the code, it is attached to this article and licensed under the MIT license.
I whipped this code together in a very short time for the purpose of this demo. It should be bug-free, but if you think there is something specifically wrong, let me know and I'll see if there is a problem with it.