Detection of the VC2005 compiler version at compile time: RTM or SP1

I was hanging out in dotnet.languages.vc yesterday when I read an interesting question. Someone wanted to know how he could make a compile-time decision, based on whether a piece of code was compiled with VC2005 RTM or VC2005 SP1.


I told him to simply use the _MSC_VER macro that should tell you which version you are compiling with, and I asked him why he would need to know, since generally you shouldn’t have to care.
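For reference, the usual compile-time check looks something like this (1400 is the value that VC2005 reports; the snippet is just an illustration of the idea):

#if _MSC_VER >= 1400
  //code that needs VC2005 or later
#else
  //code for older compilers
#endif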


It turned out that the _MSC_VER macro reports the same major and minor version for the RTM and SP1 versions. Tough luck. This is a bug IMO, but one that is here to stay. Changing the macro in the next release won't fix this one :)


But the person asking the question really needed to know:


<quote>


Microsoft has made some changes to the CxxUnhandledExceptionFilter logic in
Visual C++ 2005 SP1. Because of the way that my code (in a dll) is linked
to executable files compiled with Visual C++ 6 (and that I can't touch),
this change has broken my functionality that I have for dumping debug
information in the event of a crash. I have a workaround that makes it work
again with VS2005 SP1 but that only works on VS2005 SP1 and not on VS2005.
Hence the need to detect the difference :)


</quote>


I searched around for a bit, but could not find any simple macro that could tell the difference.


_CRT_ASSEMBLY_VERSION could be used, because the CRT and the compiler are distributed together, but it is not guaranteed to be correct in all scenarios. And even then, that macro expands to a string, so it cannot be used in preprocessor directives.
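To illustrate (the exact version string is only an example, and _CRT_ASSEMBLY_VERSION comes from the CRT headers, crtassem.h if I remember correctly):

//_CRT_ASSEMBLY_VERSION expands to a string literal such as "8.0.50727.762",
//so the preprocessor cannot compare it:
//#if _CRT_ASSEMBLY_VERSION > "8.0.50727.42"   //error: strings are not allowed in #if
//the best you can do is print it at build time:
#pragma message ("CRT assembly version: " _CRT_ASSEMBLY_VERSION)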


So there was no direct way to make the distinction.


However, SP1 fixed a ton of bugs, so if I could find anything that was fixed and detectable at compile-time, I had a solution.


I browsed around in the list of fixed bugs:


http://blogs.msdn.com/vcblog/archive/2006/06/22/643325.aspx


and picked out this one:


http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=101702


Basically, it is a fix that makes the sizeof() operator conform to the C++ standard when it is applied to the address of an array. sizeof is evaluated at compile time, so that should do the trick.


Unfortunately, the result of a sizeof operation cannot be used to control the preprocessor, so my first solution was a runtime check whose outcome is determined at compile time.


Solution 1


  char test[10];

  if (sizeof(&test) == 4)
    cout << "SP1";
  else
    cout << "RTM";


This is a simple solution, but the decision is still made at runtime, and both code paths exist inside the compiled binary.


I posted this solution, still not 100% satisfied, but any solution is better than no solution of course.


Solution 2


Ben Voigt looked at that solution and pointed out that the result of a sizeof() operation can be used to specialize a template.


Wanting to go all the way on this one, I created the following solution:


This template class represents the normal behavior for versions up to and including VC2005 RTM. DoStuff is the method that will be called to invoke this version-dependent code.


template <int i> class VcSelectorClass
{
public:
  static char Dummy[10];

  static void DoStuff(void)
  {
    //replace with own implementation
    cout << "RTM" << endl;
  }
};


 


This specialization represents the behavior of VC2005 SP1 and higher. The specialized case is for the integer 4: the size that sizeof should report when the address of an array is passed. All versions before SP1 report the actual array size; all versions from SP1 onward report 4.


Of course, in a 64-bit world this would be 8, but the original problem has to do with compilation for use with VC6 executables. Those can only be 32-bit, so no attention has to be given to this detail for now.


template <> class VcSelectorClass<4>
{
public:
  static void DoStuff(void)
  {
    //replace with own implementation
    cout << "SP1" << endl;
  }
};


 


I don’t want the user to supply his own sizeof calculation because a) it is easy to make a mistake, and b) it is butt-ugly. The typedef solves this very neatly.


To alleviate the need for the user to declare his own array somewhere, one is declared in the generic template.


typedef VcSelectorClass<sizeof(&VcSelectorClass<0>::Dummy)> VcSelector;


 


int _tmain(int argc, _TCHAR* argv[])
{
  //the template is specialized at compile time, so only 1
  //code path is included in the compiled code.
  VcSelector::DoStuff();
  return 0;
}


Invoking the behavior is pretty elegant, and the templates are specialized at compile-time so only 1 code path is included in the compiled code.


You might wonder why I used template classes instead of template functions.


The reason is simple. Partial specialization of template functions is not part of the C++ standard (only class templates can be partially specialized), and function template specialization is more limited in general. I have read reports that this is being considered, but for now I chose to use template classes. Unfortunately this makes the whole thing a little more verbose.


Solution 3


By now the original problem was well and truly solved, but I felt the need to geek out and create something that I would put in production code without fear of it coming back to haunt me.


After all, the current solution only works if the behavior differs in one simple way between pre-SP1 and post-SP1.


Older versions might even need a different implementation, and post-SP1 versions might not be backwards compatible.


And 64-bit builds would also need defined behavior, in case the VC6 requirement no longer applies.


For clarity I removed some of the code comments.


#ifdef _WIN64

//no 64 bit implementation yet.
#pragma message ("Warning: DoVcVerDepStuff has no 64 bit implementation yet")

#else

#if _MSC_VER < 1400

//implementation for VC2003 and earlier
void DoVcVerDepStuff(void)
{
  //implementation here
}

#elif _MSC_VER == 1400

template <int i> class VcSelectorClass
{
public:
  static char Dummy[10];

  static void DoStuff(void)
  {
    //RTM implementation
  }
};

template <> class VcSelectorClass<4>
{
public:
  static void DoStuff(void)
  {
    //SP1 implementation here
  }
};

typedef VcSelectorClass<sizeof(&VcSelectorClass<0>::Dummy)> VcSelector;

#define DoVcVerDepStuff VcSelector::DoStuff

#else

//there is no post SP1 implementation yet
#pragma message ("Warning: DoVcVerDepStuff has no post VC2005 implementation yet")

#endif

#endif //_WIN64


 


int _tmain(int argc, _TCHAR* argv[])
{
  DoVcVerDepStuff();
  return 0;
}


Conclusion


The original problem was solved neatly. Demo projects of the 3 solutions are available for download under the MIT license.


If anything, this proves that you should always update the version numbers of anything that gets updated, even by a single bit.


In this case, the problem could just as easily have been with the CRT instead of the compiler itself. I don't know enough about the original problem to judge that, but here detecting the compiler version sufficed.


It is worth noting that I cannot possibly anticipate the behavior of another service pack for VC2005. If the version macro is updated, the programmer will be warned. If it isn't, there is no option other than to look for another fixed bug and use that, but that cannot be accounted for at this moment. It will be the responsibility of the programmer to check whether a new test is needed.

Creating a thread-safe producer-consumer queue in C++ without using locks

Yesterday, someone in the newsgroups asked a question about synchronization problems he was having. I explained that, if properly programmed, a producer-consumer queue does not need synchronization, even on a multiprocessor machine.


A Microsoft employee then asked me to explain exactly how this is done, i.e. to put my money where my mouth is.


The idea


A queue is a chunk of memory that can be divided into equal-sized pieces. To keep things simple, you can think of a queue as a traditional one-dimensional C-style array.


You should also note that there is 1 reader and 1 writer, each working in its own thread. This means that reads can happen concurrently with writes, but there will be no multiple concurrent reads or multiple concurrent writes. It is possible to provide that level of concurrency without locks (I did, once) but that is significantly more complex, and not the purpose of this example.


To manage the producer and consumer part, we need to have 2 additional variables: a read index and a write index.


The read index is the index of the element right after the element that was last read by the consumer.


The write index is the index of the element right after the element that was last written by the producer.


A writer can always add elements to the queue as long as it can move the write pointer without overtaking the read pointer. An overtake would mean that the queue overflows.


There are 3 possible cases (summarized in the snippet after this list):


  1. The read and write pointers are equal. This means the queue is empty (which is also the initialized state; it happens again whenever the consumer catches up with the producer). Any read will fail, any write will succeed.
  2. The read pointer points to the first element after the write pointer. This means that the queue is full: if one more element were added, the read and write pointers would become equal and the reader would assume that the queue was empty. Any read will succeed, any write will fail.
  3. The read and write pointers are not equal, and the previous case does not apply. Any read or write will succeed.
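In terms of the indices used in the implementation below (m_Read, m_Write and the queue size Size), the three cases boil down to:

      //case 1, empty:  m_Read == m_Write                -> pop fails, push succeeds
      //case 2, full:   (m_Write + 1) % Size == m_Read   -> pop succeeds, push fails
      //case 3, other:  everything else                  -> pop and push both succeed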

The implementation


Adding elements


 


      bool PushElement(T &Element)
      {
            int nextElement = (m_Write + 1) % Size;
            if (nextElement != m_Read)
            {
                  m_Data[m_Write] = Element;
                  m_Write = nextElement;
                  return true;
            }
            else
                  return false;
      }


As long as the position right after the write pointer is not the read pointer, the queue is not full: the element can be added and the write pointer updated.


Otherwise the queue is full and the write operation will fail.


Removing elements


 


      bool PopElement(T &Element)
      {
            if (m_Read == m_Write)
                  return false;

            int nextElement = (m_Read + 1) % Size;
            Element = m_Data[m_Read];
            m_Read = nextElement;
            return true;
      }


If the read pointer is equal to the write pointer, there is nothing to read. In any other case we can read the first element and update the read pointer.


Why this is safe


The advantage of this scheme is that no locking is needed. Each of the pointers (indices really) is updated only by one thread.


This means there is no write concurrency as far as the pointers are concerned.


Furthermore – and this is important – the read and write pointers are updated only after the element concerned is read or written.


This means that if the read or write pointer changes, the elements are guaranteed to reflect that situation. The gap between the element change and the pointer change is invisible to the other threads.


So logically, this code is perfectly safe. However, one final issue must be highlighted, and that is the declaration of the pointers and the data array.


      volatile int m_Read;
      volatile int m_Write;
      static const int Size = 10;
      volatile T m_Data[Size];


If we take no precautions, the optimizer can reorder the actual memory operations on the read and write pointer.


Or it could assume that variables do not change between calls and use cached values instead of what is actually in memory. This could be especially problematic on multi CPU machines where each CPU has its own cache.


See the documentation for ‘volatile’ in MSDN for a more detailed explanation.
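For completeness, this is how the pieces above might fit together into one self-contained class. This is a minimal sketch: the class name and the constructor are my own additions, the members and methods are the ones shown above, and it assumes a POD element type T.

      template <typename T>
      class ProducerConsumerQueue
      {
      public:
            ProducerConsumerQueue() : m_Read(0), m_Write(0) {}

            bool PushElement(T &Element)
            {
                  int nextElement = (m_Write + 1) % Size;
                  if (nextElement != m_Read)
                  {
                        m_Data[m_Write] = Element;
                        m_Write = nextElement;
                        return true;
                  }
                  return false;
            }

            bool PopElement(T &Element)
            {
                  if (m_Read == m_Write)
                        return false;
                  int nextElement = (m_Read + 1) % Size;
                  Element = m_Data[m_Read];
                  m_Read = nextElement;
                  return true;
            }

      private:
            volatile int m_Read;    //only written by the consumer thread
            volatile int m_Write;   //only written by the producer thread
            static const int Size = 10;
            volatile T m_Data[Size];
      };

One thread (the producer) only ever calls PushElement, and the other (the consumer) only ever calls PopElement; no locks are involved.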


How this could be unsafe


There is one thing that is critical in this code, and that is that memory updates are not re-ordered. You can easily understand that if the write pointer gets updated before the actual data element is updated, there is a gap that creates a race condition.


Therefore it is critical that the platform this code runs on (CPU, chipset, memory controller) does not re-order the memory write operations. Otherwise you'd have code that is logically thread-safe, but thread-unsafe due to CPU peculiarities, and you'd have all sorts of impossible-to-debug problems.
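If you want to make that requirement explicit in the code instead of relying on the platform, one option (not part of the original design, just a suggestion) is to place a hardware memory barrier between writing the element and publishing the new write index, for example with the Win32 MemoryBarrier macro:

      //variation on PushElement with an explicit barrier (requires <windows.h>)
      bool PushElement(T &Element)
      {
            int nextElement = (m_Write + 1) % Size;
            if (nextElement != m_Read)
            {
                  m_Data[m_Write] = Element;
                  MemoryBarrier();   //make sure the element is written before the index is published
                  m_Write = nextElement;
                  return true;
            }
            return false;
      }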


Extending functionality


The design as it is works reasonably well, but here are a couple of tips to make this into powerful code:


  • Provide an array of read pointers instead of only one. That way you get a broadcast queue in which each consumer has the opportunity to receive every element. You have to check each of the read pointers before writing, to make sure that there is enough room in the queue (see the sketch after this list).
  • Allow the constructor to take a variable sized piece of memory that was allocated beforehand, and make the read and write pointers the first items in that memory. This allows you to place the queue in shared memory and communicate with other processes.
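As an illustration of the first tip, the push operation might look roughly like this. It is only a sketch: NumReaders and the m_Read array are assumed names, and each consumer would pop using its own read index.

      //assumed members: static const int NumReaders = 4;
      //                 volatile int m_Read[NumReaders];   //one read index per consumer
      bool PushElement(T &Element)
      {
            int nextElement = (m_Write + 1) % Size;
            //the slot is only free if advancing would not overtake any reader
            for (int i = 0; i < NumReaders; ++i)
            {
                  if (nextElement == m_Read[i])
                        return false;
            }
            m_Data[m_Write] = Element;
            m_Write = nextElement;
            return true;
      }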

Conclusion


Making a thread-safe producer-consumer queue without synchronization primitives is perfectly possible. If you want to play with the code, it is attached to this article and licensed under the MIT license.


I whipped this code together in a very short time for the purpose of this demo. It should be bug free, but if you think there is something specifically wrong, let me know and I’ll see if there is a problem with it.

Source code for the user mode part of my USB tutorial driver

Some time ago I wrote 2 lengthy articles on writing device drivers, using the new Kernel Mode Driver Framework that is the bee's knees when it comes to kernel mode programming.


Of course, I had to have a demo application that interacts with the device driver to control and monitor the USB device. I whipped together an MFC dialog application and a user mode API.


Since those were not the main point of my articles, I took some shortcuts to get a working demo ASAP. Within the context of my demo, these shortcuts are safe enough. However, they are not an example of how to write user mode APIs.


In fact, they might encourage bad programming style if you do not realize that some things are only valid in a particular context. That is why I have not released the source code of the application and the API.


The problem is that every month I get at least one request, and sometimes more, for the code, because there are very few examples of how to interact with device drivers.


I still don’t want to release the code because it is not an example of good programming.


I am writing a series of articles on how to write user mode APIs, but that is far from finished. That is why, for now, I have extracted the portion of the code that is clean enough. It is available for download under the MIT license.


If there is anything you can learn from this, it is that you should not take shortcuts when you create tutorial articles. Even if some piece of code is only there for support purposes, some people will need it to understand the whole picture.


Now then, when you look at those sources, you'll see that all the win32 functions have a __ prefix. Those functions are wrappers around the real win32 functions that throw exceptions instead of returning an error status. I have not included those wrappers because they contain macro magic that made my life easier, but that is not how I would program production code.
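To give an idea of what such a wrapper looks like, here is a hand-written example. It is only an illustration: the real wrappers are generated with macros, and the exact names and parameters in my code may differ.

#include <windows.h>
#include <stdexcept>

//illustrative wrapper: opens a file or device handle and throws on failure
//instead of returning INVALID_HANDLE_VALUE
HANDLE __CreateFile(LPCWSTR fileName, DWORD access, DWORD shareMode,
                    DWORD creationDisposition, DWORD flags)
{
  HANDLE h = CreateFileW(fileName, access, shareMode, NULL,
                         creationDisposition, flags, NULL);
  if (h == INVALID_HANDLE_VALUE)
    throw std::runtime_error("CreateFileW failed");
  return h;
}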


This means that you cannot compile it directly, but it should be easy enough to adapt it and use it to interact with my demo USB driver.


Good luck, and let me know if you have specific questions.