Practical ATL: recognizing marshalling problems

Before I publish my next article, I should explain something else about the black magic aspect of ATL / COM: Marshalling.


The Stuff server we made in the previous article is an out-of-proc server. This means that the arguments being passed to / from a stuff instance have to be somehow moved back and forth between different process contexts (or different threading apartments).


In the case of IStuff, we don’t really have to bother, because IStuff derives from IDispatch. And in that case, there is a marshaller (aptly named Universal Marshaller) that can be used to do all this for you.


If IStuff didn’t derive from IDispatch, then we would need to provide our own proxy – stub dll that knows how to marshal the interface. With ATL, this is not difficult. You’ll notice that the Stuff solution of my previous article had a ‘Stuff_ServerPS’ project that I didn’t mention before.


That project builds the proxy stub dll that performs the marshaling. It is disabled by default, so you have to enable it in the build configuration before it is built. Also remember to rebuild the proxy every time the IDL files of your project change.


So for the remainder of this article I will show what can happen if there is a marshalling problem. Don’t worry if it all seems ugly, gory, and incomprehensible. Normally you won’t have to deal with issues like this. But if you ever do, you should at least recognize the signs that you have a marshaling problem.


Confused? It’s the marshaller


If you find yourself in a situation where very weird things happen, suspect a marshalling problem.


If you get the error code E_NOINTERFACE when you know for sure that your object implements it properly, suspect a marshalling problem.


If you get a crash while using an interface pointer that was properly created, without getting an HRESULT that indicated a problem,… that’s right: you can bet it’s a marshaling problem.


Example


The article I am currently writing discusses implementing a non standard class object, for the purpose of parameterized construction of the Stuff object. Never mind for the moment about what any of that means.


For all practical purposes: it just means that I defined the following interface via copy / paste / edit:


[
      object,
      local,
      uuid(6DD69CDB-3128-432b-B335-773A287E6F06),
      dual,
      helpstring("IStuffCreator Interface"),
      pointer_default(unique)
]
interface IStuffCreator : IUnknown {
  [id(1), helpstring("method MakeMeAStuff")]
    HRESULT MakeMeAStuff(
      [in] BSTR name,
      [in] REFIID riid,
      [out] void ** ppStuff);
};


There were 4 things wrong with this interface (showing just why you shouldn’t define IDL interfaces in a hurry). This has to be some sort of record.


  1. ‘local’ shouldn’t be there.
  2. ‘dual’ shouldn’t be there.
  3. ‘void’ should be ‘IUnknown’
  4. ‘ppStuff’ should be attributed with iid_is(riid).

So let’s have a look at the result of these different issues.


Issue 1: local


‘local’ simply shouldn’t be there. ‘local’ is a MIDL attribute that, when used in an interface header, causes MIDL not to generate stubs for that interface.


The effect on our interface is that the IStuff specific methods are not forwarded to the remote server.


You might wonder why it exists. The answer is: in case you want to write custom marshalling code, or in case the calls never have to be remoted. In that case you can have a local interface and a remote interface, and usage and marshalling can be optimized.


IUnknown is such an interface. But it has custom stub code to deal with things like remoting.


The result of using ‘local’ on my IStuffCreator interface is that trying to call GetEnum on the returned IStuff interface pointer will result in a debug assertion: Run-Time Check Failure #0 – The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.


And if you have this problem with COM, it is usually indeed an indication that you have a calling convention mismatch. In this case, it is a side effect of the fact that we are trying to execute a method that isn’t remoted.


Issue 2: dual


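This one shouldn’t be here either. ‘dual’ indicates that the interface derives from IDispatch and can be called both through its vtable and through IDispatch. IStuffCreator derives from IUnknown, so the attribute is simply wrong here, and it could have an effect on marshalling.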


In this case, it worked just fine. Perhaps it is because of the fact that I use my object as a class object, or perhaps because my interface itself uses only arguments that are automation compatible, or perhaps because the stars are aligned in the proper configuration.


I don’t know. And anything you don’t know for certain will hurt you when dealing with DCOM. It has to be correct.


Issue 3: void


This one is interesting.


If I want to pass an interface pointer, it has to be passed as an IUnknown**, instead of a void**. This is because the Universal Marshaller has no idea how to marshal something which has only contextual meaning, nor does the IDL compiler know what you mean by it.


By this I mean that my server knows it will be an interface pointer, and my client knows it is an interface pointer, but the stuff in the middle doesn’t. It sees just a memory address.


I used void** because that is what QueryInterface itself uses. And what’s good enough for QI should be good enough for me. Except QI has custom code for marshalling its arguments across, which I don’t. So QI can use void** because both the proxy and the stub know what is going on.


So what happens when you use void**?


I request the class object like this:


    hr = CoGetClassObject(
           __uuidof(Stuff),
           CLSCTX_ALL,
           NULL,
           __uuidof(IStuffCreator),
           (void**)&stuffClass);


 


And the resulting hr is 0x80020008: Bad variable type. There is absolutely nothing wrong with this line of code. But because there is something in the interface itself that confuses the marshaller, it tells me to sod off when I want that specific interface.
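
When an HRESULT like this shows up out of nowhere, it can help to pull it apart before searching for it. What follows is a minimal sketch in portable C++, not Windows SDK code: HResultParts, DecodeHResult and CheckArticleValue are names invented for this illustration, and the bit layout used is the standard HRESULT layout (failure bit, facility, code). For 0x80020008 the facility comes out as 2, which is FACILITY_DISPATCH. That is at least a hint that the complaint comes from the automation layer rather than from your own object.

```cpp
#include <cassert>
#include <cstdint>

// The three fields of a 32-bit HRESULT that matter when reading an error:
// bit 31 is the failure bit, bits 16-26 the facility, bits 0-15 the code.
// (Hypothetical helper type, not part of any SDK.)
struct HResultParts {
    bool failed;
    std::uint32_t facility;
    std::uint32_t code;
};

inline HResultParts DecodeHResult(std::uint32_t hr) {
    HResultParts parts;
    parts.failed = (hr & 0x80000000u) != 0;
    parts.facility = (hr >> 16) & 0x7FFu;
    parts.code = hr & 0xFFFFu;
    return parts;
}

// Usage: the value from the article decodes to a failure in
// facility 2 (FACILITY_DISPATCH) with code 8.
inline void CheckArticleValue() {
    const HResultParts parts = DecodeHResult(0x80020008u);
    assert(parts.failed);
    assert(parts.facility == 2);
    assert(parts.code == 8);
}
```

The same helper applied to E_NOINTERFACE (0x80004002) gives facility 0, so the facility alone won't always point at the marshaller. It is just one more clue.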


Credit where it is due: I figured this one out thanks to someone whose name I can’t figure out :). But his blog post is here.


Issue 4: iid_is(riid)


This one is also a good one.


The iid_is attribute informs the marshaller that riid holds the identifier of the interface that is being marshaled.


This way, the marshaller knows how to marshal the interface that was specified with riid. Without that part, it would only marshal the IUnknown part of the interface, and not the IStuff specific part.


As a result, you can use the IUnknown part of the interface without a problem, but if you use the IStuff specific part, you will get an access violation, stack corruption or other interesting phenomenon.


Conclusion


A couple of things can be concluded at this point.


First of all: Marshaling problems are hard to debug, because the problems occur outside of your source code.


If you use any interfaces that don’t derive from IDispatch, then you have to enable the ‘xxxPS’ project in your solution so that the ProxyStub dll will be built and registered. And of course, this dll has to be part of your install procedure. Without this dll, you could get access violations, or you get an E_NOINTERFACE error because the marshaler doesn’t know that one of those interfaces exists.


As I have shown, you need to be careful and pay attention when you manually define COM interfaces. If you don’t, then you can have all sorts of hard to diagnose problems.


And if you ever find yourself in a situation where nothing makes sense, then suspect marshalling issues and have a long hard look at your IDL files.


 

File a bug report and make Visual Studio a better product

If you find a bug in Visual Studio, report it on http://connect.microsoft.com. If the bug affects the version of VS that is in development at that time, there is a significant chance it will get fixed before the next release. Just this week I got confirmation that all 3 bugs that I filed in the last months have been resolved.


The first one is a nasty one. If you add an ATL project to an existing, non-empty solution, the ATL wizards will either hang the IDE or fail to work when adding an ATL object. I found a workaround for it, but it involved hand editing the solution file.


The second one is a small bug in the TR1 regex parser. No workaround, other than to restructure the regex.


And the last one was an error in the MSDN documentation about open modes with the fopen family, which counts as a bug because it triggered runtime errors.


All 3 have been fixed. I especially appreciate the first one. I am working on ATL projects, and I develop my COM server in the same solution as a couple of related non-ATL projects. So every time I work with ATL projects, I can have a warm and fuzzy feeling that because of me, VS is now a better product. :)

Practical ATL: Implementing an enumerator object

This is my first article about practical ATL examples, which I already mentioned here.


In this example I will create an ATL Server which implements 1 custom interface: IStuff. Hardly an inspiring name, but I couldn’t think of anything better before I had my first coffee.


This interface will have 1 method which will return an enumerator to the caller. Specifically, this enumerator will implement IEnumString.


Enumerators are part of the interface of many real-life COM servers, but creating your own is an arduous task. ATL makes this much easier, but an example makes things so much easier still.


Since client code will have to deal with this enumerator (and to have a demo app) I will also create a client application that creates an instance of the server, consumes the IStuff interface and then enumerates the items contained in the returned enumerator.


Before we get started


If you don’t own a copy of ‘ATL Internals, Second Edition’ yet, I strongly advise you to buy it. It is worth the money, and apart from leading you through all the interesting and gory stuff, it also starts by explaining the actual process of creating an ATL server through the IDE, as well as implementing methods, properties and events.


I am only going to cover that aspect very lightly. It should be detailed enough for anyone who has done C++ development using VS:


  1. Create a new ATL Server project with project name ‘Stuff_Server’ and a solution name ‘Stuff’.
  2. Choose the EXE project type and click ‘finish’
  3. Add a new class to the project, and specify ATL Simple Object. Use ‘Stuff’ as the name of the object.
  4. Switch to classview, and add a new method to the IStuff interface with the prototype ‘HRESULT GetEnum(IUnknown ** ppEnum)’. ppEnum should be marked as an ‘out’ parameter.

Build the project, and you should be ready for the actual work. The ATL wizards will have implemented all the IDL and DCOM stuff for you, and you only have to implement the functionality.


Finally, I always end by changing the project configuration to use static CRT and ATL runtimes instead of dynamic runtimes. This makes it much easier to distribute the resulting executables without having to bother with registration and installation of the runtime dlls and their manifests.


Creating the server


Now that the ATL wizards have generated the function skeleton for us, it is time to do something with it:


STDMETHODIMP CStuff::GetEnum(IUnknown** ppEnum)
{
 
}


The enumerator


Before we can do anything else, we have to think about the actual enumerator for a bit. In our case we implement IEnumString. This is an interface for enumerating strings (who’d have thought, eh?).


Enum interfaces all look alike, and ATL helpfully has a stock implementation for generic enumerators that allows you to specify the Interface, GUID, datatype and data type copy traits as template parameters. In order to facilitate coding, we make a typedef for this class:


typedef CComEnum<
                 IEnumString,
                 &IID_IEnumString,
                 LPOLESTR,
                 _Copy<LPOLESTR> > CComEnumStringAbstract;


 


This will specialize the enumerator for our purposes. The use of the first 3 parameters is obvious. Don’t worry about the _Copy parameter. I will cover that later.


The enumerator itself is a normal COM object, so it needs lifetime management, interface managements, and all other things you really don’t want to care about for something so simple. At this point, CComEnumStringAbstract will still be an abstract class because it can’t decide how to do these things for you.


In order to make our enumerator a full fledged COM object, we have to shove it into a bare COM object. This is very easy:


typedef CComObject<CComEnumStringAbstract> CComEnumString;


And that is all there is to creating an enumerator! Anyone who has ever implemented one manually will agree that this is infinitely more convenient.


Implementing GetEnum


The implementation for GetEnum is also fairly easy (comments removed for clarity)


STDMETHODIMP CStuff::GetEnum(IUnknown** ppEnum)
{
  CComEnumString * thEnumObj  = NULL;
  HRESULT hr = CComEnumString::CreateInstance(&thEnumObj);
  if(FAILED(hr))
    return hr;

  CComPtr<CComEnumString> thEnum(thEnumObj);

  LPOLESTR strings[] = {L"One", L"Two", L"Three", NULL};
  hr = thEnum->Init(&strings[0], &strings[3], NULL, AtlFlagCopy);
  if(FAILED(hr))
    return hr;
 
  hr = thEnum->QueryInterface( __uuidof(IUnknown), (void**)ppEnum);
  return hr;
}


First we create a new CComEnumString object. It is worth noting that the object returned by CreateInstance has a refcount of 0. Before we do anything with it we should AddRef it, but instead of doing that manually, I do it by handing it to the CComPtr smart pointer. That way the object also gets released automatically at the end.


Then we have to make sure the enumerator has some data to enumerate. We do this by populating an array, and then telling the enumerator to copy the data out of it into its own internal buffer. This is done via the AtlFlagCopy flag.


It might seem weird if you come from a C background, but the ‘End’ pointer has to point to the first position beyond the last element. This is normal with collections in C++. That is also why I populate the array with a NULL pointer at the end, so that there is something in the array after the last ‘real’ element. This way there is no buffer overflow.
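
The ‘begin / end’ convention can be shown without any ATL at all. The sketch below is plain C++ with invented names (CopyRange and ThreeStrings are not ATL functions). It copies the half-open range [begin, end), and the ‘end’ pointer is only ever compared against, never dereferenced. As far as standard C++ is concerned, forming a pointer one past the last array element is allowed, so the NULL sentinel in the article’s array is a harmless safety margin rather than a strict requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Copies the half-open range [begin, end). 'end' marks where to stop;
// it is compared against but never dereferenced.
inline std::vector<std::wstring> CopyRange(const wchar_t* const* begin,
                                           const wchar_t* const* end) {
    assert(begin <= end);
    std::vector<std::wstring> out;
    for (const wchar_t* const* it = begin; it != end; ++it)
        out.push_back(*it);
    return out;
}

// Usage: three elements, so 'end' is the address just past strings[2].
inline std::vector<std::wstring> ThreeStrings() {
    const wchar_t* strings[] = { L"One", L"Two", L"Three" };
    const std::size_t count = sizeof(strings) / sizeof(strings[0]);
    return CopyRange(strings, strings + count);
}
```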


Using AtlFlagCopy has the overhead of having to copy the data, but on the other hand the enumerator does not have to care about the lifetime of the array. We could also use new[] to create a new array and then use the AtlFlagTakeOwnership flag to indicate that it is up to the enumerator to call delete[] and clean up. But in this example we can’t do that (read below).


When the enumerator is initialized, we return an IUnknown interface pointer to the caller. Since this will have increased the enumerator ref count, the enumerator object will keep on living when the CComPtr goes out of scope and releases its own reference to the enumerator.
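
The reference counting in GetEnum is easier to follow with the counts written out. This is a portable sketch of the pattern only: Counted, CountedPtr and MakeCounted are hypothetical stand-ins for the COM object, CComPtr and GetEnum, and the explicit AddRef plays the role of the QueryInterface call.

```cpp
#include <cassert>

// Stand-in for a COM object: it starts at refcount 0, as the article
// describes for CreateInstance, and deletes itself on the last Release.
class Counted {
public:
    explicit Counted(bool* destroyedFlag) : refs_(0), destroyed_(destroyedFlag) {}
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() {
        const unsigned long left = --refs_;
        if (left == 0) { *destroyed_ = true; delete this; }
        return left;
    }
    unsigned long refs() const { return refs_; }
private:
    ~Counted() {}
    unsigned long refs_;
    bool* destroyed_;
};

// Stand-in for CComPtr: AddRef on construction, Release on destruction.
class CountedPtr {
public:
    explicit CountedPtr(Counted* p) : p_(p) { if (p_) p_->AddRef(); }
    ~CountedPtr() { if (p_) p_->Release(); }
private:
    CountedPtr(const CountedPtr&);
    CountedPtr& operator=(const CountedPtr&);
    Counted* p_;
};

// Same shape as GetEnum: create, hold, hand out, let the holder go.
inline bool MakeCounted(bool* destroyedFlag, Counted** out) {
    assert(destroyedFlag && out);
    Counted* raw = new Counted(destroyedFlag);  // refcount 0
    CountedPtr holder(raw);                     // refcount 1
    raw->AddRef();                              // refcount 2, like QueryInterface
    *out = raw;
    return true;
}                                               // holder releases: refcount 1
```

After MakeCounted returns, the caller owns the single remaining reference, and the object goes away on the caller's Release.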


There. That is all there is to it on the server side. No vast amounts of copy pasted boilerplate code, but just a couple of lines of powerful ATL incantations :)


_Copy<LPOLESTR> revisited


If an enumerator has to copy data to a client, or if it has to copy data into its internal buffers in the ‘Init’ function, it has to know how to do this.


It’s all good and well to do a simple memcpy of an ULONG or a double, but for strings (LPOLESTR) and interface pointers (IUnknown) this won’t work. There’s ref counting and memory ownership to deal with.


Of course, the template class, not knowing about this, can’t anticipate what you want, so instead it needs a policy class to delegate to. That is where the _Copy<> template class comes in. It has 3 methods: Init, Destroy, and Copy. Those methods perform the actual operations needed to copy an instance of T to another instance of T, to initialize a new one, and to destroy one.


There are several stock specializations for _Copy<>, one of which works for LPOLESTR. So rather than copying the LPOLESTR pointer itself, it calls CoTaskMemAlloc to create a buffer for the copy of the string, and then copies over the data.


If you look at my example more closely, you will also see why I cannot simply new up an array of string literals and pass those to the Init function with the AtlFlagTakeOwnership flag.


LPOLESTR is superficially identical to wchar_t*. So using a string literal instead of an LPOLESTR (which is what I do in my example) will work without a hitch. But if I’d new up an array and tell the enumerator to manage it, that is when it would all go pear shaped.


Because the enumerator would not only call delete[] on the array, but it would also call _Copy::Destroy for each of the items. And trying to free up a string literal with CoTaskMemFree will result in a hard to diagnose crash.
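
To make the policy idea concrete, here is a portable sketch of a container that delegates copying and destroying to a policy class. It is not ATL’s code: WideStringCopyPolicy and OwnedStrings are invented for this illustration, and malloc / free stand in for CoTaskMemAlloc / CoTaskMemFree so that it builds anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cwchar>

// Hypothetical stand-in for _Copy<LPOLESTR>. malloc/free play the role
// of CoTaskMemAlloc/CoTaskMemFree.
struct WideStringCopyPolicy {
    static void init(wchar_t** p) { *p = nullptr; }
    static bool copy(wchar_t** dst, const wchar_t* const* src) {
        const std::size_t len = std::wcslen(*src) + 1;  // include terminator
        *dst = static_cast<wchar_t*>(std::malloc(len * sizeof(wchar_t)));
        if (*dst == nullptr) return false;
        std::wmemcpy(*dst, *src, len);
        return true;
    }
    static void destroy(wchar_t** p) { std::free(*p); *p = nullptr; }
};

// The container knows nothing about strings; the policy does all the work.
template <typename Policy, std::size_t N>
class OwnedStrings {
public:
    OwnedStrings() : count_(0) {
        for (std::size_t i = 0; i < N; ++i) Policy::init(&items_[i]);
    }
    ~OwnedStrings() {
        for (std::size_t i = 0; i < count_; ++i) Policy::destroy(&items_[i]);
    }
    bool CopyFrom(const wchar_t* const* begin, const wchar_t* const* end) {
        assert(count_ == 0 && static_cast<std::size_t>(end - begin) <= N);
        for (; begin != end; ++begin) {
            if (!Policy::copy(&items_[count_], begin)) return false;
            ++count_;
        }
        return true;
    }
    const wchar_t* at(std::size_t i) const { assert(i < count_); return items_[i]; }
    std::size_t size() const { return count_; }
private:
    OwnedStrings(const OwnedStrings&);
    OwnedStrings& operator=(const OwnedStrings&);
    wchar_t* items_[N];
    std::size_t count_;
};
```

Because every element was allocated by the policy’s copy, handing each one to the policy’s destroy is safe. The pear-shaped scenario is the one where the container destroys pointers it never allocated, such as string literals.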


Installing the server


On a development station this is done automatically by Visual Studio. But if you need to do this on another PC, you just execute the server from the command line with the /RegServer argument.


Creating the Client


Creating the client application is fairly simple, but I wanted to show how to do it anyway, because then I could demonstrate that the COM server works as intended.


I created a simple win32 console application so that I only had to care about the actual functionality.


Importing the server type library


In the olden days of yore, working with COM servers in a C++ project was a bit tedious. You had to include the IDL files in your project and compile them, then include the resulting header files, add the .c files to your project for the GUID definitions… Not rocket science by any means, but still… a bit messy and ugly.


Now, I have very few kind words for Visual Basic, but to give credit where it is due: Visual Basic is (in large part) responsible for the existence of tlb files.


Visual Basic programmers wanted to use COM too, and the people who made Visual Basic wanted to spare the application programmers from having anything to do with IDL. They wanted Visual Basic to take care of all that mess behind the scenes.


And thus the type library (tlb) was born.


The type library contains the same stuff as the IDL files, but in a machine readable, programming language independent format.


To use a COM server, you need to do nothing more than add the following line to the StdAfx.h header file of the client program:


#import "Stuff_Server.tlb" no_namespace


You also need to make sure that the compiler knows where to find Stuff_Server.tlb. This is easily done by adding the following line to the ‘Additional include directories’ setting:


..\Stuff_Server\$(ConfigurationName)


If you then compile StdAfx.cpp, the compiler will pull in the type library and generate the appropriate headers which are then automatically included.


The no_namespace attribute tells the compiler that it shouldn’t put the declarations in a specific namespace.


Using the COM server


The client application code itself is fairly trivial:


int _tmain(int argc, _TCHAR* argv[])
{
  HRESULT hr = CoInitialize(NULL);
  if(SUCCEEDED(hr))
  {
    CComPtr<IStuff> stuff;
    hr = CoCreateInstance(
           __uuidof(Stuff), NULL, CLSCTX_ALL,
           __uuidof(IStuff), (void**)&stuff );
    if(FAILED(hr))
      return hr;

    CComPtr<IUnknown> thEnumUnk;
    hr = stuff->GetEnum((IUnknown **)&thEnumUnk);
    if(FAILED(hr))
      return hr;

    CComPtr<IEnumString> thEnum;
    hr = thEnumUnk->QueryInterface(__uuidof(IEnumString), (void**) &thEnum);
    if(FAILED(hr))
      return hr;

    do
    {
      // Start from NULL so that CoTaskMemFree is a no-op when Next returns nothing
      LPOLESTR str = NULL;
      hr = thEnum->Next(1, &str, NULL);
      if(hr == S_OK )
        wprintf(L"%s\n", str);
      CoTaskMemFree(str);
    }while (hr == S_OK );
  }

  CoUninitialize();
  return 0;
}


First I initialize the COM runtime and then create a new instance of the Stuff object. I request IStuff as the initial interface. I also use the __uuidof keyword to get the GUID for an interface / object instead of using the CLSID_ and IID_ parameters. In my opinion, it is more convenient.


The COM smart pointer takes care of the reference counting for me. This is most convenient, because then I am certain that it will get released properly without me having to worry about it.


Btw, this is also the reason that all the code is placed into a separate scope block after the call to CoInitialize. The smart pointers clean up their interface pointers when they go out of scope. If CoUninitialize had been called before that, it would lead to weird problems / crashes.
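
The effect of that extra scope block can be made visible with a small portable sketch. Nothing here touches COM: LoggingPtr is a hypothetical stand-in for CComPtr that only records when it lets go, and the ‘CoInitialize’ / ‘CoUninitialize’ entries are just log strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> EventLog;

// Hypothetical stand-in for CComPtr: it only records when it releases.
class LoggingPtr {
public:
    LoggingPtr(EventLog& log, const std::string& name) : log_(log), name_(name) {
        assert(!name.empty());
    }
    ~LoggingPtr() { log_.push_back("release " + name_); }
private:
    LoggingPtr(const LoggingPtr&);
    LoggingPtr& operator=(const LoggingPtr&);
    EventLog& log_;
    std::string name_;
};

// Same shape as the client: init, a scope block with smart pointers, uninit.
inline EventLog RunClientShape() {
    EventLog log;
    log.push_back("CoInitialize");
    {
        LoggingPtr stuff(log, "IStuff");
        LoggingPtr thEnum(log, "IEnumString");
        log.push_back("use interfaces");
    }   // destructors run here, in reverse order of declaration
    log.push_back("CoUninitialize");
    return log;
}
```

Both releases land in the log before the ‘CoUninitialize’ entry. Without the inner braces the destructors would only run at the end of the function, after the runtime had already been shut down.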


Via the IStuff interface I request an enumerator object. This gets returned as an IUnknown interface, so I have to get the IEnumString interface via the IUnknown::QueryInterface method. So that’s 3 interfaces already, and I still don’t have to care about reference counting :)


After that I use the ‘Next’ method of the IEnumString interface to retrieve the items contained in the enumerator. This is the only place where I have to perform a manual cleanup action.


The IEnumString interface does not work with BSTRs, it works with LPOLESTRs. While these are code compatible because they are the same type of data, their semantics are not. The _Copy<LPOLESTR> mentioned earlier allocates data for the string using CoTaskMemAlloc. It has to be freed by CoTaskMemFree.


CComBSTR would use SysAllocString and SysFreeString. And while the code would compile and execute perfectly, there would be a crash as soon as the CComBSTR tried to free the string it got from the call to ‘Next’.
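
The underlying rule is that memory has to go back to the allocator it came from. One way to avoid the manual cleanup is a smart pointer whose deleter is tied to the right free function. The sketch below is portable and hypothetical: FakeTaskMemDup and FakeTaskMemFree use malloc / free in place of CoTaskMemAlloc / CoTaskMemFree, and the counter exists only to show that the free happens exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cwchar>
#include <memory>

// Stand-in for a string handed out by IEnumString::Next
// (malloc here, CoTaskMemAlloc in real COM code).
inline wchar_t* FakeTaskMemDup(const wchar_t* text) {
    assert(text != nullptr);
    const std::size_t len = std::wcslen(text) + 1;
    wchar_t* copy = static_cast<wchar_t*>(std::malloc(len * sizeof(wchar_t)));
    if (copy != nullptr) std::wmemcpy(copy, text, len);
    return copy;
}

// Deleter paired with the allocator above (free here, CoTaskMemFree in COM).
struct FakeTaskMemFree {
    explicit FakeTaskMemFree(int* frees) : frees_(frees) {}
    void operator()(wchar_t* p) const { std::free(p); ++*frees_; }
    int* frees_;
};

typedef std::unique_ptr<wchar_t, FakeTaskMemFree> TaskString;

// Usage: the string is released by the matching deleter when 's' goes out of scope.
inline std::size_t MeasureAndFree(const wchar_t* text, int* frees) {
    TaskString s(FakeTaskMemDup(text), FakeTaskMemFree(frees));
    return s ? std::wcslen(s.get()) : 0;
}
```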


Conclusion


As you can see, implementing COM servers and clients can be really easy with the proper use of ATL. And you’ve also seen that implementing an enumerator object is made really easy.


The demo code is up for download with this article under the MIT license. Have fun.


I will probably write some more ATL / COM stuff soon. Stay tuned for more DCOM archaeology from the mists of time ;-)

Getting started with ATL

I am currently working on some projects where I have to program a DCOM server. There are several reasons why it has to be DCOM and C++, instead of e.g. .NET remoting and C#.


The biggest pain (imo) about DCOM is that the technology stems from an era when the internet was not ubiquitous, and there were no huge forums filled with experts, or blogs by programming gurus and sites like codeproject.com. There were newsgroups, but those were generally only used for specific problems.


There is a vast amount of concise information about everything you can do with the .NET framework. For DCOM there is … a big dark void.


Granted, there are several resources out there. But if you look on sites like codeproject, you will find that most of the articles about DCOM and ATL were written before 2003. Most of the information out there will not cover the improvements that have been made to ATL (or DCOM) over the last 5 years.


And when looking into the more obscure corners of DCOM, there are some things that no one really likes to talk about because the documented semantics are so vague, and if you look into things like custom IMoniker implementation… well… that is a story for another day.


Understanding DCOM


The most important thing with DCOM is that you understand the basics really well. I guess this is true for most things, but unlike other things like .NET, DCOM does not cut you any slack, and does not provide you with tons of helpful diagnostic information.


When I started with DCOM a couple of years ago, I first read ‘Essential COM’ by Don Box. This is really the most excellent book on COM ever.


Of course, this information becomes vitally important in the real world when you have to debug a problem, because no tool is going to tell you anything useful with DCOM related crashes.


Understanding ATL


In the real world (unless maintaining legacy code) you use ATL to write COM clients or servers. Personally, I don’t like books that just tell me how to use technology XYZ. I like to understand the ‘why’ and ‘how’ too.


This is where I have to give big thanks to ‘ATL Internals, Second Edition’. It is one of the very few books on ATL that were written for VS2005, and it is still valid for VS2008. It is also an extremely well written book.


If you are starting with ATL (or if you want to know how it works) buy this book. This book is very, very good, and will teach you a lot.


What happens next?


Over the next weeks / months I will probably write a couple of articles describing how to do specific things with ATL. One article that is nearly finished is about implementing and using IEnumString using ATL.


Not rocket science by any measure, but ATL is severely lacking in the area of easy-to-find, up-to-date and to-the-point examples. So I figured that there is still value in writing about the ancient art of circles and lollipops, even if many developers probably think COM and ATL are no longer worth bothering with.


Oh and I released some win32 DCOM demo projects some time ago. They were interesting for me because making them taught me a lot about the raw DCOM stuff that you normally don’t see with ATL. And I’ve always found that it is useful to know what makes the motor hum.


So while they are not terribly useful for practical purposes, they are well documented and they might be useful if you are working your way through ‘Essential COM’.

Cold, Coffee, and Developers

We are going through an unusual spell of cold weather at the moment. It was -15C when I left for work this morning. The intense cold also caused some of the outside water pipes on our site to freeze. In particular, the ones running to the temporary trailers where I am located atm. No water -> no coffee….


Except for me, that is. I have been making my own coffee for a long time now (pouring hot water over a filter with hand ground arabica beans). This means not only am I the only one here drinking good coffee, the last 2 days I was the only one drinking coffee :D.


And I know several people whose day did not begin well because of it. But I didn’t gloat. Poking fun at people going through caffeine withdrawal is just too dangerous ;)


And for something completely different: I read this on Raymond’s blog:


Generally speaking, programmers don’t do the visual design. I mean, these are people who are lucky if they are wearing matching socks when they come to work, if they even remember to wear socks at all. And you want them to design a color scheme?


I usually manage to wear matching socks, but when it comes to choosing interior decoration, paint colors, curtains and other stuff, my wife is firmly in charge. She has a veto on my clothes too :)


Maybe there are people who are both great developers and great graphics designers. But I’m not one of them, for sure. I prefer command line apps, services, COM servers, drivers, and other low level stuff. I am good with code, not with graphical stuff. The rare few times I need to have a GUI, the best I can come up with is the default grey dialog based application, and a menu if I am feeling creative.

Found an interesting bug in the Visual Studio IDE

I know, I know, … .NET is all the rage, and DCOM is legacy technology, best not touched by up and coming programmers with good hair and sharp look, lest they appear ‘uncool’ or even worse: ‘obsolete’.


But some of us still use DCOM for a number of good reasons. And ATL takes much of the pain out of DCOM development. Unfortunately, the Visual Studio IDE doesn’t always cooperate. I made a simple MFC dialog client application, and then tried to add the ATL Server project that the client would connect to. VS was clearly concerned about my attempts to use DCOM instead of .NET, and decided to sabotage my attempts to do anything useful with this new ATL project.


https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=391506


But luckily, I was able to figure it out and outsmart VS :)

More Terminal Server License Server weirdness

Last week I had a very peculiar problem with the Terminal Server licensing.


I couldn’t connect to the licensing server anymore, even though it was running. The Terminal Server Licensing app couldn’t detect it anymore. The only thing that still worked was a network ping, but as far as Windows was concerned, we were running unlicensed.


The weird thing was that Terminal Server itself disagreed, and kept working without complaints. One of these servers had been running for a year already, so it was not the 120 day grace period that was hiding an underlying problem.


At the time I didn’t really understand why it still worked, but my colleague pointed out that the system time was wrong. The licensing server is a virtual machine, and for some reason, the time of the virtual machine was not linked to the NTP controlled time of the host. And as we all know, if the system times are off by n minutes, the mutual authentication fails and no secure connection can be established.


As it turns out, Terminal Server Licensing is even more stupid than I already knew. Not only doesn’t it count licenses, but the Terminal Servers don’t even connect to the license server to ask for licensing. Because if they did, then they would have complained about it since connecting was no longer possible.