Practical ATL: recognizing marshalling problems

Before I publish my next article, I should explain something else about the black magic aspect of ATL / COM: Marshalling.


The Stuff server we made in the previous article is an out-of-proc server. This means that the arguments being passed to / from a stuff instance have to be somehow moved back and forth between different process contexts (or different threading apartments).


In the case of IStuff, we don’t really have to bother, because IStuff derives from IDispatch. And in that case, there is a marshaller (aptly named Universal Marshaller) that can be used to do all this for you.


If IStuff didn’t derive from IDispatch, we would need to provide our own proxy / stub DLL that knows how to marshal the interface. With ATL, this is not difficult. You’ll notice that the Stuff solution from my previous article had a ‘Stuff_ServerPS’ project that I didn’t mention before.


That project builds the proxy / stub DLL that performs the marshalling. It is disabled by default, so you have to enable it in the build configuration before it is built. Also remember to rebuild the proxy / stub DLL every time the IDL files of your project change.


For the remainder of this article I will show what can happen if there is a marshalling problem. Don’t worry if it all seems ugly, gory, and incomprehensible. Normally you won’t have to deal with issues like this. But if you ever do, you should at least recognize the signs that you have a marshalling problem.


Confused? It’s the marshaller


If you find yourself in a situation where very weird things happen, suspect a marshalling problem.


If you get the error code E_NOINTERFACE when you know for sure that your object implements it properly, suspect a marshalling problem.


If you get a crash while using an interface pointer that was properly created, without getting an HRESULT that indicated a problem… that’s right: you can bet it’s a marshalling problem.


Example


The article I am currently writing discusses implementing a non standard class object, for the purpose of parameterized construction of the Stuff object. Never mind for the moment about what any of that means.


For all practical purposes: it just means that I defined the following interface via copy / paste / edit:


[
      object,
      local,
      uuid(6DD69CDB-3128-432b-B335-773A287E6F06),
      dual,
      helpstring("IStuffCreator Interface"),
      pointer_default(unique)
]
interface IStuffCreator : IUnknown {
  [id(1), helpstring("method MakeMeAStuff")]
    HRESULT MakeMeAStuff(
      [in] BSTR name,
      [in] REFIID riid,
      [out] void ** ppStuff);
};


There were 4 things wrong with this interface (showing just why you shouldn’t define IDL interfaces in a hurry). This has to be some sort of record.


  1. ‘local’ shouldn’t be there.
  2. ‘dual’ shouldn’t be there.
  3. ‘void’ should be ‘IUnknown’
  4. ‘ppStuff’ should be attributed with iid_is(riid).

So let’s have a look at the result of these different issues.


Issue 1: local


‘local’ simply shouldn’t be there. ‘local’ is a MIDL attribute that, when used in an interface header, causes MIDL not to generate stubs for that interface.


The effect on our interface is that the IStuff specific methods are not forwarded to the remote server.


You might wonder why it exists. The answer is: in case you want to write custom marshalling code, or in case the calls never have to be remoted. In that case you can have a local interface and a remote interface, and both usage and marshalling can be optimized.


IUnknown is such an interface, but it has custom stub code to deal with things like remoting.


The result of using ‘local’ on my IStuffCreator interface is that trying to call GetEnum on the returned IStuff interface pointer will result in a debug assertion: Run-Time Check Failure #0 – The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.


And if you have this problem with COM, it is usually indeed an indication that you have a calling convention mismatch. In this case, it is a side effect of the fact that we are trying to execute a method that isn’t remoted.


Issue 2: dual


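This one shouldn’t be here either. ‘dual’ indicates that the interface derives from IDispatch and can be called both through its vtable and through IDispatch, and this could have an effect on marshalling. IStuffCreator derives directly from IUnknown, so the attribute is simply wrong here.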


In this case, it worked just fine. Perhaps it is because of the fact that I use my object as a class object, or perhaps because my interface itself uses only arguments that are automation compatible, or perhaps because the stars are aligned in the proper configuration.


I don’t know. And anything you don’t know for certain will hurt you when dealing with DCOM. It has to be correct.


Issue 3: void


This one is interesting.


If I want to pass an interface pointer, it has to be passed as an IUnknown** instead of a void**. This is because the Universal Marshaller has no idea how to marshal something that has only contextual meaning, nor does the IDL compiler know what you mean by it.


With this I mean that my server knows it will be an interface pointer, and my client will know it is an interface pointer, but the stuff in the middle doesn’t. It sees just a memory address.


I used void** because that is what QueryInterface itself uses. And what’s good enough for QI should be good enough for me. Except QI has custom code for marshalling its arguments across, which I don’t. So QI can use void** because both the proxy and the stub know what is going on.


So what happens when you use void**?


I request the class object like this:


    hr = CoGetClassObject(
           __uuidof(Stuff),
           CLSCTX_ALL,
           NULL,
           __uuidof(IStuffCreator),
           (void**)&stuffClass);


And the resulting hr is 0x80020008 (DISP_E_BADVARTYPE): Bad variable type. There is absolutely nothing wrong with this line of code. But because there is something in the interface itself that confuses the marshaller, it tells me to sod off when I want that specific interface.


Credit where it is due: I figured this one out thanks to someone whose name I can’t figure out :-). But his blog post is here.


Issue 4: iid_is(riid)


This one is also a good one.


The iid_is attribute informs the marshaller that riid holds the identifier of the interface that is being marshalled through ppStuff.


This way, the marshaller knows how to marshal the interface that was specified with riid. Without that part, it would only marshal the IUnknown part of the interface, and not the IStuff specific part.


As a result, you can use the IUnknown part of the interface without a problem, but if you use the IStuff specific part, you will get an access violation, stack corruption or other interesting phenomenon.


Conclusion


A couple of things can be concluded at this point.


First of all: Marshaling problems are hard to debug, because the problems occur outside of your source code.


If you use any interfaces that don’t derive from IDispatch, then you have to enable the ‘xxxPS’ project in your solution so that the proxy / stub DLL will be built and registered. And of course, this DLL has to be part of your install procedure. Without it, you could get access violations, or an E_NOINTERFACE error if the marshaller doesn’t know that one of those interfaces exists.


As I have shown, you need to be careful and pay attention when you manually define COM interfaces. If you don’t, then you can have all sorts of hard to diagnose problems.


And if you ever find yourself in a situation where nothing makes sense, then suspect marshalling issues and have a long hard look at your IDL files.


 
