Comparing Unicode Strings Containing Combining Characters

Suppose you have a precomposed Unicode character, i.e. a single code point that can also be expressed as a base character followed by one or more combining characters. For instance: é (U+00E9, Latin small letter e with acute). This character is common in Italian; for example, you can find it in “Perché?” (“Why?”).

This é character can be decomposed into an equivalent string made of the base letter e (U+0065, Latin small letter e) followed by the combining acute accent (U+0301).

So, it’s very reasonable to expect that two Unicode strings, one containing the precomposed character “é” (U+00E9), and another made of the base letter “e” (U+0065) followed by the combining acute accent (U+0301), should be considered equivalent.

However, given those two Unicode strings defined in C++ as follows:

  // Latin small letter e with acute
  const wchar_t s1[] = L"\x00E9";

  // Latin small letter e + combining acute
  const wchar_t s2[] = L"\x0065\x0301";

calling wcscmp(s1, s2) to compare them returns a nonzero value, meaning that those two equivalent Unicode strings are actually considered different (which makes sense from a “physical” raw byte sequence perspective).
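The raw comparison can be checked with standard C++ alone. The following is a minimal sketch; the names kPrecomposed, kDecomposed, RawCompareSeesEqual and DemoRawCompare are made up for this illustration.

```cpp
#include <cassert>
#include <cwchar>

// Precomposed form: a single code point, U+00E9.
const wchar_t kPrecomposed[] = L"\x00E9";

// Decomposed form: base letter U+0065 followed by combining acute U+0301.
const wchar_t kDecomposed[] = L"\x0065\x0301";

// Returns true if wcscmp() reports the two spellings as identical.
bool RawCompareSeesEqual()
{
    return std::wcscmp(kPrecomposed, kDecomposed) == 0;
}

// The two strings differ even in length (1 vs. 2 wide characters),
// so an element-by-element comparison cannot treat them as equal.
void DemoRawCompare()
{
    assert(std::wcslen(kPrecomposed) == 1);
    assert(std::wcslen(kDecomposed) == 2);
    assert(!RawCompareSeesEqual());
}
```

Calling DemoRawCompare() from main() should pass all three assertions, since wcscmp() only compares the stored wide characters one by one.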

However, if those same strings are compared using the CompareStringEx() Win32 API as follows:

  int result = ::CompareStringEx(
    LOCALE_NAME_INVARIANT,
    0, // default behavior
    s1, -1,
    s2, -1,
    nullptr,
    nullptr,
    0);

then the return value is CSTR_EQUAL, meaning that the two aforementioned strings are considered equivalent, as initially expected.
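When wrapping this call, keep in mind that CompareStringEx() returns 0 on failure, so the result should be compared against CSTR_EQUAL rather than tested for zero. Here is a minimal sketch; AreEquivalent and DemoEquivalence are hypothetical helper names, and the _WIN32 guard is only there so that the snippet is skipped on other platforms.

```cpp
#ifdef _WIN32
#include <cassert>
#include <windows.h>

// Hypothetical helper: true only when CompareStringEx() succeeds and
// reports CSTR_EQUAL. A return value of 0 signals failure, not equality.
bool AreEquivalent(const wchar_t* a, const wchar_t* b)
{
    const int result = ::CompareStringEx(
        LOCALE_NAME_INVARIANT,
        0,      // default behavior, as in the call above
        a, -1,  // -1: the string is null-terminated
        b, -1,
        nullptr,
        nullptr,
        0);
    return result == CSTR_EQUAL;
}

// Expected to pass, given the CSTR_EQUAL result reported above.
void DemoEquivalence()
{
    assert(AreEquivalent(L"\x00E9", L"\x0065\x0301"));
}
#endif // _WIN32
```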

 

The secret QueryInterface call of CComPtr

CComPtr is a convenient ATL smart pointer class to manage reference counting of COM objects.
However, it seems that sometimes smart pointers are too smart… In particular, I’m referring to the secret QueryInterface’ing assignment operator discussed by Jared Parsons on his blog:

http://blogs.msdn.com/jaredpar/archive/2009/11/04/type-safety-issue-when-assigning-ccomptr-t-instances.aspx

The problem is the IUnknown::QueryInterface call performed by AtlComQIPtrAssign in the following templated assignment operator overload of CComPtr:

template <typename Q>
T* operator=(_In_ const CComPtr<Q>& lp) throw()
{
  if( !IsEqualObject(lp) )
  {
    return static_cast<T*>(AtlComQIPtrAssign((IUnknown**)&p, lp, __uuidof(T)));
  }
  return *this;
}

As an example, the following C++ code compiles fine (and I think it shouldn’t) on VC9 (VS2008 SP1):

#include <atlcomcli.h>

int main()
{
    CComPtr<IMarshal> sp1;
    CComPtr<IPersist> sp2;

    // I think the following statement should not compile,
    // but instead it does compile…
    sp1 = sp2;

    return 0;
}

Frankly speaking, I consider this behavior of CComPtr a bug; in fact, CComPtr isn’t supposed to call QueryInterface automatically: there’s CComQIPtr for that.
A possible fix to the aforementioned CComPtr behavior (bug) could be to redefine the templated assignment operator using implicit conversion for underlying raw pointers (instead of using IUnknown::QueryInterface via AtlComQIPtrAssign), e.g.:

    template <typename Q>
    T* operator=(_In_ const CComPtr<Q>& lp) throw()
    {
        return (*this = lp.p);
    }
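To see the effect of relying on implicit pointer conversion without needing ATL, here is a minimal portable sketch. SimplePtr, IBase, IDerived and IUnrelated are hypothetical stand-ins, not ATL or COM types, and the wrapper does no reference counting. The std::enable_if constraint is an addition of this sketch so that the rule can be checked with static_assert; the operator shown above would instead fail to compile inside its body.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for COM interfaces; not real ATL/COM types.
struct IBase      { virtual ~IBase() = default; };
struct IDerived   : IBase {};
struct IUnrelated { virtual ~IUnrelated() = default; };

// Hypothetical, non-owning pointer wrapper (no AddRef/Release) that only
// models the assignment rules under discussion.
template <typename T>
struct SimplePtr
{
    T* p = nullptr;

    SimplePtr& operator=(T* raw)
    {
        p = raw;
        return *this;
    }

    // Converting assignment: takes part in overload resolution only when
    // Q* converts implicitly to T*; there is no QueryInterface-style lookup.
    template <typename Q,
              typename = typename std::enable_if<
                  std::is_convertible<Q*, T*>::value>::type>
    SimplePtr& operator=(const SimplePtr<Q>& other)
    {
        return (*this = other.p);
    }
};

// Derived-to-base assignment is accepted...
static_assert(
    std::is_assignable<SimplePtr<IBase>&, const SimplePtr<IDerived>&>::value,
    "derived-to-base assignment should compile");

// ...while assignment between unrelated interfaces is rejected.
static_assert(
    !std::is_assignable<SimplePtr<IBase>&, const SimplePtr<IUnrelated>&>::value,
    "unrelated assignment should not compile");

// Runtime check: the converting assignment just copies the raw pointer.
void DemoDerivedAssignment()
{
    IDerived object;
    SimplePtr<IDerived> source;
    source = &object;

    SimplePtr<IBase> target;
    target = source;
    assert(target.p == &object);
}
```

With this rule, the equivalent of sp1 = sp2 between unrelated interfaces is rejected at compile time, while assignments that a raw pointer would also allow keep working.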

 

A VS2008 solution with sample C++ code including the fix to CComPtr is attached to this blog post.

Thanks to Igor Tandetnik for private communication about this issue.