Suppose you have a precomposed Unicode character, i.e., a single Unicode entity that can also be defined as a sequence of one or more other characters. For instance: é (U+00E9, Latin small letter e with acute). This character is common in Italian; for example, you can find it in “Perché?” (“Why?”).
This é character can be decomposed into an equivalent sequence made up of the base letter e (U+0065, Latin small letter e) followed by the combining acute accent (U+0301).
So, it’s very reasonable to expect that two Unicode strings, one containing the precomposed character “é” (U+00E9) and the other made up of the base letter “e” (U+0065) followed by the combining acute accent (U+0301), should be considered equivalent.
However, given those two Unicode strings defined in C++ as follows:
// Latin small letter e with acute
const wchar_t s1[] = L"\x00E9";

// Latin small letter e + combining acute
const wchar_t s2[] = L"\x0065\x0301";
calling wcscmp(s1, s2) to compare them returns a value different from zero, meaning that those two equivalent Unicode strings are actually considered different (which makes sense from a “physical” perspective, as wcscmp simply compares the raw wchar_t code units one by one).
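For reference, here is a minimal self-contained sketch of that binary comparison (the printed message is an illustrative addition, not part of the original sample):

#include <cstdio>
#include <cwchar>

int main()
{
    // Latin small letter e with acute (precomposed form)
    const wchar_t s1[] = L"\x00E9";

    // Latin small letter e + combining acute accent (decomposed form)
    const wchar_t s2[] = L"\x0065\x0301";

    // wcscmp performs a pure code-unit comparison,
    // so these two equivalent strings compare as different
    if (std::wcscmp(s1, s2) != 0)
    {
        std::printf("wcscmp considers the strings different.\n");
    }
}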
However, if those same strings are compared using the CompareStringEx() Win32 API as follows:
int result = ::CompareStringEx(
    LOCALE_NAME_INVARIANT,
    0,          // default behavior
    s1, -1,
    s2, -1,
    nullptr, nullptr, 0);
then the return value is CSTR_EQUAL, meaning that the two aforementioned strings are considered equivalent, as initially expected.
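Note also that CompareStringEx returns zero on failure, so robust code should check for that case before testing for CSTR_EQUAL. Here is a minimal sketch of how the whole call might be wrapped up (the message strings and the error-handling style are illustrative assumptions, not from the original):

#include <Windows.h>
#include <cstdio>

int main()
{
    // Latin small letter e with acute (precomposed form)
    const wchar_t s1[] = L"\x00E9";

    // Latin small letter e + combining acute accent (decomposed form)
    const wchar_t s2[] = L"\x0065\x0301";

    const int result = ::CompareStringEx(
        LOCALE_NAME_INVARIANT,
        0,          // default comparison behavior
        s1, -1,     // -1: s1 is NUL-terminated
        s2, -1,     // -1: s2 is NUL-terminated
        nullptr, nullptr, 0);

    if (result == 0)
    {
        // The comparison itself failed; GetLastError() has the details
        std::printf("CompareStringEx failed, error code: %lu\n", ::GetLastError());
    }
    else if (result == CSTR_EQUAL)
    {
        std::printf("The strings are considered equivalent.\n");
    }
    else
    {
        // CSTR_LESS_THAN or CSTR_GREATER_THAN
        std::printf("The strings are considered different.\n");
    }
}

An alternative approach, not shown here, is to first normalize both strings to the same Unicode normalization form (for instance with the Win32 NormalizeString API) and then perform an ordinal comparison on the normalized results.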