The Case of string_view and the Magic String

Someone was working on modernizing some legacy C++ code. The code base contained a function like this:

void DoSomething(const char* name)

The DoSomething function takes a read-only string expressed using a C-style string pointer: remember, this is a legacy C++ code base.

For the sake of this discussion, suppose that DoSomething contains some simple C++ code that invokes a C-interface API, like this (of course, the code would be more complex in a real code base):

void DoSomething(const char* name) 
{
    SomeCApi(name);
}

SomeCApi also expects a “const char*” that represents a read-only string parameter.

However, note that SomeCApi cannot be modified (think of it as a system C-interface API, for example a Windows C API like MessageBox).

For the sake of this discussion, suppose that SomeCApi just prints out its string parameter, like this:

void SomeCApi(const char* name) 
{
    printf("Hello, %s!\n", name);
}

In the spirit of modernizing the legacy C++ code base, the maintainer decides to change the prototype of DoSomething, stepping up from “const char*” to std::string_view:

// Was: void DoSomething(const char* name)
void DoSomething(std::string_view name)

The SomeCApi still expects a const char*. Remember that you cannot change the SomeCApi interface.

So, the maintainer needs to update the body of DoSomething accordingly, invoking string_view::data to access the underlying character array:

void DoSomething(std::string_view name) 
{
    // Was: SomeCApi(name);
    SomeCApi(name.data());
}

In fact, std::string_view::data returns a pointer to the underlying character array.

The code compiles fine. And the maintainer is very happy about this string_view modernization!

 

Then, the code is executed for testing, with a string_view name containing “Connie”. The expected output would be:

“Hello, Connie!”

But, instead, the following string is printed out:

“Hello, Connie is learning C++!”

Wow! Where does the “ is learning C++” part come from??

Is there some magic string hidden inside string_view?

As a sanity check, the maintainer simply prints out the string_view variable:

// name is a std::string_view
std::cout << "Name: " << name;
 

And the output is as expected: “Name: Connie”.

So, it seems that cout does print the correct string_view name.

But, somehow, when the string_view is passed deep down to a legacy C API, some string “magic” happens, showing some additional characters after “Connie”.

What’s going on here??

Figuring Out the Bug

Well, the key here is two words: Null Terminator.

In fact, the C API that takes a const char* expected the string to be null-terminated.

On the other hand, std::string_view does not guarantee null-termination!

So, consider a string_view that “views” only a portion of a string, like this:

std::string str = "Connie is learning C++";
auto untilFirstSpace = str.find(' ');
std::string_view name{str.data(), untilFirstSpace}; // "Connie"

The string_view certainly “views” the “Connie” part. But, if you consider the memory layout, there is no null terminator right after the “Connie” characters in memory, even though the C API expects one. So, the C API keeps reading the whole initial string, until it finds the null terminator.

So, the whole string is printed out by the C API, not just the part observed by the string_view.

Memory layout: string_view vs. C-style null-terminated strings

This is a very subtle bug that can be hard to spot in more complex code bases.

So, remember: std::string_views are not guaranteed to be null-terminated! Take that into consideration when calling C-interface APIs that expect C-style null-terminated strings.

P.S. As a side note, std::string::c_str guarantees that the returned pointer points to a null-terminated character array.
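For example, one possible fix here (a minimal sketch, at the cost of an extra string copy) is to copy the viewed characters into a temporary std::string inside DoSomething, and pass its guaranteed null-terminated c_str() pointer to the C API:

void DoSomething(std::string_view name)
{
    // Copy the viewed characters into a std::string:
    // std::string::c_str is guaranteed to return a null-terminated string.
    std::string nameStr{ name };
    SomeCApi(nameStr.c_str());
}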

Repro Code

// The Case of string_view and the Magic String 
// -- by Giovanni Dicanio

#include <stdio.h>

#include <iostream>
#include <string>
#include <string_view>

void SomeCApi(const char* name)
{
    printf("Hello, %s!\n", name);
}

void DoSomething(std::string_view name)
{
    SomeCApi(name.data());
}

int main()
{
    std::string msg = "Connie is learning C++";
    auto untilFirstSpace = msg.find(' ');

    std::string_view v{ msg.data(), untilFirstSpace };

    std::cout << "String view: " << v << '\n';

    DoSomething(v);
}

 

The Case of the Context-Menu Shell Extension That Doesn’t Get Invoked by Explorer

A developer was working on a context-menu shell extension.

He wrote all the skeleton infrastructure code, including some code to display the selected files when the shell extension gets invoked by Windows Explorer. It’s kind of the “Hello World” for context-menu shell extensions.

He happily builds his code within Visual Studio, copies the shell extension DLL to a virtual machine, registers the extension in the VM, right-clicks on a bunch of files… but nothing happens. His extension’s menu items just don’t show up in the Explorer context-menu.

What was going wrong, he wondered?

He starts thinking about all sorts of hypotheses and troubleshooting strategies.

He once again manually registers the COM DLL that implements the shell extension from the Command Prompt:

regsvr32 CoolContextMenuExtension.dll

A message box pops up, informing him that the registration has succeeded.

He tries right-clicking some files, but, once again, his extension’s menu items don’t show up.

He then checks that the extension was properly registered under the list of approved shell extensions (this was part of his shell extension infrastructure C++ code):

HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Shell Extensions\Approved

He finds the GUID of the shell extension in there, as expected.

He then checks that the extension was registered under the proper subkey in HKEY_CLASSES_ROOT. Found it there, too!

He starts wondering… what may be going so wrong?

He checks the C++ infrastructure code generated by the ATL Visual Studio wizard, including his own additions and modifications. The implementations of DllRegisterServer and DllUnregisterServer seem correct.

He also checks his own C++ code in the C++ class that implements the shell extension’s COM object. He has correctly implemented IShellExtInit::Initialize, and the three methods of IContextMenu: GetCommandString, InvokeCommand, and QueryContextMenu.

He takes a look at the class inheritance list: both IShellExtInit and IContextMenu are listed among the base classes.

The IShellExtInit and IContextMenu COM interfaces are correctly listed among the base classes.

He feels desperate. What’s going wrong?? He wrote several shell extensions in C++ in the past. Maybe there’s some bug in the current version of Visual Studio 2019 he is using?

Why wasn’t his context-menu shell extension getting called by Explorer?

 

I took a look at that code.

At some point, I was enlightened.

I gave a look at the COM interface map in the header file of the C++ shell extension class.

There’s a bug in the shell extension C++ object’s COM interface map.

Wow! Can you spot the error?

Something is missing in there!

In fact, in addition to deriving the C++ shell extension class from IContextMenu, you also have to add an entry for IContextMenu in the COM map!

I quickly added the missing line:

    COM_INTERFACE_ENTRY(IContextMenu)

The COM interface map, with both the IShellExtInit and IContextMenu entries.
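For reference, the fixed COM map looks something like this (a minimal sketch; CCoolContextMenuExtension is just a placeholder name for the actual shell extension C++ class):

BEGIN_COM_MAP(CCoolContextMenuExtension)
    COM_INTERFACE_ENTRY(IShellExtInit)
    COM_INTERFACE_ENTRY(IContextMenu)  // <-- the previously missing entry
END_COM_MAP()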

I rebuilt the extension, registered it, and, with great pleasure and satisfaction, the new custom items in the Explorer’s context-menu showed up! And, after selecting a bunch of files to test the extension, the message box invoked by the shell extension C++ code was correctly shown.

All right! 😊

So, what was going wrong under the hood?

Basically, since the context-menu shell extension was correctly registered under the proper key in the Windows Registry, I think that Windows Explorer was actually trying to invoke the shell extension’s methods.

In particular, I think Explorer called QueryInterface on the shell extension’s COM object, to get a pointer to IContextMenu. But, since IContextMenu was missing from the COM map, the QueryInterface code implemented by the ATL framework was unable to return the interface pointer back to Explorer. As a consequence of that, Explorer didn’t recognize the extension as a valid context-menu extension (despite the extension being correctly registered in the Windows Registry), and didn’t even call the IShellExtInit::Initialize method (that was actually available, as IShellExtInit had its own entry correctly listed in the COM map from the beginning).

The bug is in the details!

So, the moral of the story is to always check the COM map in addition to the base class list in your shell extension’s C++ class header file. The COM interfaces need to both be in the base class list and have their own entries in the COM map.

And, in addition, this experience suggests to me that it would have been great if Visual Studio had issued at least a warning about the IContextMenu COM interface being listed among the base classes but missing from the COM map! This would be an excellent time-and-bug-saving addition, Visual Studio IDE/Visual C++/Visual Assist X teams!

 

Subtle Bug When Converting Strings to Lowercase

Suppose that you want to convert a std::string object to lowercase.

The first thing you would probably do is search the std::string documentation for a convenient, simple method named to_lower, or something like that. Unfortunately, there’s nothing like that.

So, you might start developing your own “to_lower” function. A typical implementation I’ve seen of such a custom function goes something like this: for each character in the input string, convert it to lowercase by invoking std::tolower. In fact, there’s even this sample code on cppreference.com:

// From http://en.cppreference.com/w/cpp/string/byte/tolower

std::string str_tolower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), 
                   [](unsigned char c) { return std::tolower(c); }
                  );
    return s;
}

Well, if you try this code with something like str_tolower("Connie"), everything seems to work fine, and you get "connie" as expected.

Now, since C++ folks like storing UTF-8-encoded text in std::string objects, in some large code base someone happily takes the aforementioned str_tolower function, and invokes it with their lovely UTF-8 strings. Fun ensues! …Well, actually, bugs ensue.

So, the problem is that str_tolower, under the hood, calls std::tolower on each char in the input string. While this works fine for pure ASCII strings like “Connie”, such code is a bug farm for UTF-8 strings. In fact, UTF-8 is a variable-width character encoding: some Unicode “characters” (code points) are encoded in UTF-8 using one byte, while others are encoded using two bytes, and so on, up to four bytes. The poor std::tolower has no clue about such UTF-8 encoding features, so it innocently spits out wrong results, char by char.

For example, I tried invoking the above function on “PERCHÉ” (the last character is the Unicode U+00C9 LATIN CAPITAL LETTER E WITH ACUTE, encoded in UTF-8 as the two-byte sequence 0xC3 0x89), and the result I got was “perchÉ” instead of the expected “perché” (é is Unicode U+00E9, LATIN SMALL LETTER E WITH ACUTE). So, the pure ASCII characters in the input string were all correctly converted to lowercase, but the final non-ASCII character wasn’t.

Actually, the problem is not the std::tolower function itself: it’s that the function was misused, invoked in a way it was not designed for.

This is one of the perils of taking std::string-based C++ code that initially worked with ASCII strings, and thoughtlessly reusing it for UTF-8-encoded text.

In fact, we saw a very similar bug in a previous blog post.

So, how can you fix that problem? Well, a portable way is using the ICU library with its icu::UnicodeString class and its toLower method.
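For example, a minimal sketch based on ICU might look like this (assuming the ICU headers and libraries are available; the ToLowerUtf8 function name is just for illustration):

#include <unicode/unistr.h>
#include <string>

// Convert a UTF-8-encoded string to lowercase using ICU.
std::string ToLowerUtf8(const std::string& utf8)
{
    icu::UnicodeString text = icu::UnicodeString::fromUTF8(utf8);
    text.toLower(); // Unicode-aware lowercasing (default locale), done in place

    std::string result;
    text.toUTF8String(result);
    return result;
}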

On the other hand, if you are writing Windows-specific C++ code, you can use the LCMapStringEx API. Note that this function uses the UTF-16 encoding (as almost all Windows Unicode APIs do). So, if you have UTF-8-encoded text stored in std::string objects, you first have to convert it from UTF-8 to UTF-16, then invoke the aforementioned API, and finally convert the UTF-16-encoded result back to UTF-8. For these UTF-8/UTF-16 conversions, you may find my MSDN Magazine article on “Unicode Encoding Conversions with STL Strings and Win32 APIs” interesting.

 

Printing UTF-8 Text to the Windows Console

Let’s suppose you have a string encoded in UTF-8, and you want to print it to the Windows console.

You might have heard of the _setmode function and the _O_U8TEXT flag.

The MSDN documentation contains this compilable code snippet that you can use as the starting point for your experimentation:

// crt_setmodeunicode.c  
// This program uses _setmode to change  
// stdout to Unicode. Cyrillic and Ideographic  
// characters will appear on the console (if  
// your console font supports those character sets).  
  
#include <fcntl.h>  
#include <io.h>  
#include <stdio.h>  
  
int main(void) {  
    _setmode(_fileno(stdout), _O_U16TEXT);  
    wprintf(L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd\n");  
    return 0;  
}  

So, to print UTF-8-encoded text, you may think of substituting the _O_U16TEXT flag with _O_U8TEXT, and use printf (or cout) with a byte sequence representing your UTF-8-encoded string.

For example, let’s consider the Japanese name for Japan, written using the kanji 日本.

The first kanji is the Unicode code point U+65E5; the second kanji is U+672C. Their UTF-8 encodings are the 3-byte sequences 0xE6 0x97 0xA5 and 0xE6 0x9C 0xAC, respectively.

So, let’s consider this compilable code snippet that tries to print a UTF-8-encoded string:

#include <fcntl.h>
#include <io.h>
#include <stdint.h>
#include <iostream>

int main()
{
    _setmode(_fileno(stdout), _O_U8TEXT);

    // Japanese name for Japan, 
    // encoded in UTF-8
    uint8_t utf8[] = { 
        0xE6, 0x97, 0xA5, // U+65E5
        0xE6, 0x9C, 0xAC, // U+672C
        0x00 
    };

    std::cout << reinterpret_cast<const char*>(utf8) << '\n';
}

This code compiles fine. However, if you run it, you’ll get this error message:

Error when trying to print UTF-8 text to the console

So, how can you print UTF-8-encoded text to the Windows command prompt?

Well, it seems that you have to first convert from UTF-8 to UTF-16, and then use wprintf or wcout to print the UTF-16-encoded text. This isn’t optimal, but at least it seems to work.
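For example, a minimal sketch of that workaround (assuming the input byte sequence is valid UTF-8) might look like this:

#include <fcntl.h>
#include <io.h>
#include <stdio.h>
#include <string>
#include <windows.h>

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT);

    // Japanese name for Japan, encoded in UTF-8
    const char utf8[] = "\xE6\x97\xA5\xE6\x9C\xAC"; // U+65E5 U+672C

    // Convert from UTF-8 to UTF-16 using the Win32 MultiByteToWideChar API
    const int utf16Len = ::MultiByteToWideChar(CP_UTF8, 0, utf8, -1, nullptr, 0);
    std::wstring utf16(utf16Len, L'\0');
    ::MultiByteToWideChar(CP_UTF8, 0, utf8, -1, &utf16[0], utf16Len);

    // Print the UTF-16-encoded text
    wprintf(L"%ls\n", utf16.c_str());
}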

EDIT 2019-04-09: See this new blog post for a solution with compilable demo code.

 

What Is the Encoding Used by the error_code message String?

std::system_error is an exception class, introduced in C++11, that is thrown by various functions that interact with OS-specific APIs. The platform-dependent error code is represented using the std::error_code class (returned by system_error::code). The error_code::message method returns an explanatory string for the error code. So, what is the encoding used to store the text in the returned std::string object? UTF-8? Some other code page?

To answer this question, I spelunked inside the MSVC STL implementation code, and found this _Winerror_message helper function that is used to get the description of a Windows error code.

STL _Winerror_message helper function

This function first calls the FormatMessageW API to get the error message encoded in Unicode UTF-16. Then, the returned wchar_t-string is converted to a char-string, which is written to an output buffer allocated by the caller.

The conversion is done by invoking the WideCharToMultiByte API, with the CodePage parameter set to CP_ACP, meaning “the system default Windows ANSI code page” (quoting the official MSDN documentation).

I think that, in modern C++ code, it’s in general a good practice to store UTF-8-encoded text in std::strings. Code pages are a source of subtle bugs, as there are many of them, they can change, and you end up getting garbage characters (mojibake) when char-strings using different code pages are mixed and appended together (e.g. when written to UTF-8 log files).

So, I’d have preferred using the CP_UTF8 flag with the WideCharToMultiByte call above, getting a char-string containing the error message encoded as a UTF-8 string.

However, this would cause mojibake bugs for C++/Windows code that uses cout or printf to print message strings, as this code assumes CP_ACP by default.

So, my point is still that char-strings should in general use the UTF-8 encoding; but unless the Windows console and cout/printf move to UTF-8 as their default encoding, it sounds like the current usage of CP_ACP in the error message string is understandable.

Anyway, due to the use of CP_ACP in the wchar_t-to-char string conversion discussed above, you should pay attention when writing error_code::message strings to UTF-8-encoded log files. Maybe the best thing would be writing custom code to get the message string from the error code identifier, and encoding it using UTF-8 (basically invoking FormatMessage followed by WideCharToMultiByte with CP_UTF8).
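A minimal sketch of such custom code might look like this (error handling kept to a minimum; the Utf8MessageFromError function name is just for illustration):

#include <windows.h>
#include <string>

// Return a UTF-8-encoded message string for a Windows error code.
std::string Utf8MessageFromError(DWORD errorCode)
{
    // Get the error message, encoded in UTF-16, in a system-allocated buffer
    wchar_t* buffer = nullptr;
    const DWORD len = ::FormatMessageW(
        FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM
            | FORMAT_MESSAGE_IGNORE_INSERTS,
        nullptr, errorCode, 0,
        reinterpret_cast<wchar_t*>(&buffer), 0, nullptr);
    if (len == 0 || buffer == nullptr) {
        return "Unknown error";
    }

    // Convert the UTF-16 message to UTF-8 (instead of CP_ACP)
    const int utf8Len = ::WideCharToMultiByte(CP_UTF8, 0, buffer, static_cast<int>(len),
                                              nullptr, 0, nullptr, nullptr);
    std::string utf8(utf8Len, '\0');
    ::WideCharToMultiByte(CP_UTF8, 0, buffer, static_cast<int>(len),
                          &utf8[0], utf8Len, nullptr, nullptr);

    ::LocalFree(buffer);
    return utf8;
}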

Thanks to Stephan T. Lavavej, Casey Carter and Billy O’Neal for e-mail communication on this issue.

 

The CStringW with wcout Bug Under the Hood

I discussed in a previous blog post a subtle bug involving CStringW and wcout, and later I showed how to fix it.

In this blog post, I’d like to discuss in more detail what’s happening under the hood, and what triggers that bug.

Well, to understand the dynamics of that bug, you can consider the following simplified case of a function and a function template, implemented like this:

void f(const void*) {
  cout << "f(const void*)\n";
}

template <typename CharT> 
void f(const CharT*) {
  cout << "f(const CharT*)\n";
}

If s is a CStringW object, and you write f(s), which function will be invoked?

Well, you can write a simple compilable program containing these two functions, the required headers, and a simple main implementation like this:

int main() {
  CStringW s = L"Connie";
  f(s);
}

Then compile it, and observe the output. You know, printf-debugging™ is so cool! 🙂

Well, you’ll see that the program outputs “f(const void*)”. This means that the first function (the non-templated one, taking a const void*) is invoked.

So, why did the C++ compiler choose that overload? Why not f(const wchar_t*), synthesized from the second function template?

Well, the answer is in the rules that C++ compilers follow when doing template argument deduction. In particular, when deducing template arguments, the implicit conversions are not considered. So, in this case, the implicit CStringW conversion to const wchar_t* is not considered.

So, when overload resolution happens later, the only candidate available is f(const void*). Now, the implicit CStringW conversion to const wchar_t* is considered, and the first function is invoked.

Out of curiosity, if you comment out the first function, you’ll get a compiler error. MSVC complains with a message like this:

error C2672: ‘f’: no matching overloaded function found

error C2784: ‘void f(const CharT *)’: could not deduce template argument for ‘const CharT *’ from ‘ATL::CStringW’

The message is clear (almost…): “Could not deduce template argument for const CharT* from CStringW”: that’s because implicit conversions like this are not considered when deducing template arguments.

Well, what I’ve described above in a simplified case is basically what happens in the slightly more complex case of wcout.

wcout is an instance of wostream. wostream is declared in <iosfwd> as:

typedef basic_ostream<wchar_t, char_traits<wchar_t>> wostream;

Instead of the initial function f, in this case you have operator<<. In particular, here the candidates are an operator<< overload that is a member function of basic_ostream:

basic_ostream& basic_ostream::operator<<(const void *_Val)

and a template non-member function:

template<class _Elem, class _Traits> 
inline basic_ostream<_Elem, _Traits>& 
operator<<(basic_ostream<_Elem, _Traits>& _Ostr, const _Elem *_Val)

(This code is edited from the <ostream> standard header that comes with MSVC.)

When you write code like “wcout << s” (for a CStringW s), the implicit conversion from CStringW to const wchar_t* is not considered during template argument deduction. Then, overload resolution picks the basic_ostream::operator<<(const void*) member function (corresponding to the first f in the initial simplified case), so the string’s address is printed via this “const void*” overload (instead of the string itself).

On the other hand, when CStringW::GetString is explicitly invoked (as in “wcout << s.GetString()”), the compiler successfully deduces the template arguments for the non-member operator<< (deducing wchar_t for _Elem). And this operator<<(wostream&, const wchar_t*) prints the expected wchar_t string.
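Putting it all together, a minimal sketch of the two calls (assuming the usual ATL and <iostream> headers are included):

CStringW s = L"Connie";
std::wcout << s << L'\n';             // const void* overload: prints the pointer value
std::wcout << s.GetString() << L'\n'; // const wchar_t* overload: prints "Connie"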

I know… There are aspects of C++ templates that are not easy.

 

Subtle Bug with CStringW and wcout: Where’s My String??

Someone wrote some C++ code like this to print the content of a CStringW using wcout:

CStringW s = L"Connie";
…
wcout << s << …

The code compiles fine, but the output is a hexadecimal sequence, not the string “Connie” as expected. Surprise, surprise! So, what’s going on here?

Well, wcout doesn’t have any clue how to deal with CStringW objects. After all, CStringW is part of ATL, while wcout is part of <iostream>: they are two separate worlds.

However, CStringW defines an implicit conversion operator to const wchar_t*. This makes it possible to simply pass CStringW objects to Win32 APIs expecting input C-style NUL-terminated string pointers (although, there’s a school of thought that discourages the use of implicit conversions in C++).

So, wcout has no idea how to print a CStringW object. However, wcout does know how to print raw pointers (const void*). So, in the initial code snippet, the C++ compiler first invokes the CStringW implicit const wchar_t* conversion operator. Then, the <iostream> operator<< overload that takes a const void* parameter is used to print the pointer with wcout.

In other words, the hexadecimal value printed is the raw C-style string pointer to the string wrapped by the CStringW object, instead of the string itself.

If you want to print the actual string (not the pointer), you can invoke the GetString method of CStringW. GetString returns a const wchar_t*, which is the pointer to the wrapped string; wcout does know how to print a Unicode UTF-16 wchar_t string via its raw C-style pointer. In fact, there’s a specific overload of operator<< that takes a const wchar_t*; this gets invoked, for example, when you pass wchar_t-string literals to wcout, e.g. wcout << L"Connie", instead of picking the const void* overload that prints the pointer.

So, this is the correct code:

// Print a CStringW to wcout
wcout << s.GetString() << …

Another option is to explicitly static_cast the CStringW object to const wchar_t*, so the proper operator<< overload gets invoked:

wcout << static_cast<const wchar_t*>(s) << …

Although I prefer the shorter and clearer GetString call.

(Of course, this applies also to other wcout-ish objects, like instances of std::wostringstream.)

P.S. In my view, it would be more reasonable if “wcout << …” picked the const wchar_t* overload also for CStringW objects. It probably doesn’t happen for some reason involving templates or some lookup rule in the I/O stream library. Sometimes C++ is not reasonable (the same happens with C, too).

Subtle Bug with std::min/max Function Templates

Suppose you have a function f that returns a double, and you want to store in a variable the value of this function, if this is a positive number, or zero if the return value is negative. This line of C++ code tries to do that:

double x = std::max(0, f(/* something */));

Unfortunately, this apparently innocent code won’t compile!

The error message produced by VS2015’s MSVC compiler is not very clear, as is often the case with C++ code involving templates.

So, what’s the problem with that code?

The problem is that the std::max function template is declared something like this:

template <typename T> 
const T& max(const T& a, const T& b)

If you look at the initial code invoking std::max, the first argument is of type int; the second argument is of type double (i.e. the return type of f).

Now, if you look at the declaration of std::max, you’ll see that both parameters are expected to be of the same type T. So, the C++ compiler complains as it’s unable to deduce the type of T in the code calling std::max: should T be int or double?

This ambiguity triggers a compile-time error.

To fix this error, you can use the double literal 0.0 instead of 0.
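For example:

double x = std::max(0.0, f(/* something */));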

And, what if instead of 0 there’s a variable of type int?

Well, in this case you can either static_cast that variable to double:

double x = std::max(static_cast<double>(n), f(/* something */));

or, as an alternative, you can explicitly specify the double template type for std::max:

double x = std::max<double>(n, f(/* something */));

A Subtle Bug with PInvoke and Safe Arrays Storing Variant Bytes

When exchanging array data across module boundaries using safe arrays, I tend to prefer (and suggest) safe arrays of direct types, like BYTEs or BSTR strings, instead of safe arrays storing variants (that in turn contain BYTEs, or BSTRs, etc.).

However, there are some scripting clients that only understand safe arrays storing variants. So, if you want to support such clients, you have to pack the original array data items into variants, and build a safe array of variants.

If you have a COM interface method or C-interface function that produces a safe array of variants that contain BSTR strings, and you want to consume this array in C# code,  the following PInvoke seems to work fine:

[DllImport("NativeDll.dll", PreserveSig = false)]
public static extern void BuildVariantStringArray(
  [Out, MarshalAs(UnmanagedType.SafeArray, SafeArraySubType = VarEnum.VT_VARIANT)]
  out string[] result);

So, if you have a safe array of variants that contain BYTEs, you may deduce that such a PInvoke declaration would work fine as well:

[DllImport("NativeDll.dll", PreserveSig = false)]
public static extern void BuildVariantByteArray(
  [Out, MarshalAs(UnmanagedType.SafeArray, SafeArraySubType = VarEnum.VT_VARIANT)]
  out byte[] result);

I’ve just changed “string[]” to “byte[]” in the declaration of the “result” out parameter.

Unfortunately, this doesn’t work. What you get as a result in the output byte array is garbage.

The fix in this case of a safe array of variant bytes is to use an object[] array in C#, which directly maps to the original safe array of variants (as variants are marshaled to objects in C#):

[DllImport("NativeDll.dll", PreserveSig = false)]
public static extern void BuildVariantByteArray(
  [Out, MarshalAs(UnmanagedType.SafeArray, SafeArraySubType = VarEnum.VT_VARIANT)]
  out object[] result);

And then manually convert from the returned object[] array to a byte[] array, for example using the C# Array.CopyTo method; e.g.:

// Get a safe array of variants (that contain bytes).
object[] data;
BuildVariantByteArray(out data);

// "Render" (copy) the previous object array 
// to a new byte array.
byte[] byteData = new byte[data.Length];
data.CopyTo(byteData, 0);

// Use byteData...

A variant is marshaled using object in C#. So a safe array of variants is marshaled using an object array in C#. In the case of safe arrays of variant bytes, the returned bytes are boxed in objects. Using Array.CopyTo, these bytes get unboxed and stuffed into a byte array.

The additional CopyTo step doesn’t seem necessary in the case of safe arrays of string variants, probably because strings are already objects in C#.

Still, I think this aspect of the .NET/C# marshaler should be fixed, and if a PInvoke declaration clearly states byte[] on the C# side, the marshaler should automatically unbox the bytes from the safe array of variants.

 

The New C++11 u16string Doesn’t Play Well with Win32 APIs

Someone asked why in this article I used std::wstring instead of the new C++11 std::u16string for Unicode UTF-16 text.

The key point is that Win32 Unicode UTF-16 APIs use wchar_t as their code unit type; wstring is based on wchar_t, so it works fine with those APIs.

On the other hand, u16string is based on the char16_t type, which is a new built-in type introduced in C++11, and is different from wchar_t.

So, if you have a u16string variable and you try to use it with a Win32 Unicode API, e.g.:

// std::u16string s;
SetWindowText(hWnd, s.c_str());

Visual Studio 2015 complains (emphasis mine):

error C2664: ‘BOOL SetWindowTextW(HWND,LPCWSTR)’: cannot convert argument 2 from ‘const char16_t *’ to ‘LPCWSTR’

note: Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast

wchar_t is non-portable in the sense that its size isn’t specified by the standard; but, all in all, if you are invoking Win32 APIs you are already in an area of code that is non-portable (as Win32 APIs are Windows platform specific), so adding wstring (or even CString!) to that mix doesn’t change anything with respect to portability (or lack thereof).
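
For contrast, here’s a minimal sketch (reusing the hWnd and the SetWindowText call from the snippet above): a wchar_t-based std::wstring works with that API directly, with no cast required:

// std::wstring is based on wchar_t, the code unit type
// expected by Win32 Unicode UTF-16 APIs.
std::wstring s = L"Connie";
SetWindowText(hWnd, s.c_str());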