Updates to the ATL/STL Unicode Encoding Conversion Code

I’ve updated my code on GitHub for converting between UTF-8 (stored in STL std::string) and UTF-16 (stored in ATL CStringW).

Now, on errors, the code throws instances of a custom exception class that is derived from std::runtime_error, and is capable of containing more information than a simple CAtlException.
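To give an idea of what such an exception class can look like, here is a minimal sketch. The names (Utf8ConversionException, ConversionDirection, ErrorCode, Direction) are hypothetical and chosen only for illustration; the actual class in the GitHub code may be organized differently.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical names: this only illustrates the pattern of deriving from
// std::runtime_error and carrying extra context about the failure.
enum class ConversionDirection { Utf8ToUtf16, Utf16ToUtf8 };

class Utf8ConversionException : public std::runtime_error {
public:
    Utf8ConversionException(const std::string& message,
                            std::uint32_t errorCode,
                            ConversionDirection direction)
        : std::runtime_error(message),
          m_errorCode(errorCode),
          m_direction(direction) {}

    // Error code reported by the failing call (for example, the value
    // returned by GetLastError on Windows).
    std::uint32_t ErrorCode() const noexcept { return m_errorCode; }

    // Which conversion was in progress when the error occurred.
    ConversionDirection Direction() const noexcept { return m_direction; }

private:
    std::uint32_t m_errorCode;
    ConversionDirection m_direction;
};
```

Since the class derives from std::runtime_error, existing handlers that catch std::exception or std::runtime_error keep working and can read the message via what(), while code that catches the derived type also gets the error code and the conversion direction.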

Moreover, I’ve added a couple of overloads for converting from source string views (specified using an STL-style [start, finish) pointer range). This makes it possible to efficiently convert only portions of longer strings, without creating ad hoc CString or std::string instances to store those partial views.

 

Custom C++ String Pool Allocator on GitHub

I’ve uploaded to GitHub some C++ code of mine that implements a custom string pool allocator.

The basic idea is to allocate big chunks of memory, and then serve single string allocations by carving memory from inside those chunks, with a simple, fast pointer increment.
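The idea can be sketched roughly as follows. This is not the code from the GitHub repository: the class name StringPool, the Allocate method, and the default chunk size are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sketch of a string pool: strings are copied into large
// chunks and are all released together when the pool is destroyed.
class StringPool {
public:
    // chunkChars: size of each chunk, in wchar_t units (assumed default).
    explicit StringPool(std::size_t chunkChars = 64 * 1024)
        : m_chunkChars(chunkChars) {}

    StringPool(const StringPool&) = delete;
    StringPool& operator=(const StringPool&) = delete;

    // Copies [start, start + length) into the pool and NUL-terminates it.
    const wchar_t* Allocate(const wchar_t* start, std::size_t length) {
        const std::size_t needed = length + 1;  // +1 for the terminator
        if (needed > m_remaining) {
            // Not enough room: start a new chunk. A string longer than
            // the regular chunk size gets a chunk sized just for it.
            const std::size_t size =
                needed > m_chunkChars ? needed : m_chunkChars;
            std::unique_ptr<wchar_t[]> chunk(new wchar_t[size]);
            m_next = chunk.get();
            m_remaining = size;
            m_chunks.push_back(std::move(chunk));
        }
        wchar_t* result = m_next;
        std::memcpy(result, start, length * sizeof(wchar_t));
        result[length] = L'\0';
        m_next += needed;       // the "allocation" is a pointer increment
        m_remaining -= needed;
        return result;
    }

    // Convenience overload for NUL-terminated strings.
    const wchar_t* Allocate(const wchar_t* s) {
        return Allocate(s, std::wcslen(s));
    }

private:
    std::size_t m_chunkChars;
    std::vector<std::unique_ptr<wchar_t[]>> m_chunks;
    wchar_t* m_next = nullptr;
    std::size_t m_remaining = 0;
};
```

In this sketch individual strings are never freed on their own; the trade-off is that each allocation costs little more than a bounds check and a pointer bump, and strings allocated one after another end up adjacent in memory.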

There’s also a benchmark comparing this custom allocator vs. STL’s strings.

Custom string pool allocator benchmark results.

The results clearly show that both allocating strings from the pool and sorting them are faster than doing the same with the default std::wstring class.

 

The New C++11 u16string Doesn’t Play Well with Win32 APIs

Someone asked why in this article I used std::wstring instead of the new C++11 std::u16string for Unicode UTF-16 text.

The key point is that Win32 Unicode UTF-16 APIs use wchar_t as their code unit type; wstring is based on wchar_t, so it works fine with those APIs.

On the other hand, u16string is based on the char16_t type, which is a new built-in type introduced in C++11, and is different from wchar_t.

So, if you have a u16string variable and you try to use it with a Win32 Unicode API, e.g.:

// std::u16string s;
SetWindowText(hWnd, s.c_str());

Visual Studio 2015 complains (emphasis mine):

error C2664: 'BOOL SetWindowTextW(HWND,LPCWSTR)': cannot convert argument 2 from 'const char16_t *' to 'LPCWSTR'

note: Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast
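If you do end up with a u16string at a Win32 boundary, one possible workaround is to copy its code units into a wstring first. The helper below is a hypothetical sketch (ToWString is not a name from the article), and it assumes a platform where wchar_t is 16 bits wide, as on Windows; elsewhere the copy still compiles, but the result is not a properly encoded wide string.

```cpp
#include <string>
#include <type_traits>

// char16_t and wchar_t are distinct built-in types, even on platforms
// where both are 16 bits wide. That is why the pointer types don't convert.
static_assert(!std::is_same<wchar_t, char16_t>::value,
              "wchar_t and char16_t are distinct types");

// Hypothetical helper: converts each char16_t code unit to a wchar_t.
// The result is valid UTF-16 only when wchar_t is 16 bits wide.
inline std::wstring ToWString(const std::u16string& s) {
    return std::wstring(s.begin(), s.end());
}
```

With such a helper, the failing call could be written as SetWindowTextW(hWnd, ToWString(s).c_str()), at the cost of an extra string copy; using std::wstring from the start avoids that copy.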

wchar_t is non-portable in the sense that its size isn’t specified by the standard. But, all in all, if you are invoking Win32 APIs, you are already in an area of code that is non-portable (Win32 APIs are Windows-specific), so adding wstring (or even CString!) to that mix doesn’t change anything with respect to portability, or the lack thereof.