Using STL Strings in ATL/WTL/MFC-Based C++ Code

Many C++ beginners (and not only beginners…) seem to struggle when dealing with STL’s strings in Win32 C++ code.

I wrote a detailed article published in MSDN Magazine, “Using STL Strings at Win32 API Boundaries”, which some readers may find interesting or helpful.

But here I’d like to discuss a slightly different scenario: the “interoperability” of STL’s strings with ATL- or MFC-based code.

A common pattern in this context is using CString as the default string class, instead of the STL’s strings: for example, ATL, WTL and MFC classes use CString as their “native” string type.

Before moving forward, let me clarify that I’m going to discuss the case of Unicode builds, which have been the default since Visual Studio 2005. (“ANSI” builds are a thing of the past, and to me they don’t make much sense in modern C++ Windows software; they are also a big source of trouble and confusion between the “ANSI” code page and the several other code pages involved.)

In Unicode builds, CString represents a Unicode UTF-16 string. The STL’s equivalent (on the Windows platform with Visual Studio) is std::wstring.

A very common pattern in ATL/WTL/MFC-based C++ code is having:

  • Input strings passed as raw C-style NUL-terminated read-only string pointers, using the LPCTSTR Win32 typedef.
  • Output strings passed as non-const references to CString, i.e. CString&.

Let’s consider the input case first. The LPCTSTR typedef is equivalent to “const TCHAR*”. In Unicode builds, TCHAR is equivalent to wchar_t. So, in the input case, in Unicode builds, the string is usually represented as a raw C-style NUL-terminated “const wchar_t*” pointer.
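As a simplified sketch (not the literal Windows SDK header text), the typedef chain in Unicode builds looks like this:

// Simplified view of the Windows headers
// in Unicode builds:
typedef wchar_t TCHAR;
typedef const TCHAR* LPCTSTR;  // i.e. const wchar_t*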

How can a std::wstring instance be passed to a “const wchar_t*” parameter? Simple: just call its c_str() method!

// void DoSomething(LPCTSTR inputString);

std::wstring s = /* Some string */;

// Pass std::wstring as an 
// input C-style NUL-terminated 
// wchar_t-based string
DoSomething( s.c_str() );

Now, let’s consider the CString& output case. Here, what I suggest is to simply create an instance of CString, pass it as the output string parameter, and then just convert the returned CString to a std::wstring. In code:

// void DoSomething(CString& outputString);

// Just follow the method's prototype literally,
// and pass a CString instance that will be filled
// with the returned string.
CString cs;
DoSomething(cs);

// Convert from CString to std::wstring
std::wstring ws(cs);

// Now use the wstring ...

The last line converting from CString to std::wstring works since CString has an implicit conversion operator to LPCTSTR, which in Unicode builds is equivalent to “const wchar_t*”. So, CString is happy to be automatically converted to a “const wchar_t*”, i.e. a “raw C-style NUL-terminated wchar_t-based read-only string pointer”.

On the other hand, std::wstring has an overloaded constructor expecting exactly a “const wchar_t*”, i.e. a “raw C-style NUL-terminated wchar_t-based read-only string pointer”, so there’s a perfect match here!

This conversion code can be optimized. For the previous conversion, std::wstring needs to know the exact length of the input string (i.e. its wchar_t count), and to get it, it would typically call a wcslen-like function, which is an O(N) operation. But a CString already knows its own length: it’s bookkept inside the CString class, and the CString::GetLength() method returns it instantly, in O(1)! Since std::wstring has another overloaded constructor expecting a pointer and a length (i.e. a wchar_t count), we can combine these pieces of information to build a simple and efficient conversion function from CString to wstring:

#include <string>    // for std::wstring
#include <atlstr.h>  // for ATL::CStringW

inline std::wstring ToWString(const ATL::CStringW& s)
{
  if (!s.IsEmpty())
  {
    // GetLength() is O(1), so the (pointer, count) wstring
    // constructor avoids an O(N) wcslen-like scan
    return std::wstring(s.GetString(), s.GetLength());
  }
  else
  {
    return std::wstring();
  }
}

(I explicitly used the more specific CStringW class in the aforementioned code snippet, but you can freely use CString in Unicode builds. In fact, in Unicode builds, CString is equivalent to CStringW.)
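For example, revisiting the earlier output-parameter snippet:

// void DoSomething(CString& outputString);

CString cs;
DoSomething(cs);

// Efficient conversion: no O(N) length recomputation
std::wstring ws = ToWString(cs);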

P.S. This blog post discussed the specific Unicode UTF-16 case. If you want to use the STL’s std::string class, you can store Unicode text in it using UTF-8 encoding. In this case, conversions between UTF-16 and UTF-8 (for std::string) are required. This will be discussed in a future article.

EDIT (2016, September 12):  Conversions between Unicode UTF-16 and UTF-8 (for std::string) are discussed in detail in this MSDN Magazine article of mine: “Unicode Encoding Conversions with STL Strings and Win32 APIs”.

 

A Simple STL vs. ATL String Performance Test – And the Winner Is… STL!

EDIT: New Performance Test available here.

 

In addition to the previous GotW#45 tests executed for the ATL’s and STL’s strings, I wanted to try my own tests. So I developed some simple C++ code that builds a std::vector of strings, and then sorts that vector. This test is executed for both ATL’s CStringW and STL’s wstring. Both the insertion times and sorting times are recorded.

You can check it out on GitHub.
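The gist of the benchmark is simple. Here’s a minimal sketch of the std::wstring side (the actual code on GitHub differs in the details, e.g. it also measures CStringW and uses Windows high-performance counters; the string count and length below are made-up parameters):

#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <string>
#include <vector>

int main()
{
  using Clock = std::chrono::steady_clock;
  using Millis = std::chrono::duration<double, std::milli>;

  const int kStringCount = 200000;  // hypothetical test sizes
  const int kStringLength = 30;

  std::mt19937 rng(1729);
  std::uniform_int_distribution<int> letter(L'a', L'z');

  // Phase 1: build a vector of random strings, measuring insertion time
  const auto t0 = Clock::now();
  std::vector<std::wstring> strings;
  strings.reserve(kStringCount);
  for (int i = 0; i < kStringCount; ++i)
  {
    std::wstring s;
    for (int j = 0; j < kStringLength; ++j)
    {
      s += static_cast<wchar_t>(letter(rng));
    }
    strings.push_back(std::move(s));
  }
  const auto t1 = Clock::now();

  // Phase 2: sort the vector, measuring sorting time
  std::sort(strings.begin(), strings.end());
  const auto t2 = Clock::now();

  std::wcout << L"Insert: " << Millis(t1 - t0).count() << L" ms\n";
  std::wcout << L"Sort:   " << Millis(t2 - t1).count() << L" ms\n";
}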

The code was compiled with VS2015, and executed on an Intel i7 workstation running Windows 10 64-bit.

As you can see from the following screenshots, the STL string class always wins (faster execution times). In particular, when the test is executed for tiny strings, the STL’s small string optimization (SSO) is a clear winner over ATL’s CString.

[Screenshot: Testing ATL vs. STL String Performance]
[Screenshot: Testing ATL vs. STL String Performance (SSO)]

I recall doing some similar measurements with older VC++ compiler versions (probably VS2008), and ATL’s CString was the winner back then. C++11’s move semantics and other improvements in the MSVC compiler and in the STL implementation made an important difference here. Kudos to the Visual C++ Team’s compiler and STL library guys for the improvements!

 

The LGPL and Libraries Exposing a C++-Interface

The LGPL is a common open-source license for libraries. One of the key points of the LGPL is the separation of the LGPL-licensed library from the products using it; in particular, users should be able to supply their own version of a given LGPL library to replace the one shipped with a software product.

If you provide the source code of your software product, then it’s kind of easy: users can recompile everything (i.e. your software’s source code and their own version of the LGPL library).

But what happens when you want to use an LGPL-licensed library in a closed-source commercial product?

Many developers say: “Just dynamically link to the LGPL library!” The idea here is to build the LGPL library as a DLL (on Windows), and to have your commercial software dynamically link to it. Users can then replace the DLL that ships with your commercial software with their own version of the DLL, built from their own version of the LGPL library. Unfortunately, this theory is not always true in practice.

In particular, let’s consider the case of an LGPL library written in C++, built as a DLL on Windows, and exposing a C++ interface that uses some STL classes. As we saw in a previous blog post, this is a highly constraining design: for it to work properly, both the DLL and the calling EXE must be built with the same C++ compiler version, both must be dynamically linked to the same flavor of the CRT, and so on. That’s very fragile.

So, in this context, simply replacing the DLL that ships with the commercial product with a user-provided DLL won’t work in the general case, for example when the user-provided DLL is built with a version of the Visual C++ compiler different from the one used for the EXE.

In other words, in this case the only way for users to replace the original DLL shipped with the product with their own version is to recompile everything, which does require access to the product’s source code (or at least to some parts of it).

So, in such cases, the LGPL doesn’t seem feasible for a commercial closed-source software product.

On the other hand, C-interface DLLs (designed with proper care for dynamically-allocated memory at the boundaries) are safely replaceable by users, without requiring recompilation of other modules dynamically-linked to them.

Disclaimer: I’m not a lawyer, and this post is not meant to give legal advice, but just to discuss a topic from a software developer’s perspective. As always, respectful constructive comments are welcome.

 

The Perils of C++-Interface DLLs

DLLs exposing a C++ interface are highly constraining. In fact, the DLLs and the EXEs using them must be built with the same C++ compiler version, using the same settings (e.g. the same _HAS_ITERATOR_DEBUGGING/_ITERATOR_DEBUG_LEVEL settings), and both must be dynamically linked to the same flavor of the CRT.

For example, suppose that MyLib.DLL is a DLL exposing a C++ interface, and in particular, as part of this C++ interface, it exposes an STL class, for instance: a std::vector<int>.

Moreover, let’s suppose we have MyProgram.EXE, which uses the aforementioned MyLib.DLL. Initially, both MyLib.DLL and MyProgram.EXE are built with VS2010, both dynamically linked to VS2010’s C/C++ runtime. Everything’s fine.

Then, someone decides to improve the implementation code of MyLib.DLL, maybe using some new C++11 features, and this DLL gets rebuilt using VS2015. MyProgram.EXE is still the old VS2010-compiled executable. Now things start going wrong. What’s the matter with that?

Well, one of the problems here is that the “std::vector<int>” exposed at the interface by the VS2015-rebuilt MyLib.DLL is different from the “std::vector<int>” expected by the older MyProgram.EXE!

In fact, just comparing the size in bytes of VS2010’s “std::vector<int>” with VS2015’s, in a 32-bit release build sizeof returns 16 bytes for the former and 12 bytes for the latter. So, std::vector’s size has changed from VS2010 to VS2015.

The reduction in size is the effect of an optimization introduced in VS2012: the storage of empty allocators inside std::vector was eliminated, so a release-mode std::vector boils down to just three pointers (begin, end, and end-of-capacity). With 4 bytes per pointer (in 32-bit builds), that’s:

3 [pointers] * 4 [bytes/pointer] = 12 bytes

which is optimally small for std::vector.
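As a quick sanity check, here’s a tiny program you can rebuild with different VC++ versions and configurations to observe these sizes yourself:

#include <cstdio>
#include <vector>

int main()
{
  // Prints 12 in a VS2015 32-bit release build,
  // and 16 with VS2010 (per the sizes discussed above)
  std::printf("sizeof(std::vector<int>) = %u bytes\n",
              static_cast<unsigned>(sizeof(std::vector<int>)));
}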

Anyway, the point is that these two different sizes alone show that the binary layout of the std::vector template has changed between the two C++ compiler versions. So there is a clear mismatch between the std::vector expected by a DLL built with one version of the VC++ compiler and the one expected by an EXE built with a different version.

In this MSDN document titled “Breaking Changes in Visual C++ 2015”, it’s clearly written (emphasis mine):

“Standard Template Library

To enable new optimizations and debugging checks, the Visual Studio implementation of the C++ Standard Library intentionally breaks binary compatibility from one version to the next. Therefore, when the C++ Standard Library is used, object files and static libraries that are compiled by using different versions can’t be mixed in one binary (EXE or DLL), and C++ Standard Library objects can’t be passed between binaries that are compiled by using different versions.”

Note that even with the same VC++ compiler version, std::vector’s size differs between debug and release builds. For example, with VS2015, “sizeof(std::vector<int>)” returns 16 bytes in debug mode and 12 bytes in release mode. The size overhead in debug builds is due to additional machinery that helps spot bugs in that build mode.

So, even when the same VC++ compiler is used, std::vector’s layout differs between debug builds and release builds; again, there’s a mismatch between the EXE’s and the DLL’s expectations of a “std::vector<int>”.

There are a few options to increase the decoupling between DLLs and EXEs.

One option is to develop DLLs exposing a pure C interface. Of course, C++ can be used inside the DLL, in the implementation, but the interface must be pure C. Note that C++-specific features like exceptions must be caught inside the DLL and converted to something C-style, like error return codes, at the DLL’s boundary. This approach is used by many Win32 APIs.
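Here’s a minimal sketch of what such a boundary can look like (the MyLib_* name and error codes are made up for illustration):

// mylib.cpp -- hypothetical pure-C DLL boundary

#include <vector>  // C++ is fine *inside* the implementation

extern "C" {

typedef int MyLibResult;
enum
{
  MYLIB_OK = 0,
  MYLIB_E_FAIL = 1,
  MYLIB_E_BUFFER_TOO_SMALL = 2
};

// Pure-C interface: plain types only, caller-provided buffer
__declspec(dllexport) MyLibResult __cdecl MyLib_GetValues(
  int* buffer, int capacity, int* actualCount)
{
  try
  {
    std::vector<int> values{ 10, 20, 30 };  // internal C++ work

    *actualCount = static_cast<int>(values.size());
    if (capacity < *actualCount)
    {
      return MYLIB_E_BUFFER_TOO_SMALL;
    }

    for (int i = 0; i < *actualCount; ++i)
    {
      buffer[i] = values[i];
    }
    return MYLIB_OK;
  }
  catch (...)
  {
    // Exceptions must never cross the DLL boundary:
    // translate them into C-style error return codes
    return MYLIB_E_FAIL;
  }
}

} // extern "C"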

And in fact, for example, hypothetically assuming they built Windows 7 using some version of the Visual C++ 2008 compiler, we can call the Win32 APIs exposed by Windows 7 from C++ executables built using future versions of the MSVC compiler, like those shipping in Visual Studio 2010 or 2013, just to name a few. There’s no constraint for application developers in using the same C++ compiler used by the Windows Team to build the operating system, thanks to Win32 APIs exposing a pure-C interface.

Another option is to expose C++ abstract interfaces (i.e. C++ classes that contain only pure virtual methods and no data members), along with C-interface helper functions, like factory functions. This is basically what COM does. And COM is a technology used by several important Windows subsystems, like DirectX. So if you develop a COM DLL, it can be safely used by C++ executables built with different versions of the VC++ compiler.
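A minimal sketch of the abstract-interface approach (again, the names are made up for illustration):

// An abstract interface: only pure virtual methods, no data members
class IStringList
{
public:
  virtual void Add(const wchar_t* s) = 0;
  virtual int Count() const = 0;
  virtual const wchar_t* Get(int index) const = 0;

  // Clients call Release() instead of delete, so the object is
  // destroyed by the DLL's own code (and the DLL's own allocator)
  virtual void Release() = 0;

protected:
  ~IStringList() {}  // prevents direct 'delete' by clients
};

// C-interface factory function: the concrete implementation class
// stays private to the DLL
extern "C" __declspec(dllexport) IStringList* __cdecl CreateStringList();

// Usage in the EXE:
//
//   IStringList* list = CreateStringList();
//   list->Add(L"Connie");
//   ...
//   list->Release();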

Note: There are other details to consider when building highly reusable software components in DLLs. For example, in the presence of dynamically-allocated objects exchanged between the DLL and the EXE, the exported component and all the modules using it must use the same memory allocator; in other words, the code that allocates memory and the code that frees it must use the same allocator. One option to solve this problem is to allocate and release memory invoking APIs like CoTaskMemAlloc() and CoTaskMemFree(), since both sides then use a common memory allocator.
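For instance, a hypothetical sketch (the MyLib_GetName name is made up) of memory crossing the boundary via the shared COM task allocator:

#include <objbase.h>  // for CoTaskMemAlloc, CoTaskMemFree
#include <cstring>    // for std::memcpy

// In the DLL: the returned string is allocated with CoTaskMemAlloc
extern "C" __declspec(dllexport) wchar_t* __cdecl MyLib_GetName()
{
  static const wchar_t name[] = L"Connie";
  wchar_t* result = static_cast<wchar_t*>(::CoTaskMemAlloc(sizeof(name)));
  if (result != nullptr)
  {
    std::memcpy(result, name, sizeof(name));
  }
  return result;
}

// In the EXE: the caller releases the string with the matching
// CoTaskMemFree, so both sides use the same allocator:
//
//   wchar_t* name = MyLib_GetName();
//   /* ... use the name ... */
//   ::CoTaskMemFree(name);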

 

Is Copy-on-write Really a Pessimization Under Multithreading?

Copy-on-write (COW) is an optimization technique used by several C++ classes, including the ATL’s CString class, and many classes in the Qt framework.

However, the C++11 standard effectively bans COW implementations of std::string: its iterator and reference invalidation rules are incompatible with copy-on-write.

Many developers consider COW a “pessimization” under multi-threading, and to support their argument they usually point to a piece written by Herb Sutter (GotW#45).

I’m an intellectually curious person, so I wanted to test that code on modern systems. I downloaded the original GotW#45 code, made some adjustments so that it compiles cleanly with Visual Studio 2015 at warning level 4, changed a few things (like using Windows high-performance counters to measure time, instead of GetTickCount), and added a couple of tests for STL’s std::string and ATL’s CStringA.

You can download the modified code here from GitHub.

On a modern Intel i7-based workstation, the results for 100-char-length strings seem to show that CString (which is COW-based) actually performs better than std::string in those tests.

[Screenshot: 100-char-length std::string performing worse than COW-based ATL CString]

However, when strings of shorter lengths are tested (e.g. 10 chars), std::string wins.

So, probably COW is not always a pessimization: there are cases in which the size of the data to copy can have a significant impact.

It’s also interesting that the fbstring class, which is a drop-in replacement for std::string, claiming significantly increased performance, uses COW for large strings.

[Screenshot: fbstring uses COW for large strings]

 

P.S. Of course, I’m aware that there are several other aspects to consider from a performance perspective, including data locality, etc. Still, I think these results are interesting.