Using STL Strings in ATL/WTL/MFC-Based C++ Code

Many C++ beginners (and not only beginners…) seem to struggle when dealing with STL’s strings in Win32 C++ code.

I wrote a detailed article, published in MSDN Magazine, on “Using STL Strings at Win32 API Boundaries”, which may be interesting or helpful for someone.

But here I’d like to discuss a slightly different scenario: the “interoperability” of STL’s strings with ATL- or MFC-based code.

A common pattern in this context is having CString used as the default string class, instead of STL’s strings. For example, ATL, WTL and MFC’s classes use the CString class as their “native” string type.

Before moving forward, let me clarify that I’m going to discuss the case of Unicode builds, which have been the default since probably Visual Studio 2005. (“ANSI” builds are something of the past, and to me they don’t make much sense in modern C++ Windows software; they are also a big source of trouble and confusion between “ANSI” code page, several other different code pages, etc.).

In Unicode builds, CString represents a Unicode UTF-16 string. The STL’s equivalent (on the Windows platform with Visual Studio) is std::wstring.

A very common pattern in ATL/WTL/MFC-based C++ code is having:

  • Input strings passed as raw C-style NUL-terminated read-only string pointers, using the LPCTSTR Win32 typedef.
  • Output strings passed as non-const references to CString, i.e. CString&.

Let’s consider the input case first. The LPCTSTR typedef is equivalent to “const TCHAR*”. In Unicode builds, TCHAR is equivalent to wchar_t. So, in the input case, in Unicode builds, the string is usually represented as a raw C-style NUL-terminated “const wchar_t*” pointer.

How can a std::wstring instance be passed to a “const wchar_t*” parameter? Simple: just call its c_str() method!

// void DoSomething(LPCTSTR inputString);

std::wstring s = /* Some string */;

// Pass std::wstring as an 
// input C-style NUL-terminated 
// wchar_t-based string
DoSomething( s.c_str() );

Now, let’s consider the CString& output case. Here, what I suggest is to simply create an instance of CString, pass it as the output string parameter, and then just convert the returned CString to a std::wstring. In code:

// void DoSomething(CString& outputString);

// Just follow the method's prototype literally,
// and pass a CString instance that will be filled
// with the returned string.
CString cs;
DoSomething(cs);

// Convert from CString to std::wstring
std::wstring ws(cs);

// Now use the wstring ...

The last line converting from CString to std::wstring works since CString has an implicit conversion operator to LPCTSTR, which in Unicode builds is equivalent to “const wchar_t*”. So, CString is happy to be automatically converted to a “const wchar_t*”, i.e. a “raw C-style NUL-terminated wchar_t-based read-only string pointer”.

On the other side, std::wstring has an overloaded constructor expecting exactly a “const wchar_t*”, i.e. a “raw C-style NUL-terminated wchar_t-based read-only string pointer”, so there’s a perfect match here!

This conversion code can be optimized. In fact, for the previous conversion, std::wstring needs to know the exact length of the input string (i.e. its wchar_t count), and to do so it would typically call an strlen-like function that works for wchar_t-based strings. This is typically an O(N) operation. But a CString already knows its length: it’s stored in the CString object, and the CString::GetLength() method returns it in O(1). Considering that std::wstring has another overloaded constructor expecting a pointer and a length (i.e. wchar_t count), we can combine these pieces of information to build a simple, convenient and efficient conversion function from CString to wstring:

inline std::wstring ToWString(const ATL::CStringW& s)
{
  if (!s.IsEmpty())
    return std::wstring(s, s.GetLength());
  else
    return std::wstring();
}

(I explicitly used the more specific CStringW class in the aforementioned code snippet, but you can freely use CString in Unicode builds. In fact, in Unicode builds, CString is equivalent to CStringW.)
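To compare the two std::wstring constructors without needing ATL, here is a minimal portable sketch. LengthAwareString is a hypothetical stand-in for CString (it is not an ATL class): it simply stores a pointer together with a cached length, which is the property the optimization relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for CString: a string that already knows
// its own length. NOT an ATL class, just for illustration.
struct LengthAwareString
{
    const wchar_t* data;   // NUL-terminated text
    std::size_t    length; // cached wchar_t count, excluding the NUL

    const wchar_t* GetString() const { return data; }
    std::size_t    GetLength() const { return length; }
};

// Pointer-only constructor: std::wstring has to scan for the
// terminating NUL to discover the length before copying.
inline std::wstring ToWStringScanning(const LengthAwareString& s)
{
    return std::wstring(s.GetString());
}

// Pointer + length constructor: the cached length is reused,
// so the scan for the terminating NUL is skipped.
inline std::wstring ToWStringWithLength(const LengthAwareString& s)
{
    return std::wstring(s.GetString(), s.GetLength());
}
```

Both functions return the same string; the second one just avoids the extra pass over the characters. The copy itself is still linear in the string length.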

P.S. This blog post discussed the specific Unicode UTF-16 case. If you want to use the STL’s std::string class, you can store Unicode text in it using UTF-8 encoding. In this case, conversions between UTF-16 and UTF-8 (for std::string) are required. This will be discussed in a future article.



A Simple STL vs. ATL String Performance Test – And the Winner Is… STL!

In addition to the previous GotW#45 tests executed for the ATL’s and STL’s strings, I wanted to try my own tests. So I developed some simple C++ code that builds a std::vector of strings, and then sorts that vector. This test is executed for both ATL’s CStringW and STL’s wstring. Both the insertion times and sorting times are recorded.

You can check it out on GitHub.

The code was compiled with VS2015, and executed on an Intel i7 workstation running Windows 10 64-bit.
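For readers who want a rough idea of the shape of such a test, here is a minimal sketch of the STL half only. This is not the code from the GitHub repository: the function name, the string contents and the use of std::chrono are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Timings for one run, in milliseconds.
struct TestTimings
{
    double insertMs;
    double sortMs;
};

// Builds a vector of 'count' wide strings, then sorts it, timing
// each phase separately. 'sorted' receives the sorted vector so
// the caller can inspect the result.
inline TestTimings RunWStringTest(std::size_t count,
                                  std::vector<std::wstring>& sorted)
{
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;

    std::vector<std::wstring> v;

    const auto t0 = Clock::now();
    for (std::size_t i = count; i > 0; --i)
    {
        // Descending suffixes give the sort some work to do.
        v.push_back(L"String #" + std::to_wstring(i));
    }
    const auto t1 = Clock::now();

    std::sort(v.begin(), v.end());
    const auto t2 = Clock::now();

    sorted = std::move(v);
    return TestTimings{ Ms(t1 - t0).count(), Ms(t2 - t1).count() };
}
```

An equivalent function operating on a vector of CStringW would be timed the same way for the ATL side of the comparison.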

As you can see from the following screenshots, the STL string class always wins (faster execution times). In particular, when the test is executed for tiny strings, the STL’s small string optimization (SSO) is a clear winner over ATL’s CString.

Testing ATL vs. STL String Performance
Testing ATL vs. STL String Performance (SSO)

I recall I did some similar measurements with older VC++ compiler versions (probably VS2008) and ATL’s CString was a winner back then. C++11’s move semantics and other improvements in the MSVC compiler and in the STL implementation made an important difference here. Kudos to the Visual C++ Team’s compiler and STL library guys for the improvements!


The LGPL and Libraries Exposing a C++-Interface

The LGPL is a common open-source license for libraries. One of the key points of the LGPL is the separation of the LGPL-licensed library from the products using it; in particular, users should be able to supply their own versions of a given LGPL library to replace the one shipped with a software product.

If you provide the source code of your software product, then it’s kind of easy: users can recompile everything (i.e. your software’s source code and their own version of the LGPL library).

But what happens when you want to use an LGPL-licensed library in a closed-source commercial product?

Many developers say: “Just dynamically-link to the LGPL-library!” The idea here is to build the LGPL library as a DLL (on Windows), and to have your commercial software dynamically linking to that. So users can replace the DLL that ships with your commercial software with their own version of the DLL, built from their own version of the LGPL library. Unfortunately, this theory is not always true in practice.

In particular, let’s consider the case of an LGPL library written in C++, built as a DLL on Windows, and exposing a C++ interface that uses some STL classes. As we saw in a previous blog post, this is a highly-constraining design. For that to work properly, both the DLL and the calling EXE must be built with the same C++ compiler version, they must be both dynamically-linked to the same flavor of the CRT, etc. That’s very fragile.

So, in this context, just replacing the DLL that ships with the commercial product with a user-provided DLL won’t work in the general case, for example when the user-provided DLL is built with a version of the Visual C++ compiler different from the one used for the EXE.

In other words, in this case the only way for users to replace the original DLL shipped with the product with their own version is to recompile everything, which does require access to the product’s source code (or at least to some parts of it).

So, in such cases, the LGPL doesn’t seem feasible for a commercial closed-source software product.

On the other hand, C-interface DLLs (designed with proper care for dynamically-allocated memory at the boundaries) are safely replaceable by users, without requiring recompilation of other modules dynamically-linked to them.

Disclaimer: I’m not a lawyer, and this post is not meant to give legal advice, but just to discuss a topic from a software developer’s perspective. As always, respectful constructive comments are welcome.


The Perils of C++-Interface DLLs

DLLs exposing a C++ interface are highly constraining. In fact, both the DLLs and the EXEs using them must be built with the same C++ compiler version, using the same settings (e.g. same _HAS_ITERATOR_DEBUGGING/_ITERATOR_DEBUG_LEVEL settings), and both the DLLs and the EXEs must be dynamically linked to the same flavor of the CRT.

For example, suppose that MyLib.DLL is a DLL exposing a C++ interface, and in particular, as part of this C++ interface, it exposes an STL class, for instance: a std::vector<int>.

Moreover, let’s suppose we have MyProgram.EXE, which uses the aforementioned MyLib.DLL. Initially both MyLib.DLL and MyProgram.EXE are built with VS2010, both dynamically linked to the VS2010’s C/C++ runtime. Everything’s fine.

Then, someone decides to improve the implementation code of MyLib.DLL, maybe using some new C++11 features, and this DLL gets rebuilt using VS2015. MyProgram.EXE is still the old VS2010-compiled executable. Now things start going wrong. What’s the matter with that?

Well, one of the problems here is that the “std::vector<int>” exposed at the interface by the VS2015-rebuilt MyLib.DLL is different from the “std::vector<int>” expected by the older MyProgram.EXE!

In fact, just considering the size in bytes of the VS2010’s “std::vector<int>” vs. the VS2015’s one, in a release build sizeof will return 16 (bytes) for the former, and 12 (bytes) for the latter. So, the std::vector’s size has changed from VS2010 to VS2015.

The reduction in size is an effect of an optimization that took place starting with VS2012: basically they avoided the storage of empty allocators for std::vector, so you end up with just three pointers in a release-mode std::vector. Considering 4 bytes for each pointer (in 32-bit builds), you end up with:

3 [pointers] * 4 [bytes/pointer] = 12 bytes

which is optimally small for std::vector.

Anyway, the point is that just these two different std::vector’s sizes in VS2010 and VS2015 show that the binary layout of the std::vector template has changed between the two C++ compiler versions. So there is a clear mismatch between the std::vector expected by a DLL built with one version of the VC++ compiler and the std::vector expected by an EXE built with a different version of the VC++ compiler.
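A quick way to see what your own toolchain does is to ask the compiler directly. The following sketch (helper names are hypothetical) reports the size of std::vector<int>, both in bytes and as a number of pointers; the result depends on the compiler, the library version, the target bitness and the debug/release settings, so no particular value is assumed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size in bytes of std::vector<int> for the compiler, library
// and build settings currently in use.
constexpr std::size_t VectorSize()
{
    return sizeof(std::vector<int>);
}

// The same size expressed as a number of pointers. A result of 3
// corresponds to the compact "three pointers" layout.
constexpr std::size_t VectorSizeInPointers()
{
    return sizeof(std::vector<int>) / sizeof(int*);
}
```

Building this in two configurations (or with two compiler versions) and comparing the values is enough to reveal a layout mismatch.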

In this MSDN document titled “Breaking Changes in Visual C++ 2015”, it’s clearly written (emphasis mine):

“Standard Template Library

To enable new optimizations and debugging checks, the Visual Studio implementation of the C++ Standard Library intentionally breaks binary compatibility from one version to the next. Therefore, when the C++ Standard Library is used, object files and static libraries that are compiled by using different versions can’t be mixed in one binary (EXE or DLL), and C++ Standard Library objects can’t be passed between binaries that are compiled by using different versions.”

Note that even with the same VC++ compiler version, std::vector’s size differs between debug and release builds. For example, with VS2015, “sizeof(std::vector<int>)” returns 16 (bytes) in debug mode and 12 (bytes) in release mode. The size overhead in debug builds is due to some additional machinery that helps spot bugs in that build mode.

So, even when the same VC++ compiler is used, there are differences between std::vector’s layouts between debug builds and release builds; so, again, there’s a mismatch between the EXE’s and DLL’s expectations on a “std::vector<int>”.

There are a few options to increase the decoupling between DLLs and EXEs.

One option is to develop DLLs exposing a pure C interface. Of course, C++ can be used inside the DLL, in the implementation. But the interface must be pure C. Note that C++-specific features like exceptions must be caught inside DLL’s boundaries, and converted to something C-style like error return codes at the DLL’s boundaries. This approach is used by many Win32 APIs.

And in fact, for example, hypothetically assuming they built Windows 7 using some version of the Visual C++ 2008 compiler, we can call the Win32 APIs exposed by Windows 7 from C++ executables built using future versions of the MSVC compiler, like those shipping in Visual Studio 2010 or 2013, just to name a few. There’s no constraint for application developers in using the same C++ compiler used by the Windows Team to build the operating system, thanks to Win32 APIs exposing a pure-C interface.
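As an illustration of the pure-C-interface option, here is a minimal sketch. All the MyLib_* names and the error codes are hypothetical; a real DLL would also mark the functions for export (e.g. with __declspec(dllexport) on Windows). The point is that only C types cross the boundary, the STL stays hidden behind an opaque handle, and exceptions are converted to return codes.

```cpp
#include <cassert>
#include <new>
#include <vector>

// ---- Public C interface (what the DLL's header would declare) ----
extern "C"
{
    typedef struct MyLibIntList MyLibIntList; // opaque handle

    enum MyLibResult
    {
        MYLIB_OK = 0,
        MYLIB_INVALID_ARG = 1,
        MYLIB_OUT_OF_MEMORY = 2,
        MYLIB_UNEXPECTED = 3
    };

    int  MyLib_CreateList(MyLibIntList** list);
    int  MyLib_Append(MyLibIntList* list, int value);
    int  MyLib_GetCount(const MyLibIntList* list, int* count);
    void MyLib_DestroyList(MyLibIntList* list);
}

// ---- Implementation: free to use C++ and the STL internally ----
struct MyLibIntList
{
    std::vector<int> items;
};

extern "C" int MyLib_CreateList(MyLibIntList** list)
{
    if (list == nullptr) return MYLIB_INVALID_ARG;
    *list = new (std::nothrow) MyLibIntList();
    return (*list != nullptr) ? MYLIB_OK : MYLIB_OUT_OF_MEMORY;
}

extern "C" int MyLib_Append(MyLibIntList* list, int value)
{
    if (list == nullptr) return MYLIB_INVALID_ARG;
    try
    {
        list->items.push_back(value); // may throw std::bad_alloc
        return MYLIB_OK;
    }
    catch (const std::bad_alloc&) { return MYLIB_OUT_OF_MEMORY; }
    catch (...)                   { return MYLIB_UNEXPECTED; }
}

extern "C" int MyLib_GetCount(const MyLibIntList* list, int* count)
{
    if (list == nullptr || count == nullptr) return MYLIB_INVALID_ARG;
    *count = static_cast<int>(list->items.size());
    return MYLIB_OK;
}

extern "C" void MyLib_DestroyList(MyLibIntList* list)
{
    delete list; // freed by the same module that allocated it
}
```

Since the client never sees the std::vector, its layout can change freely when the library is rebuilt with another compiler version.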

Another option is to expose C++ abstract interfaces (i.e. C++ classes that contain only pure virtual methods and no data members) and C-interface helper functions, like factory functions. This is what basically COM does. And COM is another technology used by several important Windows subsystems, like DirectX.  So if you develop a COM DLL, it can be safely used by C++ executables built with different versions of the VC++ compiler.
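The abstract-interface option can be sketched as follows. ICalculator and CreateCalculator are hypothetical names; real COM adds reference counting, QueryInterface and a fixed calling convention on top of this basic idea, so take this only as a simplified picture of the pattern.

```cpp
#include <cassert>

// ---- Public header: abstract interface + C-style factory ----
// Only pure virtual methods, no data members, no STL types.
struct ICalculator
{
    virtual int  Add(int a, int b) = 0;
    virtual void Release() = 0; // the object destroys itself inside the DLL

protected:
    ~ICalculator() = default;   // clients call Release(), never delete
};

extern "C" ICalculator* CreateCalculator();

// ---- Implementation, hidden inside the DLL ----
namespace
{
    class Calculator final : public ICalculator
    {
    public:
        int  Add(int a, int b) override { return a + b; }
        void Release() override { delete this; }
    };
}

extern "C" ICalculator* CreateCalculator()
{
    try { return new Calculator(); }
    catch (...) { return nullptr; } // no exceptions across the boundary
}
```

The client only ever holds an ICalculator pointer and calls through the virtual table, so the concrete class and its memory management remain private to the DLL.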

Note: There are even other details to consider when building highly reusable software components in DLLs. For example, in the presence of dynamically-allocated objects exchanged between the DLL and the EXE, the exported component and all the modules using it must use the same memory allocator. In other words, the code that allocates memory and the code that frees it must use the same allocator. An option to solve this problem is to allocate and release memory invoking APIs like CoTaskMemAlloc() and CoTaskMemFree(), since both use a common memory allocator.


Is Copy-on-write Really a Pessimization Under Multithreading?

Copy-on-write (COW) is an optimization technique used by several C++ classes, including the ATL’s CString class, and many classes in the Qt framework.
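For readers unfamiliar with the technique, here is a minimal single-threaded sketch of the idea. CowString is a hypothetical class written for illustration on top of std::shared_ptr; it is not how CString or the Qt classes are actually implemented.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Minimal copy-on-write string sketch (illustration only).
class CowString
{
public:
    explicit CowString(std::string text)
        : m_data(std::make_shared<std::string>(std::move(text)))
    {}

    // The compiler-generated copy operations only copy the
    // shared_ptr: both objects then share one character buffer.

    // Reading never triggers a copy.
    const std::string& Get() const { return *m_data; }

    // Writing first "detaches": if the buffer is shared with other
    // instances, make a private copy before modifying it.
    void Append(const std::string& more)
    {
        if (m_data.use_count() > 1)
        {
            m_data = std::make_shared<std::string>(*m_data);
        }
        m_data->append(more);
    }

    // True if both strings currently share the same buffer.
    bool SharesBufferWith(const CowString& other) const
    {
        return m_data == other.m_data;
    }

private:
    std::shared_ptr<std::string> m_data;
};
```

Copies are cheap until one of the instances is modified. The price is the shared reference count, which has to be updated safely when instances are used from several threads.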

However, it seems that the C++11 standard bans the use of COW in std::string.

Many developers consider COW a “pessimization” under multi-threading, and to support their argument they usually point to a piece written by Herb Sutter (GotW#45).

I’m an intellectually curious person, so I wanted to test that code on modern systems. I downloaded the original GotW#45 code, made some adjustments so that it compiles cleanly with Visual Studio 2015 at warning level 4, changed a few things like using Windows high-performance counters to measure time (instead of GetTickCount), and added a couple of tests for STL’s std::string and ATL’s CStringA.

You can download the modified code here from GitHub.

On a modern Intel i7-based workstation, the results for 100-char-length strings seem to show that CString (which is COW-based) actually performs better than std::string in those tests.

100-char-length std::string’s performance worse than COW-based ATL CString.

However, when strings of shorter lengths are tested (e.g. 10 chars), std::string wins.

So, probably COW is not always a pessimization: there are cases in which the size of the data to copy can have a significant impact.

It’s also interesting that the fbstring class, which is a drop-in replacement for std::string, claiming significantly increased performance, uses COW for large strings.

fbstring uses COW for large strings


P.S. Of course, I’m aware that there are several other aspects to consider from a performance perspective, including data locality, etc. Still, I think these results are interesting.

A Laser-Focused Alternative to std::decay Using C++14’s Alias Templates

In the previous blog post, there was some code iterating through a generic container, using a range-for loop like so:

for (const auto& elem : c)

I wanted the C++ compiler to pick a specific traits class template specialization based on the current element’s type. Using just decltype(elem) turned out to be a bug, since the returned type was a const reference to the element’s type, but what I actually wanted was the original element’s type (not a const reference to it).

So, I used std::decay to strip off the “const &” part from the type returned by decltype(elem), leaving only the actual element’s type, such that the compiler could pick the proper traits class specialization based on that type.

However, std::decay also performs additional transformations, like those involving arrays and functions, which aren’t relevant in our case.
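The difference can be checked at compile time. The following sketch uses only standard <type_traits> facilities; the static_asserts document what each transformation yields for a few sample types.

```cpp
#include <string>
#include <type_traits>

// For "const std::string&", std::decay yields std::string...
static_assert(
    std::is_same<std::decay<const std::string&>::type,
                 std::string>::value,
    "decay strips const and reference");

// ...and so does remove_reference followed by remove_const.
static_assert(
    std::is_same<
        std::remove_const<
            std::remove_reference<const std::string&>::type>::type,
        std::string>::value,
    "remove_reference + remove_const strips const and reference");

// But std::decay does more: array types decay to pointers...
static_assert(
    std::is_same<std::decay<int[3]>::type, int*>::value,
    "decay turns int[3] into int*");

// ...and function types decay to function pointers.
static_assert(
    std::is_same<std::decay<void(int)>::type, void (*)(int)>::value,
    "decay turns void(int) into void(*)(int)");

// remove_reference + remove_const leaves an array type alone.
static_assert(
    std::is_same<
        std::remove_const<std::remove_reference<int[3]>::type>::type,
        int[3]>::value,
    "int[3] is left unchanged");
```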

A more laser-focused alternative would be using std::remove_reference and std::remove_const (both defined in the <type_traits> standard header).

We could define a simple class template for that purpose:

#include <type_traits> // for remove_const, remove_reference

// Strip off "const &" from "const Type&",
// leaving only "Type".
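template <typename T>
class RemoveConstReference
{
private:
  // Strip off the reference part
  typedef typename 
    std::remove_reference<T>::type NoRefType;

public:
  // Strip off the const,
  // after having stripped off the reference
  typedef typename 
    std::remove_const<NoRefType>::type type;
};

And then that RemoveConstReference custom-defined template could be applied to strip off “const &” from elem’s type:

typedef typename 
  RemoveConstReference<decltype(elem)>::type ElemType;

And finally the correct traits specialization is picked using code like this:

const int x = Traits<ElemType>::GetSomeProperty(elem);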

In addition, C++14 introduced some convenient alias templates. For example: besides C++11’s remove_reference, in C++14 a remove_reference_t alias template was defined:

template <typename T>
using remove_reference_t
  = typename remove_reference<T>::type;

This is basically syntactic sugar, a syntactic shortcut.

Similarly, there’s a C++14 remove_const_t alias template matching C++11’s remove_const.

Using those _t-ending alias templates, it’s possible to further simplify the code to strip off the “const&”, composing remove_const_t and remove_reference_t like this (“std::” prefix omitted for the sake of clarity):

typedef remove_const_t<
  remove_reference_t<decltype(elem)>> ElemType;

We could even define a custom reusable alias template just for removing “const &”:

template <typename T>
using RemoveConstReference_t = 
  remove_const_t<remove_reference_t<T>>;

And then use it like so:

typedef RemoveConstReference_t<decltype(elem)> ElemType;

const int x = Traits<ElemType>::GetSomeProperty(elem);

These C++14 alias templates, although just “syntactic sugar”, are nonetheless convenient for making C++ code simpler and clearer!


Mixing Traits Classes, Range-For Iteration and decltype: A Subtle Bug

Suppose you have defined a traits class to query a property for some type. This is a common technique used in C++: for example, to get the largest value of the type int, you can call std::numeric_limits<int>::max. std::numeric_limits is a class template that is specialized for arithmetic types like int, double, etc.

In the int specialization (numeric_limits<int>), its max method returns the largest value of the type int; in the double specialization (numeric_limits<double>), its max method returns the largest value of the type double, and so on. In generic code, when you have some arithmetic type T, you can query T’s largest value calling numeric_limits<T>::max.
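As a small illustration of using a traits class from generic code, here is a sketch built on std::numeric_limits. The two helper function names are hypothetical.

```cpp
#include <cassert>
#include <limits>

// Generic code: query the largest value of any arithmetic type T
// through the numeric_limits traits class.
template <typename T>
T LargestValueOf()
{
    return std::numeric_limits<T>::max();
}

// True if a + b would exceed T's largest value.
// Sketch for non-negative values only.
template <typename T>
bool AdditionWouldOverflow(T a, T b)
{
    return b > std::numeric_limits<T>::max() - a;
}
```

The same function template works for int, double, unsigned char and so on, because the compiler picks the matching numeric_limits specialization for each T.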

So, you defined some traits class, which is a class template, that exposes some public static methods to query some properties for some types. For example:


template <typename Type>
struct Traits
{
  static int GetSomeProperty(const Type&)
  {
    cout << "Generic"; 
    return 0;
  }
};
(Of course, in production code, traits classes can contain more than a single method, just like the numeric_limits traits class exposes several member functions.)

Then you specialize the above traits class for the std::string type:

template <>
struct Traits<string>
{
  static int GetSomeProperty(const string&)
  {
    cout << "String"; 
    return 1; 
  }
};

This property might be the length in bytes of the input parameter, or it can be some other data associated with that parameter; the description of the particular property is unimportant here.

Then you have a function template that iterates through some generic container using a range-for loop and inside that loop, you invoke the Traits::GetSomeProperty method to query the given property of the current element:

template <typename Container>
void DoSomething(const Container & c)
{
  for (const auto& elem : c)
  {
    cout << elem << ": ";

    int x = Traits<decltype(elem)>::
              GetSomeProperty(elem);

    cout << '\n';
  }
}

Since this function operates on a generic container, you don’t know a priori the type of the elements in the container. So you think of using decltype(elem) to get the type of the current element.

Now, if you invoke that function template on a vector<string>, like this:

int main()
{
  vector<string> test{ "Hello", "Connie" };
  DoSomething(test);
}

What you would expect as output is probably:

Hello: String
Connie: String

Right? That’s because the DoSomething function template is iterating through a vector of strings, so decltype(elem) seems to be std::string, and so Traits<decltype(elem)> would pick Traits<string>, and consequently the Traits<string>::GetSomeProperty specialization would be invoked, which should print “String”.

Well, if you run this code, what you actually get is:

Hello: Generic
Connie: Generic

Why is that? Why is “Generic” printed instead of the expected “String”?

If you pay attention to the range-for loop iteration:

    for (const auto& elem : c)

you’ll notice that elem’s type is actually not std::string. In fact, elem is a const reference to a std::string. That’s because of the “const auto&” iteration format, which in this particular case of vector<string> becomes “const std::string&”.

So, decltype(elem) is not std::string: it’s actually a “const std::string&”, that is: a const reference to a std::string. Since there is no Traits<T> specialization for a “const reference to a std::string” (in fact, you only have a specialization for a std::string), the generic Traits<T> class template code is picked by the compiler, so the generic Traits<T>::GetSomeProperty method is invoked, which prints “Generic”.
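This can be verified directly with std::is_same. The following is a small sketch (the function name is hypothetical) that asks the compiler what decltype(elem) is inside such a loop.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Returns true if decltype(elem), inside a "const auto&" range-for
// over a vector<string>, is exactly std::string.
inline bool ElemTypeIsPlainString()
{
    const std::vector<std::string> v{ "Hello", "Connie" };
    bool result = false;
    for (const auto& elem : v)
    {
        // Compile-time check: elem is a const reference.
        static_assert(
            std::is_same<decltype(elem), const std::string&>::value,
            "elem is a const reference to std::string");

        result = std::is_same<decltype(elem), std::string>::value;
    }
    return result;
}
```

The function returns false: the types “const std::string&” and “std::string” are distinct, so a specialization written for the latter does not match the former.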

How can you fix this code? How to remove the “const &”, leaving only the “std::string” part?

Well, an option is to use std::decay:

std::decay<decltype(elem)>::type

This is std::string, without the const reference part.

A more verbose alternative to strip that off is remove_reference and remove_const (thanks Mr.STL for the tip on that one).

So, inside the range-for loop, this line will do The Right Thing (TM):

int x = Traits<typename std::decay<decltype(elem)>::type>::
          GetSomeProperty(elem);

If the above line seems too hard to parse, it’s possible to break it into pieces for better readability, with the help of a convenient intermediate typedef:

typedef typename std::decay<decltype(elem)>::type ElemType;

int x = Traits<ElemType>::GetSomeProperty(elem);

I hope this blog post will spare you some headache and debugging time if you have a range-for loop iterating with “const auto&”, and you are stuck getting the wrong (generic) traits invoked instead of the expected specialization.
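To experiment with the bug and the fix side by side, here is a self-contained sketch. It is a simplified variant of the code discussed in this post, not the code from the GitHub repository: the traits methods return 0 (generic) or 1 (string) without printing, and the two function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

template <typename Type>
struct Traits
{
    static int GetSomeProperty(const Type&) { return 0; } // generic
};

template <>
struct Traits<std::string>
{
    static int GetSomeProperty(const std::string&) { return 1; } // string
};

// Buggy version: decltype(elem) is a const reference type,
// so the generic Traits template is selected.
template <typename Container>
std::vector<int> PropertiesWithPlainDecltype(const Container& c)
{
    std::vector<int> result;
    for (const auto& elem : c)
    {
        result.push_back(
            Traits<decltype(elem)>::GetSomeProperty(elem));
    }
    return result;
}

// Fixed version: std::decay strips the "const &" part first,
// so the specialization for the element type is selected.
template <typename Container>
std::vector<int> PropertiesWithDecay(const Container& c)
{
    std::vector<int> result;
    for (const auto& elem : c)
    {
        typedef typename std::decay<decltype(elem)>::type ElemType;
        result.push_back(Traits<ElemType>::GetSomeProperty(elem));
    }
    return result;
}
```

Called on a vector<string>, the first function yields only 0s (generic traits) and the second only 1s (string specialization).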

Some compilable code is available here on GitHub for download and experimentation.

Enabling the StrSafe Locale Functions

The Windows SDK defines some string functions that provide special processing for buffer handling, with the goal of reducing security issues that involve buffer overruns. These functions are defined in the <StrSafe.h> header. If you are unfamiliar with them, a quick introduction can be found here.

Some of these functions include a parameter for locale information. These locale-aware StrSafe functions have an _l suffix, for example: StringCbPrintf_l.

However, if you try to use the aforementioned function in your Windows C++ code after including <StrSafe.h>, the compiler will complain with an error message like:

error C3861: ‘StringCbPrintf_l’: identifier not found

After some spelunking in the gigantic <StrSafe.h> header with the help of some search tool, you will discover that these locale-aware functions are excluded by default, and you have to explicitly enable them, #defining the preprocessor macro STRSAFE_LOCALE_FUNCTIONS.
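In practice, enabling them comes down to a fragment like the following (a Windows-only sketch). The one thing to watch is ordering: the macro has to be defined before the header is included, otherwise the _l declarations are skipped.

```cpp
// Opt in to the locale-aware StrSafe functions (the _l variants).
// The macro must be defined BEFORE the header is included.
#define STRSAFE_LOCALE_FUNCTIONS
#include <StrSafe.h>
```

Alternatively, the macro can be defined project-wide in the build settings, so that every translation unit sees it consistently.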

StrSafe Locale-Aware Functions Disabled by Default

This doesn’t seem to be mentioned in the MSDN documentation for these functions (at least, I was unable to find a note about that).

I would have preferred the opposite policy: enable them by default and, if for some reason these functions conflict with an existing code base, let them be disabled by defining a macro like STRSAFE_NO_LOCALE_FUNCTIONS (just like the existing approach of explicitly disabling functions with the STRSAFE_NO_CB_FUNCTIONS and STRSAFE_NO_CCH_FUNCTIONS macros).