Using STL Strings in ATL/WTL/MFC-Based C++ Code

Many C++ beginners (and not only beginners…) seem to struggle when dealing with STL’s strings in Win32 C++ code.

I wrote a detailed article, published in MSDN Magazine, on “Using STL Strings at Win32 API Boundaries”, which you may find interesting or helpful.

Here, though, I’d like to discuss a slightly different and more specific scenario: the “interoperability” of STL’s strings with ATL-, WTL- or MFC-based code.

A common pattern in this context is having CString used as the default string class, instead of STL’s strings. For example, ATL, WTL and MFC’s classes use the CString class as their “native” string type.

Before moving forward, let me clarify that I’m going to discuss the case of Unicode builds, which have been the default since probably Visual Studio 2005. (“ANSI” builds are a thing of the past, and to me they don’t make much sense in modern C++ Windows software; they are also a big source of trouble and confusion between the “ANSI” code page, several other code pages, etc.)

In Unicode builds, CString represents a Unicode UTF-16 string. The STL’s equivalent (on the Windows platform with Visual Studio) is std::wstring.

A very common pattern in ATL/WTL/MFC-based C++ code is having:

  • Input strings passed as raw C-style NUL-terminated read-only string pointers, using the LPCTSTR Win32 typedef.
  • Output strings passed as non-const references to CString, i.e. CString&.

Let’s consider the input case first. The LPCTSTR typedef is equivalent to “const TCHAR*”. In Unicode builds, TCHAR is equivalent to wchar_t. So, in the input case, in Unicode builds, the string is usually represented as a raw C-style NUL-terminated “const wchar_t*” pointer.
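To make the typedef chain concrete, here is a minimal sketch of how it resolves. It deliberately avoids the Windows headers so it compiles anywhere: SKETCH_UNICODE_BUILD, SketchTCHAR and SketchLPCTSTR are hypothetical stand-in names of mine, not the real SDK definitions.

```cpp
#include <type_traits>

// Hypothetical switch standing in for the "Unicode build" project setting.
#define SKETCH_UNICODE_BUILD 1

#if SKETCH_UNICODE_BUILD
using SketchTCHAR = wchar_t;   // Unicode build: TCHAR is wchar_t
#else
using SketchTCHAR = char;      // "ANSI" build: TCHAR is char
#endif

// LPCTSTR: pointer to a read-only, NUL-terminated TCHAR string
using SketchLPCTSTR = const SketchTCHAR*;

// In a Unicode build the whole chain boils down to const wchar_t*.
static_assert(std::is_same<SketchLPCTSTR, const wchar_t*>::value,
              "LPCTSTR should resolve to const wchar_t* in a Unicode build");
```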

How can a std::wstring instance be passed to a “const wchar_t*” parameter? Simple: just call its c_str() method!

// void DoSomething(LPCTSTR inputString);

std::wstring s = /* Some string */;

// Pass std::wstring as an 
// input C-style NUL-terminated 
// wchar_t-based string
DoSomething( s.c_str() );
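One caveat worth keeping in mind: the pointer returned by c_str() belongs to the std::wstring, so it is valid only while that string is alive and unmodified. The following sketch shows two safe call patterns and one to avoid. CountChars and MakeGreeting are hypothetical helpers made up for illustration, and the parameter is spelled “const wchar_t*” (what LPCTSTR means in a Unicode build) so that the code compiles without the Windows headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical stand-in for an ATL/MFC-style function taking LPCTSTR.
std::size_t CountChars(const wchar_t* inputString)
{
    return std::wcslen(inputString);
}

// Hypothetical helper returning a std::wstring by value.
std::wstring MakeGreeting()
{
    return L"Hello";
}

std::size_t SafeUsage()
{
    // OK: the temporary wstring lives until the end of the full
    // expression, i.e. until after CountChars() has returned.
    const std::size_t n1 = CountChars(MakeGreeting().c_str());

    // OK: 's' outlives the call.
    const std::wstring s = MakeGreeting();
    const std::size_t n2 = CountChars(s.c_str());

    // NOT OK (commented out on purpose): the temporary is destroyed at
    // the end of the statement, so 'p' would dangle right afterwards.
    // const wchar_t* p = MakeGreeting().c_str();

    assert(n1 == n2);
    return n1 + n2;
}
```

If the called function stores the pointer for later use instead of just reading it during the call, the caller has to keep the wstring alive for as long as the pointer is in use.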

Now, let’s consider the CString& output case. Here, what I suggest is to simply create an instance of CString, pass it as the output string parameter, and then just convert the returned CString to a std::wstring. In code:

// void DoSomething(CString& outputString);

// Just follow the method's prototype literally,
// and pass a CString instance that will be filled
// with the returned string.
CString cs;
DoSomething(cs);

// Convert from CString to std::wstring
std::wstring ws(cs);

// Now use the wstring ...

The last line converting from CString to std::wstring works since CString has an implicit conversion operator to LPCTSTR, which in Unicode builds is equivalent to “const wchar_t*”. So, CString is happy to be automatically converted to a “const wchar_t*”, i.e. a “raw C-style NUL-terminated wchar_t-based read-only string pointer”.

On the other side, std::wstring has an overloaded constructor expecting exactly a “const wchar_t*”, i.e. a “raw C-style NUL-terminated wchar_t-based read-only string pointer”, so there’s a perfect match here!
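The mechanism can be seen in isolation with a tiny stand-in class. MiniCString below is a hypothetical, heavily simplified mock that I use only to show the two steps; it is not the real ATL/MFC CString, and it mimics just the implicit conversion operator and GetLength().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified stand-in for CString in a Unicode build.
class MiniCString
{
public:
    explicit MiniCString(const wchar_t* text) : m_text(text) {}

    // Mimics CString's implicit conversion operator to LPCTSTR.
    operator const wchar_t*() const { return m_text.c_str(); }

    // Mimics CString::GetLength(): wchar_t count, excluding the NUL.
    int GetLength() const { return static_cast<int>(m_text.length()); }

private:
    std::wstring m_text; // storage choice is a detail of this sketch only
};

std::wstring ConvertViaPointer(const MiniCString& cs)
{
    // Step 1: 'cs' is implicitly converted to const wchar_t*.
    // Step 2: that pointer selects the std::wstring(const wchar_t*)
    //         constructor, which copies up to the terminating NUL.
    std::wstring ws(cs);

    assert(ws.length() == static_cast<std::size_t>(cs.GetLength()));
    return ws;
}
```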

This conversion code can be optimized. In fact, for the previous conversion, std::wstring needs to know the exact length of the input string (i.e. its wchar_t count), and to find it, it would typically call a strlen-like function that works on wchar_t-based strings (such as wcslen). This is typically an O(N) operation. But a CString already knows its length: it’s tracked by the CString class, and the CString::GetLength() method returns it in O(1)! Since std::wstring has another overloaded constructor expecting a pointer and a length (i.e. a wchar_t count), we can combine these pieces of information to build a convenient, simple and efficient conversion function from CString to wstring:

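#include <atlstr.h>   // ATL::CStringW
#include <string>     // std::wstring

// Convert a CStringW to a std::wstring, reusing the length that
// CString already tracks (no wcslen-like scan is needed).
inline std::wstring ToWString(const ATL::CStringW& s)
{
  if (!s.IsEmpty())
  {
    // Pointer + length constructor: copies exactly GetLength() wchar_ts.
    // GetString() makes the conversion to const wchar_t* explicit.
    return std::wstring(s.GetString(), s.GetLength());
  }
  else
  {
    return std::wstring();
  }
}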

(I explicitly used the more specific CStringW class in the code snippet above, but you can freely use CString in Unicode builds. In fact, in Unicode builds, CString is equivalent to CStringW.)
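The difference between the two std::wstring constructors can be observed with plain standard C++, no ATL required. In this sketch (the function names are hypothetical, chosen just for illustration) the pointer-only overload has to scan for the terminating NUL, while the pointer-plus-length overload copies exactly the requested number of wchar_ts. A side effect is that only the latter preserves a NUL embedded in the middle of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Pointer-only overload: the length is found by scanning for the
// terminating NUL, so copying stops at the first NUL encountered.
std::wstring FromPointer(const wchar_t* text)
{
    return std::wstring(text);
}

// Pointer-plus-length overload: no scan, exactly 'length' wchar_ts copied.
std::wstring FromPointerAndLength(const wchar_t* text, std::size_t length)
{
    return std::wstring(text, length);
}

std::size_t LengthDifferenceDemo()
{
    // Five wchar_ts with a NUL in the middle (plus the terminating NUL).
    const wchar_t buffer[] = L"ab\0cd";
    const std::size_t knownLength = 5;

    const std::wstring scanned = FromPointer(buffer);
    const std::wstring counted = FromPointerAndLength(buffer, knownLength);

    assert(scanned.length() == 2); // stopped at the embedded NUL
    assert(counted.length() == 5); // kept all five wchar_ts
    return counted.length() - scanned.length();
}
```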

P.S. This blog post discussed the specific Unicode UTF-16 case. If you want to use the STL’s std::string class, you can store Unicode text in it using UTF-8 encoding. In this case, conversions between UTF-16 and UTF-8 (for std::string) are required. This will be discussed in a future article.

EDIT (2016, September 12):  Conversions between Unicode UTF-16 and UTF-8 (for std::string) are discussed in detail in this MSDN Magazine article of mine: “Unicode Encoding Conversions with STL Strings and Win32 APIs”.

 
