Adventures with string_view: Optimizing Code with O(1) Operations

Last time we saw that you can invoke the CString::GetString method to get a C-style null-terminated const string pointer, then pass it to functions that take wstring_view input parameters:

// name is a CString instance;
// DoSomething takes a wstring_view input parameter
DoSomething(name.GetString());

While this code works fine, it’s possible to optimize it.

As the saying goes, first make things work, then make things fast.

The Big Picture

A typical implementation of wstring_view holds two data members: a pointer to the string characters, and a size (or length). Basically, the pointer indicates where the string starts, and the size/length specifies how many consecutive characters belong to the string view (note that string views are not necessarily null-terminated).

The above code invokes a wstring_view constructor overload that takes a null-terminated C-style string pointer. To get the size (or length) of the string view, the implementation code needs to traverse the input string’s characters one by one, until it finds the null terminator. This is a linear time operation, or O(N) operation.

Fortunately, there’s another wstring_view constructor overload, that takes two parameters: a pointer and a length. Since CString objects know their own length, you can invoke the CString::GetLength method to get the value of the length parameter.

// Create a wstring_view from CString (*)
DoSomething({ name.GetString(), name.GetLength() });

The great news is that CString objects bookkeep their own string length, so that CString::GetLength doesn’t have to traverse all the string characters until it finds the terminating null. The value of the string length is already available when you invoke the CString::GetLength method.

In other words, creating a string view invoking CString::GetString and CString::GetLength replaces a linear time O(N) operation with a constant time O(1) operation, which is great.

Fixing and Refining the Code

When you try to compile the above code marked with (*), the C++ compiler actually complains with the following message:

Error C2398 Element '2': 
conversion from 'int' 
to 'const std::basic_string_view<wchar_t,std::char_traits<wchar_t>>::size_type' 
requires a narrowing conversion

The problem here is that CString::GetLength returns an int, which doesn’t match with the size type expected by wstring_view. Well, not a big deal: We can safely cast the int value returned by CString::GetLength to wstring_view::size_type, or just size_t:

DoSomething({ name.GetString(), 
    static_cast<size_t>(name.GetLength()) });

As a further refinement, we can wrap the above wstring_view-from-CString creation code in a nice helper function:

// Helper function that *efficiently* creates a wstring_view to a CString
inline std::wstring_view AsView(const CString& str)
{
    return { str.GetString(), 
             static_cast<size_t>(str.GetLength()) };
}

Measuring the Performance Gain

Using the above helper function, you can safely and efficiently create a string view to a CString object.

Now, you may ask, how much speed gain are we talking here?

Good question!

I have developed a small simple C++ benchmark, which shows that replacing a O(N) operation with a O(1) operation in this case gives a performance boost of 88 ms vs. 445 ms, which is an 80% performance gain! Wow.

Benchmark comparing two wstring_view constructor overloads
Benchmark comparing two wstring_view constructor overloads

If you want to learn more about Big-O notation and other related topics, you can watch my Pluralsight course on Introduction to Data Structures and Algorithms in C++.

Big-O doesn’t have to be boring!

Adventures with string_view: Making string_view Play Well with ATL/MFC CString

Suppose that you have a C++ code base containing methods and functions that take ATL/MFC CString objects as input parameters, for example:

void DoSomething(const CString& s)

and you want to introduce the use of std::string_view in that code.

Let’s assume that your code base is built using Visual Studio in Unicode mode (Configuration Properties | Advanced | Character Set set to Use Unicode Character Set), which has been the default since Visual Studio 2005. In this case, CString is actually the wchar_t-based CStringW, and the matching std::basic_string_view is the wchar_t-based std::wstring_view.

(Note: It could be possible to convert between Unicode UTF-16 CStringW and UTF-8, and store the UTF-8 encoded text in a std::string object, and take a std::string_view on that, but let’s not make things even more complicated!)

So, you may think of adding a wstring_view overload like this:

void DoSomething(std::wstring_view s)

Unfortunately, that would cause a problem with your pre-existing perfectly compiling code. In fact, if you pass a string literal, like this:

DoSomething(L"Connie");

the C++ compiler now has two options for DoSomething: the initial form that takes a const CString&, and the new one that takes a wstring_view. The C++ compiler has no way to choose which overload to pick, so it emits an error complaining about the ambiguous call to the DoSomething overloaded function.

At this point, you may think of removing the previous CString overload, and only keep the new wstring_view version:

// Removed: void DoSomething(const CString&)
void DoSomething(std::wstring_view s)

OK, now the C++ compiler is happy as there is only one function to pick.

So, problem solved? Nope.  Don’t forget that this is C++! 😊 And things can get more “interesting” when you apparently solved one problem.

For example, in your existing code base you almost certainly have a number CString objects around. For the sake of simplicity, consider this code snippet:

CString name;
DoSomething(name);

This code worked perfectly fine before you added wstring_view, but now the compiler complains! The problem this time is that the C++ compiler cannot find a suitable way to convert from CString to wstring_view. How can you fix that?

Well, you can invoke the CString::GetString method. In fact, this method returns a const wchar_t* raw pointer to the NUL-terminated C-style string associated with the CString object. And wstring_view has a constructor overload that matches this particular case, constructing a view of the NUL-terminated character string passed as input pointer parameter.

This will work:

// name is a CString
DoSomething(name.GetString());

Another important consideration to make is that wstring_view is just a view to a string, so you must pay attention that the pointed-to string is valid for at least all the time you are referencing it via the wstring_view. In other words, pay attention to dangling references and string views that refer to strings that have been deallocated or moved elsewhere in memory.

 

C++ String Benchmark: STL vs. ATL vs. Custom Pool Allocator

I was curious to compare the performance of the STL string implementation versus ATL CString, using Visual Studio 2019, so I wrote some simple C++ benchmark code for this purpose.

I also added into the mix a custom string pool allocator based on some code from The Old New Thing blog, that I modified and bug-fixed. This allocator basically maintains a singly-linked list of chunks, and string memory is carved from each chunk just increasing a string pointer. When there isn’t enough memory in the current chunk to serve the allocation, a new chunk is allocated. The new chunk is safely linked to the previous chunk list, and the memory for the requested strings is carved from this new chunk. The linked list of chunks is traversed at destruction time to properly release the allocated memory blocks.

I measured the times to create and fill string vectors, and the times to sort the same vectors.

TL;DR: The STL string performance is great! You can improve creation times with a custom pool allocator.

These times are measured for vectors storing each kind of strings: STL’s wstring, ATL’s CString (i.e. CStringW in Unicode builds), and the strings created using the custom string pool allocator.

This is a sample run (executed on a Windows 10 64-bit Intel i7 PC):

String benchmark: STL vs. ATL vs. custom string pool allocator
String benchmark: STL vs. ATL vs. custom string pool allocator

As you can note, the best creation times are obtained with the custom string pool allocator. This was expected, as the allocation strategy of carving string memory from pre-allocated blocks is very efficient.

On the other hand, regarding the sorting times, STL and the custom pool strings perform very similarly.

ATL’s CString, which is based on CoW (Copy on Write), shows the worst times for both creation and sorting.

Benchmark Variation: Tiny Strings

It’s also possible to run the benchmark with short strings, triggering the SSO (compile the code #define’ing TEST_TINY_STRINGS).

String benchmark with tiny strings: STL vs. ATL vs. custom string pool allocator
String benchmark with tiny strings: STL vs. ATL vs. custom string pool allocator

As you can see in this case, thanks to the SSO, STL strings win by an important margin in both creation and sorting times.

Note: Making the Sorting Comparison Uniform

I’d also like to clarify that, to make the sorting comparison more uniform, I used custom comparator functions with std::sort, invoking wcscmp for each kind of strings. In fact, spelunking in the VS 2019 STL and ATL implementation code (Thank You, Step Into Specific menu), I found that std::wstring invokes wmemcmp (file <xstring>, line #235), while ATL::CString uses wcscmp (file <cstringt.h>, line #564). Probably std::wstring uses wmemcmp because wstring instances can contain embedded NULs.

 

The Case of the Context-Menu Shell Extension That Doesn’t Get Invoked by Explorer

A developer was working on a context-menu shell extension.

He wrote all the skeleton infrastructure code, including some code to display the selected files when the shell extension gets invoked by Windows Explorer. It’s kind of the “Hello World” for context-menu shell extensions.

He happily builds his code within Visual Studio, copies the shell extension DLL to a virtual machine, registers the extension in the VM, right-clicks on a bunch of files… but nothing happens. His extension’s menu items just don’t show up in the Explorer context-menu.

What’s going wrong, he thought?

He starts thinking to all sorts of hypotheses and troubleshooting strategies.

He once again manually registers the COM DLL that implements the shell extension from the Command Prompt:

regsvr32 CoolContextMenuExtension.dll

A message box pops up, informing him that the registration has succeeded.

He tries right-clicking some files, but, once again, his extension’s menu items don’t show up.

He then checks that the extension was properly registered under the list of approved shell extensions (this was part of his shell extension infrastructure C++ code):

HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Shell Extensions\Approved

He finds the GUID of the shell extension in there, as expected.

He then checks that the extension was registered under the proper subkey in HKEY_CLASSES_ROOT. Found it there, too!

He starts wondering… what may be going so wrong?

He checks the C++ infrastructure code generated by the ATL Visual Studio wizard, including his own additions and modifications. The implementations of DllRegisterServer and DllUnregisterServer seem correct.

He also checks his own C++ code in the C++ class that implements the shell extension’s COM object. He has correctly implemented IShellExtInit::Initialize, and the three methods of IContextMenu: GetCommandString, InvokeCommand, and QueryContextMenu.

He takes a look at the class inheritance list: both IShellExtInit and IContextMenu are listed among the base classes.

The IShellExtInit and IContextMenu COM interfaces are correctly listed among the base classes.
The IShellExtInit and IContextMenu COM interfaces are correctly listed among the base classes.

He feels desperate. What’s going wrong?? He wrote several shell extensions in C++ in the past. Maybe there’s some bug in the current version of Visual Studio 2019 he is using?

Why wasn’t his context-menu shell extension getting called by Explorer?

 

I took a look at that code.

At some point, I was enlightened.

I gave a look at the COM interface map in the header file of the C++ shell extension class.

There’s a bug in the shell extension C++ object’s COM interface map.
There’s a bug in the shell extension C++ object’s COM interface map.

Wow! Can you spot the error?

Something is missing in there!

In fact, in addition to deriving the C++ shell extension class from IContextMenu, you also have to add an entry for IContextMenu in the COM map!

I quickly added the missing line:

    COM_INTERFACE_ENTRY(IContextMenu)

The COM interface map, with both the IShellExtInit and IContextMenu entries.
The COM interface map, with both the IShellExtInit and IContextMenu entries.

I rebuilt the extension, registered it, and, with great pleasure and satisfaction, the new custom items in the Explorer’s context-menu showed up! And, after selecting a bunch of files to test the extension, the message box invoked by the shell extension C++ code was correctly shown.

All right! 😊

So, what was going wrong under the hood?

Basically, since the context-menu shell extension was correctly registered under the proper key in the Windows Registry, I think that Windows Explorer was actually trying to invoke the shell extension’s methods.

In particular, I think Explorer called QueryInterface on the shell extension’s COM object, to get a pointer to IContextMenu. But, since IContextMenu was missing from the COM map, the QueryInterface code implemented by the ATL framework was unable to return the interface pointer back to Explorer. As a consequence of that, Explorer didn’t recognize the extension as a valid context-menu extension (despite the extension being correctly registered in the Windows Registry), and didn’t even call the IShellExtInit::Initialize method (that was actually available, as IShellExtInit had its own entry correctly listed in the COM map from the beginning).

The bug is in the details!

So, the moral of the story is to always check the COM map in addition to the base class list in your shell extension’s C++ class header file. The COM interfaces need to both be in the base class list and have their own entries in the COM map.

And, in addition, to me this experience has suggested that it would have been great if Visual Studio had issued at least a warning for having the IContextMenu COM interface listed among the base classes, but missing from the COM map! This would be an excellent time-and-bug-saving addition, Visual Studio IDE/Visual C++/Visual Assist X teams!

 

ATL::CStringW vs. std::wstring Performance: String Sorting and Comparisons

According to previous measurements, std::wstring performs better than ATL::CStringW:

  1. in all string concatenation tests
  2. when sorting string vectors that are made by small strings, thanks to std::basic_string’s SSO

So, I focused my attention on the string vector sorting scenario, and it seems to me that (at least in the VS2015 implementation), the slowdown of (non-SSO) wstrings is caused by wmemcmp calls, that are used to compare wstrings when sorting the string vectors. On the other hand, CStringW invokes wcscmp, that seems to run faster.

In fact, invoking std::sort with a custom comparator function that calls wcscmp to compare the C-style pointers returned by wstring::c_str, results in faster vector<wstring> sorting times. So, in this case wstring sorting performs better than CStringW even for non-SSO strings.

However, as Stephan T. Lavavej pointed out, std::basic_string supports embedded nulls, so wstring cannot use wcscmp (that works only for C-style null-terminated strings, without embedded nulls).

 

String Performance Tests: ATL vs. STL

I wrote some C++ code to test the performance of the ATL CStringW class vs. the C++ Standard Library’s std::wstring.

There are several aspects that can be considered when comparing string class performance: In the aforementioned code, I tested string vector sorting and string concatenation.

The code is available in this repository on GitHub.

You can take a look at the README for further details (including my test results).

 

The CStringW with wcout Bug Under the Hood

I discussed in a previous blog post a subtle bug involving CStringW and wcout, and later I showed how to fix it.

In this blog post, I’d like to discuss in more details what’s happening under the hood, and what triggers that bug.

Well, to understand the dynamics of that bug, you can consider the following simplified case of a function and a function template, implemented like this:

void f(const void*) {
  cout << "f(const void*)\n";
}

template <typename CharT> 
void f(const CharT*) {
  cout << "f(const CharT*)\n";
}

If s is a CStringW object, and you write f(s), which function will be invoked?

Well, you can write a simple compilable code containing these two functions, the required headers, and a simple main implementation like this:

int main() {
  CStringW s = L"Connie";
  f(s);
}

Then compile it, and observe the output. You know, printf-debugging™ is so cool! 🙂

Well, you’ll see that the program outputs “f(const void*)”. This means that the first function (the non-templated one, taking a const void*), is invoked.

So, why did the C++ compiler choose that overload? Why not f(const wchar_t*), synthesized from the second function template?

Well, the answer is in the rules that C++ compilers follow when doing template argument deduction. In particular, when deducing template arguments, the implicit conversions are not considered. So, in this case, the implicit CStringW conversion to const wchar_t* is not considered.

So, when overload resolution happens later, the only candidate available is f(const void*). Now, the implicit CStringW conversion to const wchar_t* is considered, and the first function is invoked.

Out of curiosity, if you comment out the first function, you’ll get a compiler error. MSVC complains with a message like this:

error C2672: ‘f’: no matching overloaded function found

error C2784: ‘void f(const CharT *)’: could not deduce template argument for ‘const CharT *’ from ‘ATL::CStringW’

The message is clear (almost…): “Could not deduce template argument for const CharT* from CStringW”: that’s because implicit conversions like this are not considered when deducing template arguments.

Well, what I’ve described above in a simplified case is basically what happens in the slightly more complex case of wcout.

wcout is an instance of wostream. wostream is declared in <iosfwd> as:

typedef basic_ostream<wchar_t, char_traits<wchar_t>> wostream;

Instead of the initial function f, in this case you have operator<<. In particular, here the candidates are an operator<< overload that is a member function of basic_ostream:

basic_ostream& basic_ostream::operator<<(const void *_Val)

and a template non-member function:

template<class _Elem, class _Traits> 
inline basic_ostream<_Elem, _Traits>& 
operator<<(basic_ostream<_Elem, _Traits>& _Ostr, const _Elem *_Val)

(This code is edited from the <ostream> standard header that comes with MSVC.)

When you write code like “wcout << s” (for a CStringW s), the implicit conversion from CStringW to const wchar_t* is not considered during template argument deduction. Then, overload resolution picks the basic_ostream::operator<<(const void*) member function (corresponding to the first f in the initial simplified case), so the string’s address is printed via this “const void*” overload (instead of the string itself).

On the other hand, when CStringW::GetString is explicitly invoked (as in “wcout << s.GetString()”), the compiler successfully deduces the template arguments for the non-member operator<< (deducing wchar_t for _Elem). And this operator<<(wostream&, const wchar_t*) prints the expected wchar_t string.

I know… There are aspects of C++ templates that are not easy.

 

Wiring CStringW with Output Streams

We saw earlier that there’s a subtle bug involving CStringW and wcout.

If you really want to make CStringW work with wcout without the additional GetString call, you can follow a common pattern that is used to enable a C++ class to work with output streams.

In particular, you can define an overload of operator<<, that takes references to the output stream and to the class of interest, e.g.:

std::wostream& operator<<(std::wostream& os, 
                          const CStringW& str)
{
    return (os << str.GetString());
}

Note that the call to CString::GetString is hidden inside the implementation of this operator<< overload.

With this overload, the following code will output the expected string:

CStringW s = L"Connie";
wcout << s;

Note that this will work also for other wcout-ish objects (output streams), like wostringstream.

 

Subtle Bug with CStringW and wcout: Where’s My String??

Someone wrote some C++ code like this to print the content of a CStringW using wcout:

CStringW s = L"Connie";
…
wcout << s << …

The code compiles fine, but the output is a hexadecimal sequence, not the string “Connie” as expected. Surprise, surprise! So, what’s going on here?

Well, wcout doesn’t have any clue on how to deal with CStringW objects. All in all, CStringW is part of ATL; instead wcout is part of the <iostream>: they are two separate worlds.

However, CStringW defines an implicit conversion operator to const wchar_t*. This makes it possible to simply pass CStringW objects to Win32 APIs expecting input C-style NUL-terminated string pointers (although, there’s a school of thought that discourages the use of implicit conversions in C++).

So, wcout has no idea how to print a CStringW object. However, wcout does know how to print raw pointers (const void*). So, in the initial code snippet, the C++ compiler first invokes the CStringW implicit const wchar_t* conversion operator. Then, the <iostream> operator<< overload that takes a const void* parameter is used to print the pointer with wcout.

In other words, the hexadecimal value printed is the raw C-style string pointer to the string wrapped by the CStringW object, instead of the string itself.

If you want to print the actual string (not the pointer), you can invoke the GetString method of CStringW. GetString returns a const wchar_t*, which is the pointer to the wrapped string; wcout does know how to print a Unicode UTF-16 wchar_t string via its raw C-style pointer. In fact, there’s a specific overload of operator<< that takes a const wchar_t*; this gets invoked, for example, when you pass wchar_t-string literals to wcout, e.g. wcout << L”Connie”, instead of picking the const void* overload that prints the pointer.

So, this is the correct code:

// Print a CStringW to wcout
wcout << s.GetString() << …

Another option is to explicitly static_cast the CStringW object to const wchar_t*, so the proper operator<< overload gets invoked:

wcout << static_cast<const wchar_t*>(s) << …

Although I prefer the shorter and clearer GetString call.

(Of course, this applies also to other wcout-ish objects, like instances of std::wostringstream.)

P.S. In my view, it would be more reasonable that “wcout << …” picked the const wchar_t* overload also for CStringW objects. It probably doesn’t happen for some reason involving templates or some lookup rule in the I/O stream library. Sometimes C++ is not reasonable (it happens also to C).

Safe Array Sample Code on GitHub

I uploaded some sample code on GitHub, that shows how to create safe arrays in C++ and consume them in a C# application.

This project contains a C++ DLL (with a C interface) that exports two functions to produce safe arrays containing bytes and strings.

Then, there’s a WinForms C# application that consumes these safe arrays using proper PInvoke declarations, and shows their content on screen.