Adventures with string_view: Optimizing Code with O(1) Operations

Last time we saw that you can invoke the CString::GetString method to get a C-style null-terminated const string pointer, then pass it to functions that take wstring_view input parameters:

// name is a CString instance;
// DoSomething takes a wstring_view input parameter
DoSomething(name.GetString());

While this code works fine, it’s possible to optimize it.

As the saying goes, first make things work, then make things fast.

The Big Picture

A typical implementation of wstring_view holds two data members: a pointer to the string characters, and a size (or length). Basically, the pointer indicates where the string starts, and the size/length specifies how many consecutive characters belong to the string view (note that string views are not necessarily null-terminated).

The above code invokes a wstring_view constructor overload that takes a null-terminated C-style string pointer. To get the size (or length) of the string view, the implementation code needs to traverse the input string’s characters one by one, until it finds the null terminator. This is a linear time operation, or O(N) operation.

Fortunately, there’s another wstring_view constructor overload, that takes two parameters: a pointer and a length. Since CString objects know their own length, you can invoke the CString::GetLength method to get the value of the length parameter.

// Create a wstring_view from CString (*)
DoSomething({ name.GetString(), name.GetLength() });

The great news is that CString objects bookkeep their own string length, so that CString::GetLength doesn’t have to traverse all the string characters until it finds the terminating null. The value of the string length is already available when you invoke the CString::GetLength method.

In other words, creating a string view invoking CString::GetString and CString::GetLength replaces a linear time O(N) operation with a constant time O(1) operation, which is great.

Fixing and Refining the Code

When you try to compile the above code marked with (*), the C++ compiler actually complains with the following message:

Error C2398 Element '2': 
conversion from 'int' 
to 'const std::basic_string_view<wchar_t,std::char_traits<wchar_t>>::size_type' 
requires a narrowing conversion

The problem here is that CString::GetLength returns an int, which doesn’t match with the size type expected by wstring_view. Well, not a big deal: We can safely cast the int value returned by CString::GetLength to wstring_view::size_type, or just size_t:

DoSomething({ name.GetString(), 
    static_cast<size_t>(name.GetLength()) });

As a further refinement, we can wrap the above wstring_view-from-CString creation code in a nice helper function:

// Helper function that *efficiently* creates a wstring_view to a CString
inline std::wstring_view AsView(const CString& str)
{
    return { str.GetString(), 
             static_cast<size_t>(str.GetLength()) };
}

Measuring the Performance Gain

Using the above helper function, you can safely and efficiently create a string view to a CString object.

Now, you may ask, how much speed gain are we talking here?

Good question!

I have developed a small simple C++ benchmark, which shows that replacing a O(N) operation with a O(1) operation in this case gives a performance boost of 88 ms vs. 445 ms, which is an 80% performance gain! Wow.

Benchmark comparing two wstring_view constructor overloads
Benchmark comparing two wstring_view constructor overloads

If you want to learn more about Big-O notation and other related topics, you can watch my Pluralsight course on Introduction to Data Structures and Algorithms in C++.

Big-O doesn’t have to be boring!

Adventures with string_view: Making string_view Play Well with ATL/MFC CString

Suppose that you have a C++ code base containing methods and functions that take ATL/MFC CString objects as input parameters, for example:

void DoSomething(const CString& s)

and you want to introduce the use of std::string_view in that code.

Let’s assume that your code base is built using Visual Studio in Unicode mode (Configuration Properties | Advanced | Character Set set to Use Unicode Character Set), which has been the default since Visual Studio 2005. In this case, CString is actually the wchar_t-based CStringW, and the matching std::basic_string_view is the wchar_t-based std::wstring_view.

(Note: It could be possible to convert between Unicode UTF-16 CStringW and UTF-8, and store the UTF-8 encoded text in a std::string object, and take a std::string_view on that, but let’s not make things even more complicated!)

So, you may think of adding a wstring_view overload like this:

void DoSomething(std::wstring_view s)

Unfortunately, that would cause a problem with your pre-existing perfectly compiling code. In fact, if you pass a string literal, like this:

DoSomething(L"Connie");

the C++ compiler now has two options for DoSomething: the initial form that takes a const CString&, and the new one that takes a wstring_view. The C++ compiler has no way to choose which overload to pick, so it emits an error complaining about the ambiguous call to the DoSomething overloaded function.

At this point, you may think of removing the previous CString overload, and only keep the new wstring_view version:

// Removed: void DoSomething(const CString&)
void DoSomething(std::wstring_view s)

OK, now the C++ compiler is happy as there is only one function to pick.

So, problem solved? Nope.  Don’t forget that this is C++! 😊 And things can get more “interesting” when you apparently solved one problem.

For example, in your existing code base you almost certainly have a number CString objects around. For the sake of simplicity, consider this code snippet:

CString name;
DoSomething(name);

This code worked perfectly fine before you added wstring_view, but now the compiler complains! The problem this time is that the C++ compiler cannot find a suitable way to convert from CString to wstring_view. How can you fix that?

Well, you can invoke the CString::GetString method. In fact, this method returns a const wchar_t* raw pointer to the NUL-terminated C-style string associated with the CString object. And wstring_view has a constructor overload that matches this particular case, constructing a view of the NUL-terminated character string passed as input pointer parameter.

This will work:

// name is a CString
DoSomething(name.GetString());

Another important consideration to make is that wstring_view is just a view to a string, so you must pay attention that the pointed-to string is valid for at least all the time you are referencing it via the wstring_view. In other words, pay attention to dangling references and string views that refer to strings that have been deallocated or moved elsewhere in memory.

 

Unicode Encoding Conversions Between ATL and STL Strings

At the Windows platform-specific level, when using C++ frameworks like ATL (or MFC), it makes sense to represent strings using CString (CStringW in Unicode builds, which should be the default in modern Windows C++ code).

On the other hand, in cross-platform C++ code, using std::string makes sense as well.

The same Unicode text can be encoded in UTF-16 when stored in CString(W), and in UTF-8 for std::string.

So, there’s a need to convert between those two Unicode representations. I discussed the details of such conversions in my MSDN Magazine article: “Unicode Encoding Conversions with STL Strings and Win32 APIs”However, the C++ code associated to that article used std::wstring for UTF-16 strings.

I’ve created a new repository on GitHub, where I uploaded reusable code (in the form of a convenient header-only module) for converting between UTF-16 and UTF-8, using CStringW for UTF-16, and std::string for UTF-8. Please feel free to check it out!

 

Open-Sourcing MFC: What Do You Think?

There is an idea on Visual Studio User Voice about open-sourcing MFC.

What do you think about it? Would it be good or bad for the development of MFC and for the MFC community?

For new C++ projects there are several options available. For example: using Qt to develop cross-platform C++ applications, or WTL for something more Windows-specific, just to name a few.

But MFC is still used in several C++ legacy projects (including very large enterprise applications) still active in maintenance mode. With MFC open source, the community and Microsoft could submit new code and patches while Microsoft could monitor the quality of the submitted code and support an official branch.

My original Visual C++ 6 box.
My original Visual C++ 6 box (…memories from the MFC’s “Golden Days” 🙂 )

Please feel free to share your opinions on Visual Studio User Voice, or leave comments here.

Please try to develop a civilized discussion, and be courteous and kind even when you disagree.