Pluralsight 33% Off (Limited Time)

Just a heads up to let you know that Pluralsight is offering a limited-time (until 5/12) 33% off on Annual Standard and Premium subscriptions.

Click the banner below to save now!

Pluralsight FREE Week

Just a heads up to let you know that Pluralsight is unlocking its technology skills platform and making 7,000+ expert-led video courses, 40+ interactive courses and 20+ projects FREE for one week only (April 25th – May 1st).

This includes my courses on learning modern C++ from scratch, practical C++14 and C++17 features, introduction to data structures and algorithms in C++, getting started with the C language, and more.

Happy learning!

Adventures with string_view: Optimizing Code with O(1) Operations

Last time we saw that you can invoke the CString::GetString method to get a C-style null-terminated const string pointer, then pass it to functions that take wstring_view input parameters:

// name is a CString instance;
// DoSomething takes a wstring_view input parameter
DoSomething(name.GetString());

While this code works fine, it’s possible to optimize it.

As the saying goes, first make things work, then make things fast.

The Big Picture

A typical implementation of wstring_view holds two data members: a pointer to the string characters, and a size (or length). Basically, the pointer indicates where the string starts, and the size/length specifies how many consecutive characters belong to the string view (note that string views are not necessarily null-terminated).

The above code invokes a wstring_view constructor overload that takes a null-terminated C-style string pointer. To get the size (or length) of the string view, the implementation code needs to traverse the input string’s characters one by one, until it finds the null terminator. This is a linear time operation, or O(N) operation.

Fortunately, there’s another wstring_view constructor overload, that takes two parameters: a pointer and a length. Since CString objects know their own length, you can invoke the CString::GetLength method to get the value of the length parameter.

// Create a wstring_view from CString (*)
DoSomething({ name.GetString(), name.GetLength() });

The great news is that CString objects bookkeep their own string length, so that CString::GetLength doesn’t have to traverse all the string characters until it finds the terminating null. The value of the string length is already available when you invoke the CString::GetLength method.

In other words, creating a string view invoking CString::GetString and CString::GetLength replaces a linear time O(N) operation with a constant time O(1) operation, which is great.

Fixing and Refining the Code

When you try to compile the above code marked with (*), the C++ compiler actually complains with the following message:

Error C2398 Element '2': 
conversion from 'int' 
to 'const std::basic_string_view<wchar_t,std::char_traits<wchar_t>>::size_type' 
requires a narrowing conversion

The problem here is that CString::GetLength returns an int, which doesn’t match with the size type expected by wstring_view. Well, not a big deal: We can safely cast the int value returned by CString::GetLength to wstring_view::size_type, or just size_t:

DoSomething({ name.GetString(), 
    static_cast<size_t>(name.GetLength()) });

As a further refinement, we can wrap the above wstring_view-from-CString creation code in a nice helper function:

// Helper function that *efficiently* creates a wstring_view to a CString
inline std::wstring_view AsView(const CString& str)
{
    return { str.GetString(), 
             static_cast<size_t>(str.GetLength()) };
}

Measuring the Performance Gain

Using the above helper function, you can safely and efficiently create a string view to a CString object.

Now, you may ask, how much speed gain are we talking here?

Good question!

I have developed a small simple C++ benchmark, which shows that replacing a O(N) operation with a O(1) operation in this case gives a performance boost of 88 ms vs. 445 ms, which is an 80% performance gain! Wow.

Benchmark comparing two wstring_view constructor overloads
Benchmark comparing two wstring_view constructor overloads

If you want to learn more about Big-O notation and other related topics, you can watch my Pluralsight course on Introduction to Data Structures and Algorithms in C++.

Big-O doesn’t have to be boring!

Adventures with string_view: Making string_view Play Well with ATL/MFC CString

Suppose that you have a C++ code base containing methods and functions that take ATL/MFC CString objects as input parameters, for example:

void DoSomething(const CString& s)

and you want to introduce the use of std::string_view in that code.

Let’s assume that your code base is built using Visual Studio in Unicode mode (Configuration Properties | Advanced | Character Set set to Use Unicode Character Set), which has been the default since Visual Studio 2005. In this case, CString is actually the wchar_t-based CStringW, and the matching std::basic_string_view is the wchar_t-based std::wstring_view.

(Note: It could be possible to convert between Unicode UTF-16 CStringW and UTF-8, and store the UTF-8 encoded text in a std::string object, and take a std::string_view on that, but let’s not make things even more complicated!)

So, you may think of adding a wstring_view overload like this:

void DoSomething(std::wstring_view s)

Unfortunately, that would cause a problem with your pre-existing perfectly compiling code. In fact, if you pass a string literal, like this:

DoSomething(L"Connie");

the C++ compiler now has two options for DoSomething: the initial form that takes a const CString&, and the new one that takes a wstring_view. The C++ compiler has no way to choose which overload to pick, so it emits an error complaining about the ambiguous call to the DoSomething overloaded function.

At this point, you may think of removing the previous CString overload, and only keep the new wstring_view version:

// Removed: void DoSomething(const CString&)
void DoSomething(std::wstring_view s)

OK, now the C++ compiler is happy as there is only one function to pick.

So, problem solved? Nope.  Don’t forget that this is C++! 😊 And things can get more “interesting” when you apparently solved one problem.

For example, in your existing code base you almost certainly have a number CString objects around. For the sake of simplicity, consider this code snippet:

CString name;
DoSomething(name);

This code worked perfectly fine before you added wstring_view, but now the compiler complains! The problem this time is that the C++ compiler cannot find a suitable way to convert from CString to wstring_view. How can you fix that?

Well, you can invoke the CString::GetString method. In fact, this method returns a const wchar_t* raw pointer to the NUL-terminated C-style string associated with the CString object. And wstring_view has a constructor overload that matches this particular case, constructing a view of the NUL-terminated character string passed as input pointer parameter.

This will work:

// name is a CString
DoSomething(name.GetString());

Another important consideration to make is that wstring_view is just a view to a string, so you must pay attention that the pointed-to string is valid for at least all the time you are referencing it via the wstring_view. In other words, pay attention to dangling references and string views that refer to strings that have been deallocated or moved elsewhere in memory.

 

Open Source: Who Pays for It?

Recently I worked on my C++ WinReg library for a couple of hours, adding some useful methods, testing them, and packaging the library for vcpkg distribution.

At a reasonable rate of $150/hour, that would be $300.

And that’s not even counting all the many hours I spent in total on this open-source project.

This library has currently more than 260 stars on GitHub, and many forks. It seems to me it is used by many people.

But who pays for it?

I think there should be some process to allow developers of useful open-source code hosted on GitHub (or on other platforms as well) to be fairly compensated for their work.

Spot the Bug

Consider the following C++ code snippet:

// Assume proper std:: namespace imports and being inside main

string name = "Connie";
for (auto ch : name) {
    ch = toupper(ch);
}

cout << name << '\n';

What will the output be?

“Connie”? “CONNIE”? Other?

The intention of the programmer was clearly to convert the name string to upper case. But, does this code work as expected? If it doesn’t, what’s the cause of the bug, and how can you fix it?

If you don’t know the answers to the above questions, you may be interesting in my Pluralsight course on learning modern C++ from scratch.

Note that Pluralsight is currently running a 33% off limited-time offer, so you can save now clicking on the banner below:

Pluralsight Limited-Time Offer 33% Off

Just a heads up to let you know that Pluralsight is offering a 33% off discount on individual annual Standard or Premium subscriptions for a limited time (3/14-3/18).

Click the banner and save now!

Simplifying Windows Registry Programming with the C++ WinReg Library

WinReg is a modern C++ library I wrote that exposes a high-level public interface to simplify the access to the Windows Registry.

For instance, instead of invoking low-level C-interface APIs like RegOpenKeyEx or RegQueryValueEx, you can simply create an instance of the winreg::RegKey class:

RegKey key(HKEY_CURRENT_USER, L"SOFTWARE\\SomeKey");

and invoke convenient RegKey’s methods like GetStringValue:

wstring s = key.GetStringValue(L"Connie");

Note that this library wraps low-level native Windows C API types into higher-level C++ classes like std::wstring for Registry string values, or std::vector<std::wstring> for multi-string values.

The C++ WinReg library is open source and freely available on GitHub.

Enjoy!

 

 

C++ String Benchmark: STL vs. ATL vs. Custom Pool Allocator

I was curious to compare the performance of the STL string implementation versus ATL CString, using Visual Studio 2019, so I wrote some simple C++ benchmark code for this purpose.

I also added into the mix a custom string pool allocator based on some code from The Old New Thing blog, that I modified and bug-fixed. This allocator basically maintains a singly-linked list of chunks, and string memory is carved from each chunk just increasing a string pointer. When there isn’t enough memory in the current chunk to serve the allocation, a new chunk is allocated. The new chunk is safely linked to the previous chunk list, and the memory for the requested strings is carved from this new chunk. The linked list of chunks is traversed at destruction time to properly release the allocated memory blocks.

I measured the times to create and fill string vectors, and the times to sort the same vectors.

TL;DR: The STL string performance is great! You can improve creation times with a custom pool allocator.

These times are measured for vectors storing each kind of strings: STL’s wstring, ATL’s CString (i.e. CStringW in Unicode builds), and the strings created using the custom string pool allocator.

This is a sample run (executed on a Windows 10 64-bit Intel i7 PC):

String benchmark: STL vs. ATL vs. custom string pool allocator
String benchmark: STL vs. ATL vs. custom string pool allocator

As you can note, the best creation times are obtained with the custom string pool allocator. This was expected, as the allocation strategy of carving string memory from pre-allocated blocks is very efficient.

On the other hand, regarding the sorting times, STL and the custom pool strings perform very similarly.

ATL’s CString, which is based on CoW (Copy on Write), shows the worst times for both creation and sorting.

Benchmark Variation: Tiny Strings

It’s also possible to run the benchmark with short strings, triggering the SSO (compile the code #define’ing TEST_TINY_STRINGS).

String benchmark with tiny strings: STL vs. ATL vs. custom string pool allocator
String benchmark with tiny strings: STL vs. ATL vs. custom string pool allocator

As you can see in this case, thanks to the SSO, STL strings win by an important margin in both creation and sorting times.

Note: Making the Sorting Comparison Uniform

I’d also like to clarify that, to make the sorting comparison more uniform, I used custom comparator functions with std::sort, invoking wcscmp for each kind of strings. In fact, spelunking in the VS 2019 STL and ATL implementation code (Thank You, Step Into Specific menu), I found that std::wstring invokes wmemcmp (file <xstring>, line #235), while ATL::CString uses wcscmp (file <cstringt.h>, line #564). Probably std::wstring uses wmemcmp because wstring instances can contain embedded NULs.

 

Pluralsight FREE Week

Just a heads-up to let you know that Pluralsight’s platform is FREE for the entire week!

You can watch 7,000+ expert-led video courses for free for one week only.

For example: Are you interested in an Introduction to Algorithms and Data Structures in C++? You can watch my course for free during this week!

Introducing the stack with an interesting metaphor
Big-O doesn’t have to be boring!

Or do you want to get started with the C programming language? No problem! I’ve got you covered.

Analyzing a subtle bug.
Analyzing a subtle bug.
Discussing the memory layout of strings in C.
Discussing the memory layout of strings in C.

These are just two examples.

But you can enjoy watching whatever you want from the entire Pluralsight library for free during this week!

Click the banner below and enjoy learning!