A Simple STL vs. ATL String Performance Test – And the Winner Is… STL!

EDIT: New Performance Test available here.


In addition to the previous GotW#45 tests executed for the ATL’s and STL’s strings, I wanted to try my own tests. So I developed some simple C++ code that builds a std::vector of strings, and then sorts that vector. This test is executed for both ATL’s CStringW and STL’s wstring. Both the insertion times and sorting times are recorded.

You can check it out on GitHub.

The code was compiled with VS2015, and executed on an Intel i7 workstation running Windows 10 64-bit.

As you can see from the following screenshots, the STL string class always wins (faster execution times). In particular, when the test is executed for tiny strings, the STL’s small string optimization (SSO) is a clear winner over ATL’s CString.

Testing ATL vs. STL String Performance
Testing ATL vs. STL String Performance (SSO)

I recall I did some similar measurements with older VC++ compiler versions (probably VS2008) and ATL’s CString was a winner back then. C++11’s move semantics and other improvements in the MSVC compiler and in the STL implementation made an important difference here. Kudos to the Visual C++ Team’s compiler and STL library guys for the improvements!


Is Copy-on-write Really a Pessimization Under Multithreading?

Copy-on-write (COW) is an optimization technique used by several C++ classes, including the ATL’s CString class, and many classes in the Qt framework.

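However, the C++11 standard effectively rules out COW for std::string: its reference and iterator invalidation rules (for example, the non-const operator[] must not invalidate existing references) cannot be satisfied by a COW implementation.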

Many developers consider COW a “pessimization” under multi-threading, and to support their argument they usually point to a piece written by Herb Sutter (GotW#45).

I’m an intellectually curious person, so I wanted to test that code on modern systems. I downloaded the original GotW#45 code, did some adjustments to make it compile cleanly with Visual Studio 2015 at warning level 4, changed some stuff like using Windows high-performance counters to measure time (instead of GetTickCount), and added a couple of tests for STL’s std::string and ATL’s CStringA.

You can download the modified code here from GitHub.

On a modern Intel i7-based workstation, the results for 100-char-length strings seem to show that CString (which is COW-based) actually performs better than std::string in those tests.

100-char-length std::string’s performance worse than COW-based ATL CString.

However, when strings of shorter lengths are tested (e.g. 10 chars), std::string wins.

So, probably COW is not always a pessimization: there are cases in which the size of the data to copy can have a significant impact.

It’s also interesting that the fbstring class, which is a drop-in replacement for std::string, claiming significantly increased performance, uses COW for large strings.

fbstring uses COW for large strings


P.S. Of course, I’m aware that there are several other aspects to consider from a performance perspective, including data locality, etc. Still, I think these results are interesting.

A Laser-Focused Alternative to std::decay Using C++14’s Alias Templates

In the previous blog post, there was some code iterating through a generic container, using a range-for loop like so:

for (const auto& elem : c)

I wanted the C++ compiler to pick a specific traits class template specialization based on the current element’s type. Using just decltype(elem) turned out to be a bug, since the returned type was a const reference to the element’s type, but what I actually wanted was the original element’s type (not a const reference to it).

So, I used std::decay to strip off the “const &” part from the type returned by decltype(elem), leaving only the actual element’s type, such that the compiler could pick the proper traits class specialization based on that type.

However, std::decay also performs additional transformations, like array-to-pointer and function-to-pointer conversions, which aren’t relevant in our case.

A more laser-focused alternative would be using std::remove_reference and std::remove_const (both defined in the <type_traits> standard header).

We could define a simple class template for that purpose:

#include <type_traits> // for remove_const, remove_reference

// Strip off "const &" from "const Type&",
// leaving only "Type".
template <typename T>
class RemoveConstReference
  // Strip off the reference part
  typedef typename 


  // Strip off the const,
  // after having stripped off the reference
  typedef typename 

And then that RemoveConstReference custom-defined template could be applied to strip off “const &” from elem’s type:


And finally the correct traits specialization is picked using code like this:

const int x = 

In addition, C++14 introduced some convenient alias templates. For example: besides C++11’s remove_reference, in C++14 a remove_reference_t alias template was defined:

template <typename T>
using remove_reference_t
  = typename remove_reference<T>::type;

This is basically syntactic sugar: a shortcut that saves typing the “typename” keyword and the “::type” suffix.

Similarly, there’s a C++14 remove_const_t alias template matching C++11’s remove_const.

Using those _t-ending alias templates, it’s possible to further simplify the code to strip off the “const&”, composing remove_const_t and remove_reference_t like this (“std::” prefix omitted for the sake of clarity):

typedef remove_const_t<

We could even define a custom reusable alias template just for removing “const &”:

template <typename T>
using RemoveConstReference_t = remove_const_t<remove_reference_t<T>>;

And then use it like so:

typedef RemoveConstReference_t<decltype(elem)> 

const int x = 

These C++14 alias templates, although just “syntactic sugar”, are nonetheless convenient for making C++ code simpler and clearer!


Mixing Traits Classes, Range-For Iteration and decltype: A Subtle Bug

Suppose you have defined a traits class to query a property for some type. This is a common technique used in C++: for example, to get the largest value of the type int, you can call std::numeric_limits<int>::max. std::numeric_limits is a class template that is specialized for arithmetic types like int, double, etc.

In the int specialization (numeric_limits<int>), its max method returns the largest value of the type int; in the double specialization (numeric_limits<double>), its max method returns the largest value of the type double, and so on. In generic code, when you have some arithmetic type T, you can query T’s largest value calling numeric_limits<T>::max.

So, you defined some traits class, which is a class template, that exposes some public static methods to query some properties for some types. For example:


template <typename Type>
struct Traits
{
  static int GetSomeProperty(const Type&)
  {
    cout << "Generic"; 
    return 0;
  }
};

(Of course, in production code, traits classes can contain more than a single method, just like the numeric_limits traits class exposes several member functions.)

Then you specialize the above traits class for the std::string type:

template <>
struct Traits<string>
{
  static int GetSomeProperty(const string&)
  {
    cout << "String"; 
    return 1; 
  }
};

This property might be the length in bytes of the input parameter, or it can be some other data associated with that parameter; the description of the particular property is unimportant here.

Then you have a function template that iterates through some generic container using a range-for loop and inside that loop, you invoke the Traits::GetSomeProperty method to query the given property of the current element:

template <typename Container>
void DoSomething(const Container & c)
{
  for (const auto& elem : c)
  {
    cout << elem << ": ";

    int x = Traits<decltype(elem)>::GetSomeProperty(elem);

    cout << '\n';
  }
}

Since this function operates on a generic container, you don’t know a priori the type of the elements in the container. So you think of using decltype(elem) to get the type of the current element.

Now, if you invoke that function template on a vector<string>, like this:

int main() {
  vector<string> test{ "Hello", "Connie" };
  DoSomething(test);
}

What you would expect as output is probably:

Hello: String
Connie: String

Right? That’s because the DoSomething function template is iterating through a vector of strings, so decltype(elem) seems to be std::string, and so Traits<decltype(elem)> would pick Traits<string>, and consequently the Traits<string>::GetSomeProperty specialization would be invoked, which should print “String”.

Well, if you run this code, what you actually get is:

Hello: Generic
Connie: Generic

Why is that? Why is “Generic” printed instead of the expected “String”?

If you pay attention to the range-for loop iteration:

    for (const auto& elem : c)

you’ll notice that elem’s type is actually not std::string. In fact, elem is a const reference to a std::string. That’s because of the “const auto&” iteration format, which in this particular case of vector<string> becomes “const std::string&”.

So, decltype(elem) is not std::string: it’s actually a “const std::string&”, that is: a const reference to a std::string. Since there is no Traits<T> specialization for a “const reference to a std::string” (in fact, you only have a specialization for a std::string), the generic Traits<T> class template code is picked by the compiler, so the generic Traits<T>::GetSomeProperty method is invoked, which prints “Generic”.

How can you fix this code? How to remove the “const &”, leaving only the “std::string” part?

Well, an option is to use std::decay:

std::decay<decltype(elem)>::type
This is std::string, without the const reference part.

A more verbose alternative to strip that off is remove_reference and remove_const (thanks Mr.STL for the tip on that one).

So, inside the range-for loop, this line will do The Right Thing (TM):

int x = Traits<typename std::decay<decltype(elem)>::type>::GetSomeProperty(elem);

If the above line seems too hard to parse, it’s possible to break it into pieces for better readability, with the help of a convenient intermediate typedef:

typedef std::decay<decltype(elem)>::type

int x = 

I hope this blog post will spare you some headache and debugging time if you have a range-for loop iterating with “const auto&”, and you are stuck getting the wrong (generic) traits invoked instead of the expected specialization.

Some compilable code is available here on GitHub for download and experimentation.

Pluralsight Blog Post: Simplifying Lexicographical Comparisons with C++11

A C++ article of mine has been published on the Pluralsight blog. It shows how to pragmatically use C++11’s std::tie to easily implement lexicographical comparisons of custom data types (the concept of lexicographical comparison is introduced in that post as well).

Basically, instead of writing a long (and potentially bug-prone) sequence of if statements, std::tie can be invoked to build tuples, which in turn can be compared using std::tuple’s already-defined operator< overload.

I also showed a common error that can happen when calling std::tie, and how to fix it using std::make_tuple.

An important take-away is that if there are tools already tested and available in the C++ standard library, it’s better to use them than attempting to reinvent the wheel writing boilerplate bug-prone “low-level” C++ code.

Check out the blog post here!

Thanks to Stephan T. Lavavej (Senior Library Developer at Microsoft Visual C++ Team) for technically reviewing this article.

My First MSDN Magazine Article: Using STL Strings at Win32 API Boundaries

My first article for MSDN Magazine is online! I’m excited about that. Crafting that article has been an interesting, fun and rewarding experience.

Using STL Strings at Win32 API Boundaries

I still recall when I was 20 years young, and used to visit a local newsstand looking forward to buying MSJ (Microsoft Systems Journal), later merged into MSDN Magazine, and delving into it.

I’d like to express my sincere gratitude to Stephan T. Lavavej for his thorough review, to David Cravey for his feedback, to my editor Sharon Terdeman for handling my article in a great way and for her excellent communication, and to Eric Battalio, Gordon Hogenson and MSDN Magazine editor-in-chief Michael Desmond for the initial contacts and for starting the process.

I hope you enjoy reading the article.

STL Introductory Series on Channel 9

Stephan concluded his introductory series on the STL with an interesting chapter on template metaprogramming and type traits.

Here is the complete list of this ten-part series introducing the STL:

Part 1 is about sequence containers (like std::vector).

Part 2 is on associative containers (like std::map).

Part 3 discusses smart pointers (e.g. shared_ptr).

Parts 4 and 5 show a practical use of the aforementioned concepts applied to the development of a Nurikabe puzzle solver.

Part 6 and part 7 discuss STL algorithms.

Part 8 is about regular expressions.

Part 9 discusses new C++0x core language features like rvalue references and move semantics.

And finally part 10 is about template metaprogramming and type traits.

Thank you Stephan and Channel 9 for this quality introduction to the STL!


Checked Iterators

Microsoft Visual Studio versions since VC8 (VS2005) offer a feature called “checked iterators”. The MSDN documentation clearly states: “Checked iterators ensure that you do not overwrite the bounds of your container. Checked iterators apply to release builds and debug builds.” Checked iterators can be disabled by #defining the _SECURE_SCL symbol to 0.

In VC8 (VS2005) and VC9 (VS2008), checked iterators are enabled by default in both debug and release builds. While I agree that having checked iterators enabled by default in debug builds is a good thing (because debug builds are designed to catch as many bugs as possible), I think the default behavior in release builds should be to switch checked iterators off. After all, I do like speed in release builds.

As a simple benchmark (attached to this blog post), on an Intel Core 2 Duo @ 2.33 GHz, switching checked iterators off improved a simple 1,000×1,000 matrix multiplication time from 20 seconds to about 12 seconds (a 40% improvement).

I like putting the following lines in the precompiled header file, to switch checked iterators off in release builds:

// Disable checked iterators in release builds
#ifndef _DEBUG
#define _SECURE_SCL 0
#endif

Fortunately, in VS2010, in release mode, the default value for _SECURE_SCL is 0.