Is There “Connie” in the Map?

Today is Pi (π) Day, and I wanted to celebrate with this blog post (even if it’s C++-related and not math-related). Hope you’ll find it useful, and maybe even enjoy it.

 

std::map is one of those useful containers implemented in the C++ Standard Library. It basically stores key-value pairs, with unique keys. In other words, std::map maps unique keys to values.

As a concrete example, you can think of an address book: The “key” is the person’s name, and the “value” associated with it can be a custom data structure storing information like address, e-mail address, telephone number, and so on.

This translates into C++ code like this:

std::map<std::string, Person> addressBook;

Another interesting application could be counting words in a document. In this case, you still have a string key (i.e. the word you’re counting), and the associated value in the key-value pair would be an integer representing the occurrences of that word:

map<string, int> wordCount;

E.g.: If you have 500 “Connie”s in a document, “Connie” is mapped to 500.

You can use a map from string to int for word counting applications.
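As a rough sketch of the word-counting idea, the snippet below splits a text on whitespace and counts each word in a std::map. The function name count_words is a hypothetical helper chosen for illustration, and splitting on whitespace is a simplifying assumption (real documents would also need punctuation handling).

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): counts how many times
// each whitespace-separated word occurs in the given text.
std::map<std::string, int> count_words(const std::string& text) {
    std::map<std::string, int> wordCount;
    std::istringstream input(text);
    std::string word;
    while (input >> word) {
        // operator[] inserts the key with a value-initialized int (0)
        // if the word is not in the map yet; then the count is incremented.
        ++wordCount[word];
    }
    return wordCount;
}
```

Calling count_words on a document with 500 occurrences of “Connie” would leave wordCount["Connie"] equal to 500.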

A common operation you can do on a map is checking if there is an element with a given key in the container (i.e. a given name in the previous address book example, or a given word in the word counting one).

So, assuming that you have some std::map object with std::string keys, how do you check if “Connie” is a key contained in the map?

Method #1: Invoking map::find()

Well, std::map offers a find() method. You pass as input the key of the element to search for, and find() returns an iterator to the element with that given key. If no such element exists in the map container, then find() returns the map’s end iterator.

So, if you want to check if “Connie” is a key in the map, you can call std::map::find(), and compare the returned iterator to the map’s end iterator:

if (myMap.find("Connie") != myMap.end())
{
    // "Connie" is in the map
    cout << "Found! \n";
}

Method #2: Invoking map::count()

There’s also another option. std::map offers the count() method, which returns the number of elements with the given key: either 1 or 0 for std::map (as std::map contains unique keys).

So, translated into C++, you get something like this:

if (myMap.count("Connie") == 1)
{
    // "Connie" is in the map
    cout << "Found! \n";
}

 

Two options for checking if an element with a given key is contained in a std::map, invoking the find or count methods
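The two approaches can be put side by side in a small sketch. The helper names below (has_key_via_find, has_key_via_count, occurrences) are hypothetical and exist only for illustration. One practical difference worth noting: the iterator returned by find() also gives access to the mapped value, so you can check for the key and read its value with a single lookup.

```cpp
#include <map>
#include <string>

// Hypothetical helpers for illustration only.

// Method #1: compare the iterator returned by find() with end().
bool has_key_via_find(const std::map<std::string, int>& m,
                      const std::string& key) {
    return m.find(key) != m.end();
}

// Method #2: count() returns 1 or 0 for std::map (keys are unique).
bool has_key_via_count(const std::map<std::string, int>& m,
                       const std::string& key) {
    return m.count(key) == 1;
}

// find() also lets you read the value without a second lookup.
// Returns 0 when the key is not in the map.
int occurrences(const std::map<std::string, int>& m,
                const std::string& key) {
    const auto it = m.find(key);
    return (it != m.end()) ? it->second : 0;
}
```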

Method #3: Back to the Future, and Thank You C++20

Honestly, I found the previous two methods kind of obscure and unclear (especially before someone has introduced them to you). I mean: If I want to check if a std::map contains an element with a given key, why can’t I just ask the container in a simple straightforward way? Something like this:

if (myMap.contains("Connie"))
{
    // "Connie" is in the map
    cout << "Found! \n";
}

I even found a newsgroup post dating back to 2009, in which I expressed my wish for a simple clear map::contains() method.

Fortunately, it seems that C++20 will finally add it! 😊

As a general rule of class interface design, I believe it’s better to add a direct, clear, straightforward method to the public interface than to force your class users to “fight” the interface and circumnavigate it to get the result.
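If your compiler doesn’t offer map::contains() yet, one possible workaround is a small free function that wraps find(). This is only a sketch: contains_key is a hypothetical name, not a standard library function.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not part of the standard library): reads like
// map::contains(), but is implemented with find(), so it doesn't need C++20.
// It should work with any container exposing key_type, find() and end().
template <typename Map>
bool contains_key(const Map& m, const typename Map::key_type& key) {
    return m.find(key) != m.end();
}
```

With this in place, the check reads as if (contains_key(myMap, "Connie")) { ... }, which is close to the direct question you actually want to ask the container.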

 

OK, since today is π Day, I think there’s room for some math 😊 So, all these methods (find, count and contains) have logarithmic asymptotic complexity, O(log N), in the size of the container. If you are curious about the Big-O notation and asymptotic complexity, you may want to check out my course on Introduction to Data Structures and Algorithms in C++.

Big-O doesn’t have to be boring!

 

New Pluralsight Course: Introduction to Data Structures and Algorithms in C++

A new course of mine was published in the Pluralsight library: Introduction to Data Structures and Algorithms in C++.

In this course, you’ll learn how to implement some fundamental data structures and algorithms in C++ from scratch, with a combination of theoretical introduction using slides, and practical C++ implementation code.

Introducing the stack with an interesting metaphor

No prior data structure or algorithm theory knowledge is required. You only need a basic knowledge of C++ language features (please watch the “Prerequisites” clip in the first module for more details about that).

Explaining linear search using slides

During this course journey, you’ll also learn some practical C++ coding techniques (ranging from move semantics optimizations, to proper safe array copying techniques, insertion operator overloading, etc.) that you’ll be able to use in your own C++ projects, as well.

So, this course is both theory and practice!

Spotting a subtle bug

Here are just a couple of feedback notes from my reviewers:

The callouts are helpful and keep the demo engaging as you explain the code. [Peer Review]

To say that this is an excellent explanation of Big-O notation would be an understatement. The way you illustrate and explain it is far better than the way it was taught to me in college! [Peer Review]

Big-O doesn’t have to be boring!

Starting from this course page, you can freely play the course overview, and read a more detailed course description and table of contents.

I hope you’ll enjoy watching this course!

 

C++ String Guidance

Last time, I enumerated a few types of strings available in C++.

These days, I’d suggest as the default option for cross-platform standard C++ code to use std::string, storing UTF-8-encoded text inside it. Note that pure ASCII is a proper subset of UTF-8, so storing pure ASCII text in std::string objects is just fine.

In addition, for those platform-specific sections of C++ code, I’d suggest using whatever string class and encoding are typical and “natural” for that platform. For example, at the Windows API boundary, use the UTF-16 encoding, and the std::wstring class in C++ code that doesn’t use ATL or MFC.

In addition, in C++ Windows-specific code that already uses ATL or MFC, another option is to use CString (or the explicit CStringW) enabling Visual Studio Unicode builds (“Configuration Properties” | “General” | “Character Set”: “Use Unicode Character Set”, which has been the default since probably Visual Studio 2005).

On the other hand, Qt-based C++ code can use the QString class, and so on.

 

How Many Strings Does C++ Have?

(…OK, a language lawyer would nitpick suggesting “How many string types…”, but I wanted a catchier title.)

So, if you program in Python and you see something enclosed by either single quotes or double quotes, you have a string:

s = 'Connie'

Something similar happens in Java, with string literals like “Connie” implemented as instances of the java.lang.String class:

String s = "Connie";

All right.

Now, let’s enter – drumroll, please – The Realm of C++! And the fun begins 😊

So, let’s consider this simple line of C++ code:

auto s1 = "Connie";

What is the type of s1?

std::string? A char[7] array? (Hey, “Connie” is six characters, but don’t forget the terminating NUL!)

…Something else?

So, you can use your favorite IDE, and hover over the variable name, and get the deduced type. Visual Studio C++ IntelliSense suggests it’s “const char*”. Wow!

Visual Studio IntelliSense deduces const char pointer.

And what about “Connie”s?

auto s2 = "Connie"s;

No, it’s not the plural of “Connie”. And it’s not a malformed Saxon genitive either. This time s2 is of type std::string! Thank you operator""s introduced in C++14!

Visual Studio IntelliSense deduces std::string
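If you prefer asking the compiler instead of the IDE, a few static_asserts can confirm the deduced types. This is a minimal sketch assuming a C++14 (or later) compiler; the function name literal_types_match is hypothetical and only hosts the checks.

```cpp
#include <string>
#include <type_traits>

using namespace std::string_literals;  // brings operator""s into scope

// Hypothetical function that just hosts the compile-time checks.
bool literal_types_match() {
    auto s1 = "Connie";   // plain string literal
    auto s2 = "Connie"s;  // literal with the standard s-suffix

    // The literal itself is an array of 7 const chars
    // (six letters plus the terminating NUL)...
    static_assert(std::is_same<decltype("Connie"), const char(&)[7]>::value,
                  "string literal is an lvalue of type const char[7]");

    // ...but auto deduction decays the array to a pointer.
    static_assert(std::is_same<decltype(s1), const char*>::value,
                  "auto deduces const char*");

    // With the s-suffix, you get a real std::string object.
    static_assert(std::is_same<decltype(s2), std::string>::value,
                  "auto deduces std::string");

    return s2 == s1;  // both hold the same text
}
```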

But, are we done? Of course, not! Don’t forget: It’s C++! 😊

For example, you can have u8"Connie", which represents a UTF-8 literal. And, of course, we need a thread on StackOverflow to figure out “How are u8-literals supposed to work?”

And don’t forget L"Connie", u"Connie" and U"Connie", which represent const wchar_t*, const char16_t* (UTF-16 encoded) and const char32_t* (UTF-32 encoded) respectively.

Now we are done, right? Not yet!

In fact, you can combine the previous prefixes with the standard s-suffix, for example: L"Connie"s is a std::wstring! U"Connie"s is a std::u32string. And so on.

Done, right? Not yet!! In fact, there are raw string literals to consider, too. For example: R"(C:\Path\To\Connie)", which is a const char* to “C:\Path\To\Connie” (well, this saves you escaping \ with \\).

And don’t forget the combinations of raw string literals with the above prefixes and optionally the standard s-suffix, as well: LR"(C:\Path\To\Connie)", UR"(C:\Path\To\Connie)", LR"(C:\Path\To\Connie)"s, UR"(C:\Path\To\Connie)"s, and more!
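Here is a small sketch that lets the compiler confirm some of these combinations. The function name check_prefixed_literals is hypothetical. The u8 prefix is deliberately left out, because the type of u8 literals differs between C++17 (based on char) and C++20 (based on char8_t).

```cpp
#include <string>
#include <type_traits>

using namespace std::string_literals;  // brings operator""s into scope

// Hypothetical function that hosts compile-time checks on literal types.
bool check_prefixed_literals() {
    auto w   = L"Connie";  // wide literal
    auto u16 = u"Connie";  // UTF-16 literal
    auto u32 = U"Connie";  // UTF-32 literal
    static_assert(std::is_same<decltype(w), const wchar_t*>::value, "");
    static_assert(std::is_same<decltype(u16), const char16_t*>::value, "");
    static_assert(std::is_same<decltype(u32), const char32_t*>::value, "");

    // Prefix + s-suffix: you get the matching std::basic_string type.
    auto ws = L"Connie"s;
    auto us = U"Connie"s;
    static_assert(std::is_same<decltype(ws), std::wstring>::value, "");
    static_assert(std::is_same<decltype(us), std::u32string>::value, "");

    // Raw literal: backslashes are taken as-is, no escaping needed.
    const std::string raw     = R"(C:\Path\To\Connie)";
    const std::string escaped = "C:\\Path\\To\\Connie";

    // Raw + prefix + s-suffix combined.
    auto wraw = LR"(C:\Path\To\Connie)"s;
    static_assert(std::is_same<decltype(wraw), std::wstring>::value, "");

    return raw == escaped && raw.size() == 17 && wraw.size() == 17
        && ws.size() == 6 && us.size() == 6
        && w[0] == L'C' && u16[0] == u'C' && u32[0] == U'C';
}
```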

Oh, and in addition to the standard std::string class, and other standard std::basic_string-based typedefs (e.g. std::wstring, std::u16string, std::u32string, etc.), there are platform/library specific string classes, like ATL/MFC’s CString, CStringA and CStringW. And Qt brings QString to the table. And wxWidgets does the same with its wxString.

Wow! And I would not be surprised if I left out some other string variation 😊

P.S. With all this string variety (maybe too much…), what about adding to the C++ Standard Library some convenient functions for at least common string operations like trimming spaces and converting strings to upper case and lower case? All in all, C++ does already have rocket-science stuff like Bessel functions in its Standard Library. While, back in the old MFC days, CString already offered convenient methods like Trim, MakeLower and MakeUpper, just to name a few.


Sample slide: Introducing the std::string class

If you want to learn modern C++ from scratch, in a fun and interesting way, with engaging slides and demo code, please check out my course!

Is Your “C++11 from Scratch” Course Still Valid Today? Yes, Absolutely!

I’m very proud of my “C++11 from Scratch” course published by Pluralsight.

We are in 2018, and there have been C++14 and C++17 in the meantime. So, a legit question is: “Does it make sense for me to watch your C++11 course today for a beginner-oriented introduction to C++?” And the answer is a BIG STRONG YES! 😊

In fact, in that course you will learn modern C++ topics that are valid in C++11 and also in later versions of the language. For example, what you will learn about the parameter passing rules, like passing by reference vs. passing by value, is perfectly valid in C++11, C++14, and C++17 as well.

Moreover, the practical introductions I gave to standard library’s classes like std::string, or std::vector, and to the std::sort algorithm, just to name a few, are totally valid also in C++14 and C++17.

Similarly, my discussions on defining custom types, constructors, destructors, the RAII pattern and the scope-based lifetime of objects are still valid in C++14 and C++17, as well.

Maybe a better title for that course would be “Modern C++ from Scratch”. Anyway, the content is already there, available for an enjoyable learning experience, with a mix of slides containing interesting visuals, and demo code.

Sample slide: Introducing the std::string class

And, if you are already familiar with C++11, you may enjoy my follow-up course on “Practical C++14 and C++17 Features”.

Happy learning!

 

Limited-time Discount on Pluralsight Annual Subscriptions

I’d like to give you a heads-up that for a limited time, Pluralsight will be discounting Individual Annual Subscriptions 33% (or $100), making them only $199.

33% Off Pluralsight Annual Subscriptions – Save $100 for a Limited Time!

I encourage you to take this opportunity to save $100 on your annual subscription!

Note that if you are an existing subscriber you can take advantage of this offer as well: In fact, your current subscription will be extended for a year for $199.

New Pluralsight Course: Practical C++14 and C++17 Features

A new course of mine was published in the Pluralsight library: Practical C++14 and C++17 Features.

From the course short description:

C++14 and C++17 added many new features to the C++ language. This course will teach you practical features introduced in C++14 and C++17, that you will be able to use to write clearer, simpler, and higher-quality modern C++ code.

You can take this course to learn about practical features added in C++14 and C++17, ranging from syntactic sugar like digit separators, to more substantial features like polymorphic lambdas (this course will offer an introduction to basic lambdas as well), relaxed constexpr functions, the Chrono library with its standard-defined duration suffixes, and C++17 juice ranging from nested namespaces, variable declarations in if statements, to “constexpr if” and structured bindings, just to name a few.

Building an Italian-to-English dictionary with std::map

I discussed these topics with both slides and demo code, including showing some bugs and how to fix them.

Demo: Sorting by string length using lambdas

You can watch the course trailer and read a more detailed course description and the table of contents starting from this course page.

Proper unit conversions are important!

I put the discussed features in proper context for learners who are already familiar with basic elements of C++11. For example, when I introduced C++14 std::make_unique, I also talked about smart pointers and introduced std::unique_ptr as well. If you need an introduction to basic elements of modern C++, you can take my “C++11 from Scratch” course.

Here’s some feedback from my reviewers:

You’ve done an excellent job with the animated shapes/callouts throughout the module. They really help me to follow along with the narrative explanations. [Peer Review]

The content is logically organized and chunked into bite-size clips. I also like your mix of slides and demos. [Peer Review]

This is an excellent challenge to the viewer to spot the bug in the code. [Peer Review]

Overall, a strong module that will be well-received by an intermediate audience. The explanations are clear and the concepts build on each other, making it easy to follow along. Keep up the great work! [Peer Review]

Raw owning pointers are radioactive!

Thank You

Writing and producing this course has been an interesting journey and a rewarding experience for me. There are several people who worked with me during this journey and with their contributions helped me produce this quality course. I’d like to thank my ASM (former Editor) Beth Gerard-Hess, my Production Editor Austin Crawford, my Curriculum Director Tod Gentille, my reviewers (both QA and peer), and all the Pluralsight people who worked on this course project. Thanks also to Stephan T. Lavavej for interesting e-mail conversations that provided good food for thought.

I hope you will enjoy this new course on Practical C++14 and C++17 Features: Happy learning! 😊

 

Subtle Bug When Converting Strings to Lowercase

Suppose that you want to convert a std::string object to lowercase.

The first thing you would do is probably search the std::string documentation for a convenient, simple method named to_lower, or something like that. Unfortunately, there’s nothing like that.

So, you might start developing your own “to_lower” function. A typical implementation I’ve seen of such a custom function goes something like this: For each character in the input string, convert it to lowercase invoking std::tolower. In fact, there’s even this sample code on cppreference.com:

#include <algorithm>  // std::transform
#include <cctype>     // std::tolower
#include <string>

// From http://en.cppreference.com/w/cpp/string/byte/tolower

std::string str_tolower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); }
                  );
    return s;
}

Well, if you try this code with something like str_tolower("Connie"), everything seems to work fine, and you get “connie” as expected.

Now, since C++ folks like storing UTF-8-encoded text in std::string objects, in some large code base someone happily takes the aforementioned str_tolower function, and invokes it with their lovely UTF-8 strings. Fun ensured! …Well, actually, bugs ensured.

So, the problem is that str_tolower, under the hood, calls std::tolower on each char in the input string. While this works fine for pure ASCII strings like “Connie”, such code is a bug farm for UTF-8 strings. In fact, UTF-8 is a variable-width character encoding. So, there are some Unicode “characters” (code points) that are encoded in UTF-8 using one byte, while other characters are encoded using two bytes, and so on, up to four bytes. The poor std::tolower has no clue of such UTF-8 encoding features, so it innocently spits out wrong results, char by char.

For example, I tried invoking the above function on “PERCHÉ” (the last character is the Unicode U+00C9 LATIN CAPITAL LETTER E WITH ACUTE, encoded in UTF-8 as the two-byte sequence 0xC3 0x89), and the result I got was “perchÉ” instead of the expected “perché” (é is Unicode U+00E9, LATIN SMALL LETTER E WITH ACUTE). So, the pure ASCII characters in the input string were all correctly converted to lowercase, but the final non-ASCII character wasn’t.
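The “PERCHÉ” experiment can be reproduced at the byte level with a sketch like the following. It assumes the program runs in the default "C" locale (no setlocale call), and spells the UTF-8 bytes with explicit hex escapes so the result doesn’t depend on the source file encoding. The helper name perche_demo is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Same per-byte approach as the str_tolower sample discussed above.
std::string str_tolower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return s;
}

// Hypothetical helper: returns true if the bug described above shows up.
bool perche_demo() {
    // "PERCHÉ" in UTF-8: É (U+00C9) is the two-byte sequence 0xC3 0x89.
    const std::string upper = "PERCH\xC3\x89";

    // "perché" in UTF-8: é (U+00E9) is the two-byte sequence 0xC3 0xA9.
    const std::string expected = "perch\xC3\xA9";

    const std::string actual = str_tolower(upper);

    // The ASCII letters are lowercased, but the two bytes of É are
    // processed one at a time and left unchanged.
    return actual == "perch\xC3\x89" && actual != expected;
}
```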

Actually, the fault isn’t in the std::tolower function itself: The function was misused, invoked in a way it was not designed for.

This is one of the perils of taking std::string-based C++ code that initially worked with ASCII strings, and thoughtlessly reusing it for UTF-8-encoded text.

In fact, we saw a very similar bug in a previous blog post.

So, how can you fix that problem? Well, a portable way is using the ICU library with its icu::UnicodeString class and its toLower method.

On the other hand, if you are writing Windows-specific C++ code, you can use the LcMapStringEx API. Note that this function uses the UTF-16 encoding (as almost all Windows Unicode APIs do). So, if you have UTF-8-encoded text stored in std::string objects, you first have to convert it from UTF-8 to UTF-16, then invoke the aforementioned API, and finally convert the UTF-16-encoded result back to UTF-8. For these UTF-8/UTF-16 conversions, you may find my MSDN Magazine article on “Unicode Encoding Conversions with STL Strings and Win32 APIs” interesting.

 

The CStringW with wcout Bug Under the Hood

I discussed in a previous blog post a subtle bug involving CStringW and wcout, and later I showed how to fix it.

In this blog post, I’d like to discuss in more detail what’s happening under the hood, and what triggers that bug.

Well, to understand the dynamics of that bug, you can consider the following simplified case of a function and a function template, implemented like this:

void f(const void*) {
  cout << "f(const void*)\n";
}

template <typename CharT> 
void f(const CharT*) {
  cout << "f(const CharT*)\n";
}

If s is a CStringW object, and you write f(s), which function will be invoked?

Well, you can write a small compilable program containing these two functions, the required headers, and a simple main implementation like this:

int main() {
  CStringW s = L"Connie";
  f(s);
}

Then compile it, and observe the output. You know, printf-debugging™ is so cool! 🙂

Well, you’ll see that the program outputs “f(const void*)”. This means that the first function (the non-templated one, taking a const void*), is invoked.

So, why did the C++ compiler choose that overload? Why not f(const wchar_t*), synthesized from the second function template?

Well, the answer is in the rules that C++ compilers follow when doing template argument deduction. In particular, implicit conversions are not considered when deducing template arguments. So, in this case, the implicit CStringW conversion to const wchar_t* is not considered, deduction fails for the function template, and no f(const wchar_t*) specialization is added to the set of candidates.

So, when overload resolution happens, the only candidate available is f(const void*). At this point, the implicit CStringW conversion to const wchar_t* is considered (followed by the standard pointer conversion from const wchar_t* to const void*), and the first function is invoked.

Out of curiosity, if you comment out the first function, you’ll get a compiler error. MSVC complains with a message like this:

error C2672: 'f': no matching overloaded function found

error C2784: 'void f(const CharT *)': could not deduce template argument for 'const CharT *' from 'ATL::CStringW'

The message is clear (almost…): “Could not deduce template argument for const CharT* from CStringW”: that’s because implicit conversions like this are not considered when deducing template arguments.
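If you don’t have ATL at hand, the simplified case can be reproduced with a stand-in class. In the sketch below, WideStr is a hypothetical class that only mimics the relevant part of CStringW (an implicit conversion to const wchar_t*, plus a GetString method), and the two f overloads return a string instead of printing it, so the selected overload is easy to check.

```cpp
#include <string>

// Hypothetical stand-in for CStringW: only the implicit conversion to
// const wchar_t* and GetString() are modeled here.
class WideStr {
public:
    explicit WideStr(const wchar_t* text) : m_text(text) {}

    // Implicit conversion, similar in spirit to CStringW's.
    operator const wchar_t*() const { return m_text.c_str(); }

    // Explicit accessor, similar in spirit to CStringW::GetString.
    const wchar_t* GetString() const { return m_text.c_str(); }

private:
    std::wstring m_text;
};

// Non-template overload.
std::string f(const void*) { return "f(const void*)"; }

// Function template: CharT must be deduced from the argument type,
// and implicit conversions are not considered during that deduction.
template <typename CharT>
std::string f(const CharT*) { return "f(const CharT*)"; }
```

With a WideStr object s, f(s) selects the const void* overload, while f(s.GetString()) passes a real const wchar_t*, so CharT is deduced as wchar_t and the template wins as an exact match.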

Well, what I’ve described above in a simplified case is basically what happens in the slightly more complex case of wcout.

wcout is an instance of wostream. wostream is declared in <iosfwd> as:

typedef basic_ostream<wchar_t, char_traits<wchar_t>> wostream;

Instead of the initial function f, in this case you have operator<<. In particular, here the candidates are an operator<< overload that is a member function of basic_ostream:

basic_ostream& basic_ostream::operator<<(const void *_Val)

and a template non-member function:

template<class _Elem, class _Traits> 
inline basic_ostream<_Elem, _Traits>& 
operator<<(basic_ostream<_Elem, _Traits>& _Ostr, const _Elem *_Val)

(This code is edited from the <ostream> standard header that comes with MSVC.)

When you write code like “wcout << s” (for a CStringW s), the implicit conversion from CStringW to const wchar_t* is not considered during template argument deduction. Then, overload resolution picks the basic_ostream::operator<<(const void*) member function (corresponding to the first f in the initial simplified case), so the string’s address is printed via this “const void*” overload (instead of the string itself).

On the other hand, when CStringW::GetString is explicitly invoked (as in “wcout << s.GetString()”), the compiler successfully deduces the template arguments for the non-member operator<< (deducing wchar_t for _Elem). And this operator<<(wostream&, const wchar_t*) prints the expected wchar_t string.
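The same behavior can be observed on a wide stream without ATL, using a stand-in class. This is a sketch under assumptions: WideStr is a hypothetical class that mimics only CStringW’s implicit conversion and GetString, and a std::wostringstream is used in place of wcout so that the output can be inspected as a string.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for CStringW (implicit conversion + GetString only).
class WideStr {
public:
    explicit WideStr(const wchar_t* text) : m_text(text) {}
    operator const wchar_t*() const { return m_text.c_str(); }
    const wchar_t* GetString() const { return m_text.c_str(); }

private:
    std::wstring m_text;
};

// Streams the object directly: template argument deduction fails for the
// non-member operator<<(basic_ostream&, const _Elem*), so the member
// operator<<(const void*) is picked and a pointer value is written.
std::wstring print_implicit(const WideStr& s) {
    std::wostringstream out;
    out << s;
    return out.str();
}

// Streams an explicit const wchar_t*: deduction succeeds and the
// text itself is written.
std::wstring print_explicit(const WideStr& s) {
    std::wostringstream out;
    out << s.GetString();
    return out.str();
}
```

The exact text produced by print_implicit is an implementation-defined pointer representation, so it varies from run to run and from compiler to compiler; the point is only that it is not the string’s content.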

I know… There are aspects of C++ templates that are not easy.