Fixing the string_view-and-the-Magic-String Bug

In a previous blog post we saw an interesting and subtle bug involving std::string_view.

So, how can you fix that bug?

Well, one option is to create a std::string instance from the string_view, and then invoke the string::c_str() method to pass a properly null-terminated string to the legacy C API:

void DoSomething(std::string_view name)
{
    // BUG:
    //   SomeCApi(name.data());
    //
    // FIX:
    SomeCApi(std::string{ name.data(), name.length() }
             .c_str());
}

In fact, string::c_str() guarantees that the returned string is null-terminated.

So, you may think of implementing a simple inline helper function, to abstract away the previous ugly code:

inline const char* StringViewToCApi(std::string_view sv)
{
    return std::string{ sv.data(), sv.length() }.c_str();
}

But, if you try it out, you get the following output:

Weird characters are printed out instead of “Connie”.

So, it looks like this time the cure is worse than the disease!

You got those weird characters instead of the expected “Connie”!

What’s going on here?

Well, if you take a look at your code in the Visual Studio IDE, you’ll note that the offending line is properly squiggled; and if you hover over that line with the mouse cursor, you get an interesting and clear explanation:

“The pointer is dangling because it points at a temporary instance which was destroyed.”

The Visual Studio 2019 IDE clearly diagnosed the problem in that code.

Basically, you created a temporary std::string instance inside the helper function, and then you invoked c_str() on it. The string::c_str() method returns a pointer to the character buffer owned by that temporary string object, which gets destroyed at the end of the return statement, before the helper function hands control back to the caller. As a result, the pointer returned to the caller is dangling, as it points to memory that is no longer valid!

In fact, those box drawing characters shown in the output correspond to a sequence of 0xCC bytes, the fill pattern that the Microsoft Visual C++ compiler uses in debug builds to mark uninitialized stack memory.
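To make the timing visible, here is a minimal sketch. Tracer, g_events, BrokenHelper, UseCString, RunBroken, and RunFixed are hypothetical names invented for this illustration. Tracer stands in for std::string and simply logs when it is destroyed (its c_str() returns a string literal, so nothing actually dangles here). The sketch compares the helper-function pattern with calling c_str() directly inside the argument list, as in the first DoSomething fix above:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log, used only for this illustration.
inline std::vector<std::string> g_events;

// Hypothetical stand-in for std::string: records when it is destroyed.
struct Tracer
{
    ~Tracer() { g_events.push_back("temporary destroyed"); }

    // Returns a string literal, so the pointer itself is always valid.
    const char* c_str() const { return "Connie"; }
};

// Mirrors the broken helper: the temporary dies at the end of the
// return statement, before the caller receives the pointer.
inline const char* BrokenHelper()
{
    return Tracer{}.c_str();
}

// Stand-in for the C API: records when it runs.
inline void UseCString(const char*)
{
    g_events.push_back("C API called");
}

// Helper pattern: the destruction is logged before the C API call.
inline std::vector<std::string> RunBroken()
{
    g_events.clear();
    UseCString(BrokenHelper());
    assert(g_events.size() == 2);
    return g_events;
}

// Call-site pattern: the temporary outlives the C API call.
inline std::vector<std::string> RunFixed()
{
    g_events.clear();
    UseCString(Tracer{}.c_str());
    assert(g_events.size() == 2);
    return g_events;
}
```

RunBroken() logs the temporary's destruction first and the C API call second, while RunFixed() logs them in the opposite order, because a temporary lives until the end of the full expression that created it.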

So, how can you fix this bug?

Well, unfortunately, you just cannot safely return a const char* pointer that was handed to you by string::c_str(), if the string object that owns that memory is destroyed before the function returns.

However, what you can do is to simplify the above code, using a proper std::string constructor, that simply takes a string_view as input, and creates a std::string instance from the input string_view:

void DoSomething(std::string_view name)
{
    // BUG: SomeCApi(name.data());
    //
    // FIX:
    //
    SomeCApi(std::string{ name }.c_str());
}

And, finally, you get the expected output!

The expected output is printed out using the fixed code.
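Why does this version work, while the helper did not? The std::string temporary is now created in the caller's own expression, so it stays alive until the call to SomeCApi returns. If you still prefer a reusable helper, one possible sketch is to return the std::string by value and let the call site invoke c_str(). Note that ToOwnedString, CountCharsLikeCApi, and Demo are hypothetical names, not part of the original code:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical helper: returns an owning, null-terminated copy
// instead of a raw pointer, so nothing dangles.
inline std::string ToOwnedString(std::string_view sv)
{
    return std::string{ sv };
}

// Hypothetical stand-in for a legacy C API: reports how many
// characters it sees before the null terminator.
inline std::size_t CountCharsLikeCApi(const char* name)
{
    return std::strlen(name);
}

inline std::size_t Demo()
{
    std::string msg = "Connie is learning C++";
    std::string_view v{ msg.data(), msg.find(' ') }; // "Connie"

    // The returned std::string temporary lives until the end of this
    // full expression, so the pointer is valid for the whole call.
    const std::size_t seen = CountCharsLikeCApi(ToOwnedString(v).c_str());

    // The C-style function sees exactly the viewed characters.
    assert(seen == v.size());
    return seen;
}
```

The trade-off is an extra string copy on each call, which is usually acceptable when crossing into a C API that requires null termination.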

Repro Code:

// FIX: The Case of string_view and the Magic String -- by Giovanni Dicanio

#include <stdio.h>

#include <iostream>
#include <string>
#include <string_view>

void SomeCApi(const char* name)
{
    printf("Hello, %s!\n", name);
}

void DoSomething(std::string_view name)
{
    // BUG: SomeCApi(name.data());
    //
    // FIX:
    //
    SomeCApi(std::string{ name }.c_str());
}

int main()
{
    std::string msg = "Connie is learning C++";
    auto untilFirstSpace = msg.find(' ');

    std::string_view v{ msg.data(), untilFirstSpace };

    std::cout << "String view: " << v << '\n';

    DoSomething(v);
}

 

Level Up with Pluralsight – Learn FREE for All April

Just a heads up to let you know that Pluralsight is making all video courses FREE for the month of April!

Are you interested in getting started with the C programming language?

What about getting to know some practical features of C++14 and C++17?

Do you need a practical introduction on how to use the C++ Standard Library’s containers (like std::vector, std::list, std::map, std::unordered_map, etc.)?

Would you like an introduction to data structures and algorithms in C++?

Or would you like to learn modern C++ from scratch?

All these courses of mine and 7,000+ other courses are available for FREE for all April!

Level up your tech skills for free all month long. Click the banner below to access Pluralsight #FreeApril!

The Case of string_view and the Magic String


Someone was working on modernizing some legacy C++ code. The code base contained a function like this:

void DoSomething(const char* name)

The DoSomething function takes a read-only string expressed using a C-style string pointer: remember, this is a legacy C++ code base.

For the sake of this discussion, suppose that DoSomething contains some simple C++ code that invokes a C-interface API, like this (of course, the code would be more complex in a real code base):

void DoSomething(const char* name) 
{
    SomeCApi(name);
}

SomeCApi also expects a “const char*” that represents a read-only string parameter.

However, note that the SomeCApi cannot be modified (think of it like a system C-interface API, for example: a Windows C API like MessageBox).

For the sake of this discussion, suppose that SomeCApi just prints out its string parameter, like this:

void SomeCApi(const char* name) 
{
    printf("Hello, %s!\n", name);
}

In the spirit of modernizing the legacy C++ code base, the maintainer decides to change the prototype of DoSomething, stepping up from “const char*” to std::string_view:

// Was: void DoSomething(const char* name)
void DoSomething(std::string_view name)

The SomeCApi still expects a const char*. Remember that you cannot change the SomeCApi interface.

So, the maintainer needs to update the body of DoSomething accordingly, invoking string_view::data to access the underlying character array:

void DoSomething(std::string_view name) 
{
    // Was: SomeCApi(name);
    SomeCApi(name.data());
}

In fact, std::string_view::data returns a pointer to the underlying character array.

The code compiles fine. And the maintainer is very happy about this string_view modernization!

 

Then, the code is executed for testing, with a string_view name containing “Connie”. The expected output would be:

“Hello, Connie!”

But, instead, the following string is printed out:

“Hello, Connie is learning C++!”

Wow! Where does the “ is learning C++” part come from??

Is there some magic string hidden inside string_view?

As a sanity check, the maintainer simply prints out the string_view variable:

// name is a std::string_view
std::cout << "Name: " << name;
 

And the output is as expected: “Name: Connie”.

So, it seems that cout does print the correct string_view name.

But, somehow, when the string_view is passed deep down to a legacy C API, some string “magic” happens, showing some additional characters after “Connie”.

What’s going on here??

Figuring Out the Bug

Well, the key here is two words: Null Terminator.

In fact, the C API that takes a const char* expects the string to be null-terminated.

On the other hand, std::string_view does not guarantee null-termination!

So, consider a string_view that “views” only a portion of a string, like this:

std::string str = "Connie is learning C++";
auto untilFirstSpace = str.find(' ');
std::string_view name{str.data(), untilFirstSpace}; // "Connie"

The string_view certainly “views” the “Connie” part. But, if you consider the memory layout, after these “Connie” characters in memory there is no null terminator, which was expected by the C API. So, the C API views the whole initial string, until it finds the null terminator.

So, the whole string is printed out by the C API, not just the part observed by the string_view.

Memory layout: string_view vs. C-style null-terminated strings

This is a very subtle bug, that can be hard to spot in more complex code bases.

So, remember: std::string_views are not guaranteed to be null terminated! Take that into consideration when calling C-interface APIs, that expect C-style null-terminated strings.
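One way to observe the mismatch directly is to compare what the string_view reports with what a C-style function sees through the same pointer. This is only a sketch: Lengths and MeasureBothWays are hypothetical names, and calling strlen on name.data() is safe here only because the view points into a live, null-terminated std::string.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical result type for this illustration.
struct Lengths
{
    std::size_t viewSize;      // what the string_view reports
    std::size_t cStringLength; // what a C-style function sees
};

inline Lengths MeasureBothWays()
{
    std::string str = "Connie is learning C++";
    std::string_view name{ str.data(), str.find(' ') }; // "Connie"

    // strlen, like any C API, keeps reading until it reaches the '\0'
    // that terminates the whole std::string buffer, so it runs past
    // the end of the view.
    const std::size_t seenByC = std::strlen(name.data());
    assert(name.size() < seenByC);

    return { name.size(), seenByC };
}
```

Here the view holds 6 characters, while strlen counts all 22 characters of the original string.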

P.S. As a side note, std::string::c_str guarantees that the returned pointer points to a null-terminated character array.

> Follow up here.

Repro Code

// The Case of string_view and the Magic String 
// -- by Giovanni Dicanio

#include <stdio.h>

#include <iostream>
#include <string>
#include <string_view>

void SomeCApi(const char* name)
{
    printf("Hello, %s!\n", name);
}

void DoSomething(std::string_view name)
{
    SomeCApi(name.data());
}

int main()
{
    std::string msg = "Connie is learning C++";
    auto untilFirstSpace = msg.find(' ');

    std::string_view v{ msg.data(), untilFirstSpace };

    std::cout << "String view: " << v << '\n';

    DoSomething(v);
}

Pluralsight Black Friday 40% Off

Just wanted to give you all a heads up about the current Pluralsight Black Friday 40% Off promotion on all Personal Annual and Premium subscriptions.

Click on the banner below to save now.

New Pluralsight Course on C++ Standard Library Associative Containers

Happy Pi Day!

My new Pluralsight course on C++ Standard Library Associative Containers is live!

In this course, you’ll learn with a combination of slides and demo code, how to use associative containers like std::set, std::map, and std::unordered_map.

This is a follow-up to my previous course, which I encourage you to watch before this one.

Starting from this course page, you can freely play the course overview, and read a more detailed course description and the table of contents.

Comparing the performance of std::unordered_map vs. std::map.
Analyzing a subtle bug when working with std::map.

These are some feedback notes from my reviewers:

Nice use of the PowerPoint slides and Camtasia callouts to keep the learners focused and engaged. [Peer Review]

Enjoyable clip of std::map string to string dictionary translation. [Peer Review]

Overall, a strong module that’s well-explained, approachable and professionally polished. Looking forward to more! [Peer Review]

The content is logically sequenced, building on concepts as we go. The clips are nice and short, which makes it easy to move through and absorb the content. You also do a great job transitioning across the clips; there’s a cohesive flow to the module. [Peer Review]

I’m glad you showed this error, explained why it’s happening, and how to fix it. It’s a good opportunity to reiterate learnings from earlier in the module, and also seems like a common gotcha. [Peer Review]

Happy learning!

 

New Pluralsight Course on C++ Standard Library Containers

My new Pluralsight course on C++ Standard Library Containers is live!

In this course, you’ll learn how to use some important containers implemented in the C++ Standard Library, with a combination of theoretical introduction using slides, and practical C++ implementation code, including analyzing and fixing some common bugs.

Comparing the memory layout of std::list vs. std::vector.

C++ Standard Library implementations offer high-quality well-tested and highly-optimized standard containers, that are very powerful tools when developing software written in C++.

In particular, I’ll discuss std::vector (which is a Standard Library workhorse), std::array, and std::list, including how to use them, discussing their pros and cons, and giving some guidance on picking one or the other, based on the problem at hand. Other containers (e.g. std::map) will be the topic of follow-up courses.

No prior knowledge of C++ Standard Library containers is required. You only need a basic knowledge of C++ language features.

Working on the implementation of a case-insensitive string search.

Containers and algorithms are kind of like “bread and butter”, so in this course you’ll also learn about the C++ Standard Library design based on the teamwork between containers, iterators and algorithms, and you’ll see how to perform important operations on containers leveraging some useful algorithms already implemented in the C++ Standard Library.

Explaining the erase-remove idiom.

Note that this course is both theory and practice! In fact, I’ll show practical demo code, and I’ll also discuss some bugs that are especially common for those who are just starting to learn the C++ Standard Library’s containers.

Analyzing a subtle bug when working with std::list.

These are some feedback notes from my reviewers:

The narration is clear, animated, and engaging. The visuals are particularly helpful. [Peer Review]

You do a particularly good job clearly stating the problem here (and elsewhere) so that the solution, when it comes, makes sense and fits nicely. [Peer Review]

Great simple example of undefined behavior to reinforce the concepts you’ve introduced as well as a bonus of uncovering a security issue. [Peer Review]

Very nice module with good examples. Also excellent visuals when describing list, vectors and the various operations. [Peer Review]

Very nice discussion of the trade-offs between a linked list and a vector [Peer Review]

Nice use of a bug to teach a key concept [Peer Review]

 

Starting from this course page, you can freely play the course overview, and read a more detailed course description and the table of contents.

Let me also express my gratitude to all the people at Pluralsight involved in the production of this course: It’s always a pleasure to work with you all!

I hope you’ll enjoy watching this course!