How Many Strings Does C++ Have?

(…OK, a language lawyer would nitpick and suggest “How many string types…”, but I wanted a catchier title.)

So, if you program in Python and you see something enclosed by either single quotes or double quotes, you have a string:

s = 'Connie'

Something similar happens in Java, with string literals like “Connie” implemented as instances of the java.lang.String class:

String s = "Connie";

All right.

Now, let’s enter – drumroll, please – The Realm of C++! And the fun begins 😊

So, let’s consider this simple line of C++ code:

auto s1 = "Connie";

What is the type of s1?

std::string? A char[7] array? (Hey, “Connie” is six characters, but don’t forget the terminating NUL!)

…Something else?

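So, you can use your favorite IDE, hover over the variable name, and get the deduced type. Visual Studio C++ IntelliSense says it’s “const char*”. Wow! The literal itself has type const char[7], but auto deduction decays the array into a pointer to its first character.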

[Screenshot: Visual Studio IntelliSense deduces const char*.]

And what about “Connie”s?

auto s2 = "Connie"s;

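No, it’s not the plural of “Connie”. And it’s not a malformed Saxon genitive either. This time s2 is of type std::string! Thank you, operator""s, introduced in C++14! (Just remember to bring it into scope first, e.g. via a using namespace std::string_literals directive.)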

[Screenshot: Visual Studio IntelliSense deduces std::string.]

But are we done? Of course not! Don’t forget: it’s C++! 😊

For example, you can have u8"Connie", which represents a UTF-8 literal (since C++20, its elements are of type char8_t instead of char). And, of course, we need a thread on Stack Overflow to figure out “How are u8-literals supposed to work?”

And don’t forget L"Connie", u"Connie" and U"Connie", which (when assigned to an auto variable) give const wchar_t*, const char16_t* (UTF-16 encoded) and const char32_t* (UTF-32 encoded), respectively.

Now we are done, right? Not yet!

In fact, you can combine the previous prefixes with the standard s suffix: for example, L"Connie"s is a std::wstring! U"Connie"s is a std::u32string. And so on.

Done, right? Not yet!! In fact, there are raw string literals to consider, too. For example: R"(C:\Path\To\Connie)", which gives a const char* to C:\Path\To\Connie (well, this saves you from escaping each \ as \\).

And don’t forget the combinations of raw string literals with the above prefixes and, optionally, the standard s suffix: LR"(C:\Path\To\Connie)", UR"(C:\Path\To\Connie)", LR"(C:\Path\To\Connie)"s, UR"(C:\Path\To\Connie)"s, and more!

Oh, and in addition to the standard std::string class, and other standard std::basic_string-based typedefs (e.g. std::wstring, std::u16string, std::u32string, etc.), there are platform/library specific string classes, like ATL/MFC’s CString, CStringA and CStringW. And Qt brings QString to the table. And wxWidgets does the same with its wxString.

Wow! And I would not be surprised if I left out some other string variation 😊

P.S. With all this string variety (maybe too much…), what about adding to the C++ Standard Library some convenient functions for at least common string operations, like trimming spaces and converting strings to upper case and lower case? After all, C++ already has rocket-science stuff like Bessel functions in its Standard Library. Meanwhile, back in the old MFC days, CString already offered convenient methods like Trim, MakeLower and MakeUpper, just to name a few.

Sample slide: Introducing the std::string class

If you want to learn modern C++ from scratch, in a fun and interesting way, with engaging slides and demo code, please check out my course!

20 Replies to “How Many Strings Does C++ Have?”

    1. @Marvin: I’m not familiar with these Python string types/variants, but I believe Python has far fewer string variants than C++ does.

    2. There is only one string type in Python 3 (Python 2 had two).

      b"something" produces a bytes object, which is not a string (though it’s a lot like the old Python 2 string…)

      u"something" is a “unicode” string, but that’s a do-nothing legacy from Python 2: under Python 3 the u is ignored; it’s allowed only so that the same code can run under both Python 2 and Python 3.

      r"something" is a “raw” string literal, which means backslash escapes are not processed; the result is still a plain old string.

      "'something'" is just a plain old string that happens to have two single-quote characters in it.

      If only C++ provided a string type that was nearly as powerful and flexible as Python’s! That is:

      Fully Unicode, but storing only as many bytes as required: e.g. if all the characters are ASCII, it uses only one byte per character.

      A fairly complete set of text manipulations.

      Hmm, that’s it, but that’s a lot!

  1. Where will you put all the localization information that is necessary to make case conversion work?

    The classic killer of MakeUpper was ß from German, whose upper-case version is SS (yes, two capital S’s; since 2017 the official German orthography also allows the capital ẞ, though SS remains the usual form).
    The classic killer of MakeLower is i and I from Turkish, where the other-case versions are İ and ı, respectively. Read the Wikipedia article. Yes, internationalization is hard.

    1. @MF: The fact that converting a character to upper case can require more space (as in your German example) is not a problem. In fact, a proper C++ string class should be able to resize itself as appropriate. I’m not familiar with the other example you gave, but if there are several options, reasonable defaults could be provided, and additional option flags could be specified to customize the conversion process.

  2. What about _bstr_t? In many years of C++ COM programming, _bstr_t was very useful for encapsulating BSTRs and converting between wchar_t and char strings.

  3. Standard C++ has many string literal types, of which only one is used in 99.9% of cases.

    There is only one standard string template, std::basic_string, which is instantiated into several standard string types.

    std::string_view (an instantiation of the std::basic_string_view template) should be the primary vehicle for string literals, and for string instances too. It assures value semantics and leads to simple and fast code.

    B.Stroustrup: “Be clever only if you really have to…”

    auto best_and_simplest = "string literal as a std string view"sv;

    Minimize the use of std::string as much as possible.

    1. There are many contexts in which std::[w]string_view cannot be used: for example, when dealing with legacy C code or C-interface APIs, including the Windows Win32 C APIs. Those APIs usually expect C-style null-terminated strings, but std::[w]string_view doesn’t guarantee null termination.

    1. It’s interesting that I cited the exact same methods in a recent private conversation about this article with a friend of mine. Anyway, they weren’t available when I wrote this article. Furthermore, I think waiting until C++23 for a simple contains() method is a bit much.
      In addition, methods to convert to upper case, to lower case, to trim spaces, to split a string (e.g. using white spaces or other separators) would all come in very handy.

    1. GCC’s previous copy-on-write (CoW) std::string implementation in libstdc++ did not conform to the C++11 standard, and since GCC 5 it has been replaced by default with a non-CoW implementation.
