Printing Non-ASCII Characters to the Console on Windows

…A subtitle could read as: “From Greek letters to Os”.

I came across some Windows C++ source code that printed the Greek small letter pi (π, U+03C0) and capital letter sigma (Σ, U+03A3) to the console, as part of some mathematical formulae. A minimal reproducible code snippet based on the original code looks like this:

#include <stdio.h>

// Codes from code page 437: 227 is small pi, 228 is capital sigma
#define CHAR_PI 227
#define CHAR_SIGMA 228

int main()
{
    printf("Greek pi: %c\n", CHAR_PI);
    printf("Greek Sigma: %c\n", CHAR_SIGMA);
}

When executed on the original author’s PC, the above code prints the expected Greek letters correctly. However, when I ran the exact same code on my PC, I got this wrong output:

Greek letters wrongly printed to the console

What’s going wrong here?

The problem is that the author of that code probably looked up the codes for those Greek letters in some code page, probably code page 437.

However, on my Windows PC the default console code page seems to be a different one: code page 850. In code page 850, the codes 227 and 228 map to Ò and õ respectively, not to the π and Σ that code page 437 assigns to them.
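To make the mismatch concrete, here is a minimal sketch of the two mappings. DecodeOemByte and PrintCodePageComparison are hypothetical helpers written for this illustration (they are not part of the original code), and the table covers only the two bytes discussed above.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper, not part of the original code: returns the Unicode
// code point that a console byte maps to under an OEM code page.
// Only the two bytes discussed above are covered; 0 means "not in this table".
char32_t DecodeOemByte(unsigned codePage, unsigned char byte)
{
    if (codePage == 437) {
        if (byte == 227) return 0x03C0; // Greek small letter pi
        if (byte == 228) return 0x03A3; // Greek capital letter sigma
    } else if (codePage == 850) {
        if (byte == 227) return 0x00D2; // Latin capital letter O with grave
        if (byte == 228) return 0x00F5; // Latin small letter o with tilde
    }
    return 0;
}

// Prints what each byte means under both code pages.
void PrintCodePageComparison()
{
    const unsigned char bytes[] = { 227, 228 };
    for (unsigned char b : bytes) {
        // Same byte, different character: that is the whole problem.
        assert(DecodeOemByte(437, b) != DecodeOemByte(850, b));
        std::printf("byte %u: CP437 -> U+%04X, CP850 -> U+%04X\n",
                    static_cast<unsigned>(b),
                    static_cast<unsigned>(DecodeOemByte(437, b)),
                    static_cast<unsigned>(DecodeOemByte(850, b)));
    }
}
```

The same byte value yields a different character depending on which code page the console happens to use, which is why hard-coded byte values are fragile.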

How to fix that?

I would suggest using Unicode instead of code pages. On Windows, the “native” Unicode encoding is UTF-16. In UTF-16, π (U+03C0) is encoded as the single 16-bit code unit 0x03C0, and Σ (U+03A3) as 0x03A3.

This code snippet shows UTF-16 console output in action:

#include <fcntl.h>
#include <io.h>
#include <stdio.h>

int wmain( /* int argc, wchar_t* argv[] */ )
{
    // Enable Unicode UTF-16 output to console
    _setmode(_fileno(stdout), _O_U16TEXT);

    // Use UTF-16 encoding for Greek letters
    static const wchar_t kPi = 0x03C0;
    static const wchar_t kSigma = 0x03A3;
    
    // %lc explicitly requests a wide character
    wprintf(L"Greek pi: %lc\n", kPi);
    wprintf(L"Greek Sigma: %lc\n", kSigma);
}
Greek letters correctly printed to the console

Note the use of _setmode and _O_U16TEXT to enable UTF-16 output to the console, and the associated <fcntl.h> and <io.h> additional headers. (More details on those can be found in this blog post.)

P.S. Bonus reading:

Unicode Encoding Conversions with STL Strings and Win32 APIs

 

Member Functions with Generic Parameters

Someone wanted to write a C++ class member function to operate on a generic container.

His thought process was: “Since in C++14 we can use auto with lambda parameters, why not use it for class method parameters as well?”

class MyClass {
  ...

  // DOESN’T work in C++14!!
  void DoSomething(auto& container) {
    ...

Unfortunately, this code doesn’t compile. In fact, in C++14 “auto” is not a valid parameter type for member functions.

Visual Studio 2015 emits the following error:

error C3533: a parameter cannot have a type that contains 'auto'

To fix that code, we can use templates. The member function with an “auto” parameter can be written as a member function template:

class MyClass {
  ...

  //
  // Simulate:
  //
  //   void DoSomething(auto& container)
  //
  // using templates.
  //
  template <typename Container>
  void DoSomething(Container& container) {
    ...
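As a complete, compilable sketch of the template approach: the body of DoSomething below is only a placeholder (it doubles every element in place), and DemoDoSomething is a hypothetical helper added for illustration.

```cpp
#include <cassert>
#include <list>
#include <vector>

class MyClass {
public:
    // Member function template standing in for
    //   void DoSomething(auto& container)
    // The body is a placeholder: it doubles every element in place.
    template <typename Container>
    void DoSomething(Container& container) {
        for (auto& item : container) {
            item *= 2;
        }
    }
};

// The container type is deduced separately at each call site.
void DemoDoSomething() {
    MyClass obj;

    std::vector<int> numbers{1, 2, 3};
    obj.DoSomething(numbers);   // Container = std::vector<int>
    assert(numbers[2] == 6);

    std::list<double> values{0.5, 1.5};
    obj.DoSomething(values);    // Container = std::list<double>
    assert(values.front() == 1.0);
}
```

The caller never spells out the template argument, so the call sites look the same as they would with an “auto” parameter.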

Beginner’s Bug: What’s Wrong with Brace-Init and std::vector?

Question: I have some old C++ code that creates a std::vector containing two NUL wchar_ts:

std::vector<wchar_t> v(2, L'\0');

This code works just fine. I tried to modernize it, using C++11’s brace-initialization syntax:

// Braces {} used instead of parentheses () 
// to initialize v
std::vector<wchar_t> v{2, L'\0'};

However, now I have a bug in my new C++ code! In fact, the brace-initialized vector now contains a wchar_t having code 2, and a NUL. But I wanted two NULs! What’s wrong with my code?

Answer: The problem is that the C++11 brace-initialization syntax (more precisely: direct-list-initialization) prefers the constructor taking a std::initializer_list, which in this vector case takes precedence over the initialize-with-N-copies-of-X constructor overload that was used in the original code. So, the vector gets initialized with the content of the braced initialization list {2, L'\0'}, i.e. a wchar_t having code 2, and a NUL.

I’m sorry: to get the expected original behavior you have to stick with the classical C++98 parentheses initialization syntax.
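The difference between the two syntaxes can be checked with a small sketch like the following. The function names are hypothetical, chosen just for this illustration.

```cpp
#include <cassert>
#include <vector>

// Parentheses: picks the "N copies of X" constructor (two NULs).
std::vector<wchar_t> MakeWithParens() {
    std::vector<wchar_t> v(2, L'\0');
    return v;
}

// Braces: picks the initializer-list constructor (elements: 2 and NUL).
std::vector<wchar_t> MakeWithBraces() {
    std::vector<wchar_t> v{2, L'\0'};
    return v;
}

void DemoVectorInit() {
    const std::vector<wchar_t> parens = MakeWithParens();
    const std::vector<wchar_t> braces = MakeWithBraces();

    // Both vectors happen to have two elements...
    assert(parens.size() == 2 && braces.size() == 2);

    // ...but only the parenthesized one starts with a NUL.
    assert(parens[0] == L'\0');
    assert(braces[0] == static_cast<wchar_t>(2));
}
```

Note that both vectors end up with size 2 here, which makes the bug easy to miss if you only check the length.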

 

A Beginner’s Bug with C++11 Range-For Loops

This is an interesting bug that may affect C++11 novices. I’m proposing it in a Q&A form below.

Q. I have this old C++ code that uses a for loop to convert an ASCII string to upper-case:

std::string s = "Connie";

// Convert string to upper-case
for (size_t i = 0; i < s.length(); i++) {
  s[i] = std::toupper(s[i]);
}

std::cout << s << '\n';

This code works just fine and prints “CONNIE” as expected.

I tried to modernize this code, using C++11’s range-for loops. I rewrote the previous loop like this:

for (auto ch : s) {
  ch = std::toupper(ch);
}

Unfortunately, it doesn’t work: the output is still the original “Connie” string, not the expected upper-case version.

What’s wrong with that code?

A. The problem is in this line:

for (auto ch : s) {

Using this “auto ch” syntax, at each iteration step, the ch variable contains a copy of the original char in the string.

So, in the body of the range-for loop:

  ch = std::toupper(ch);

at every iteration step you are just converting to upper-case a temporary copy of the original string characters, not the original characters themselves!

What you really want is to iterate through the original string characters in place, instead of operating on temporary copies.

The “auto&” syntax (note the “&”) is what will make it work:

// Iterate through the string characters *in place*.
// NOTE the "&"!!
for (auto& ch : s) {
  ch = std::toupper(ch);
}
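For experimentation, the two loop forms can be wrapped in a pair of functions and compared side by side. This is only a sketch: the function names are hypothetical, and the cast to unsigned char is an extra precaution I added (std::toupper expects a value representable as unsigned char), not something the original snippet does.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// "auto ch": each ch is a copy, so the string is returned unchanged.
std::string UpperWithCopy(std::string s) {
    for (auto ch : s) {
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    }
    return s;
}

// "auto& ch": each ch refers to a character inside the string.
std::string UpperInPlace(std::string s) {
    for (auto& ch : s) {
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    }
    return s;
}

void DemoRangeFor() {
    assert(UpperWithCopy("Connie") == "Connie");  // bug: nothing changed
    assert(UpperInPlace("Connie") == "CONNIE");   // fixed
}
```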

I wrote some simple practical advice for range-for loops in this blog post. For more details on using range-for loops, you may want to read this Q&A on StackOverflow.

 

The Regression of Flat UIs

Beauty is in the eye of the beholder, and I wholeheartedly prefer “classic” nice 3D colorful rich user interfaces to those “modern” flat bland UIs.

In other words, those “modern” flat UIs are a regression to me.

Just consider how nice Visual Studio 2010’s UI is compared to Visual Studio 2015’s:

Visual Studio 2010 and Windows 7 UI Style
Visual Studio 2015 Flat UI Style

Also, the Windows 7 icons (from the previous VS2010 screenshot) look much better to me than the dumbed-down, bland icons of Windows 10:

Windows 10 Flat Icons

Don’t get me wrong: there are important improvements under the hood in Windows 10, and Visual Studio 2015’s C++ compiler and standard library are better than those that ship with Visual Studio 2010. But this more recent UI look seems like a regression to me.

To make everyone happy, why not just implement a UI style theme selector that provides both the “rich” and the “flat” styles, so that users can choose their favorite?

 

Mixing Traits Classes, Range-For Iteration and decltype: A Subtle Bug

Suppose you have defined a traits class to query a property for some type. This is a common technique used in C++: for example, to get the largest value of the type int, you can call std::numeric_limits<int>::max. std::numeric_limits is a class template that is specialized for arithmetic types like int, double, etc.

In the int specialization (numeric_limits<int>), its max method returns the largest value of the type int; in the double specialization (numeric_limits<double>), its max method returns the largest value of the type double, and so on. In generic code, when you have some arithmetic type T, you can query T’s largest value calling numeric_limits<T>::max.

So, you defined some traits class, which is a class template, that exposes some public static methods to query some properties for some types. For example:

 

template <typename Type>
struct Traits
{
  static int GetSomeProperty(const Type&)
  {
    cout << "Generic"; 
    return 0;
  }
};

(Of course, in production code, traits classes can contain more than a single method, just like the numeric_limits traits class exposes several member functions.)

Then you specialize the above traits class for the std::string type:

template <>
struct Traits<string>
{
  static int GetSomeProperty(const string&)
  { 
    cout << "String"; 
    return 1; 
  }
};

This property might be the length in bytes of the input parameter, or it can be some other data associated to that parameter; the description of the particular property is unimportant here.

Then you have a function template that iterates through some generic container using a range-for loop and inside that loop, you invoke the Traits::GetSomeProperty method to query the given property of the current element:

template <typename Container>
void DoSomething(const Container & c)
{
  for (const auto& elem : c)
  {
    cout << elem << ": ";

    int x = Traits<decltype(elem)>::
      GetSomeProperty(elem);

    cout << '\n';
  }
}

Since this function operates on a generic container, you don’t know a priori the type of the elements in the container. So you think of using decltype(elem) to get the type of the current element.

Now, if you invoke that function template on a vector<string>, like this:

int main()
{
  vector<string> test{ "Hello", "Connie" };
  DoSomething(test);
}

What you would expect as output is probably:

Hello: String
Connie: String

Right? That’s because the DoSomething function template is iterating through a vector of strings, so decltype(elem) seems to be std::string, and so Traits<decltype(elem)> would pick Traits<string>, and consequently the Traits<string>::GetSomeProperty specialization would be invoked, which should print “String”.

Well, if you run this code, what you actually get is:

Hello: Generic
Connie: Generic

Why is that? Why is “Generic” printed instead of the expected “String”?

If you pay attention to the range-for loop iteration:

    for (const auto& elem : c)

you’ll notice that elem’s type is actually not std::string. In fact, elem is a const reference to a std::string. That’s because of the “const auto&” iteration format, which in this particular case of vector<string> becomes “const std::string&”.

So, decltype(elem) is not std::string: it’s actually a “const std::string&”, that is: a const reference to a std::string. Since there is no Traits<T> specialization for a “const reference to a std::string” (in fact, you only have a specialization for a std::string), the generic Traits<T> class template code is picked by the compiler, so the generic Traits<T>::GetSomeProperty method is invoked, which prints “Generic”.

How can you fix this code? How to remove the “const &”, leaving only the “std::string” part?

Well, an option is to use std::decay:

decay<decltype(elem)>::type

This is std::string, without the const reference part.

A more verbose alternative to strip that off is remove_reference and remove_const (thanks Mr.STL for the tip on that one).

So, inside the range-for loop, this line will do The Right Thing (TM):

// "typename" is needed because the type depends on
// the Container template parameter
int x = 
  Traits<typename std::decay<decltype(elem)>::type>::
  GetSomeProperty(elem);

If the above line seems too hard to parse, it’s possible to break it into pieces for better readability, with the help of a convenient intermediate typedef:

typedef typename std::decay<decltype(elem)>::type
  ElemType;

int x = 
  Traits<ElemType>::GetSomeProperty(elem);
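Putting the pieces together, here is a compact sketch that contrasts the two lookups. To keep it easy to check, I replaced the cout output with return values (0 for the generic traits, 1 for the string specialization) and summed them over the container. The function names SumWithRawDecltype and SumWithDecay are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

template <typename Type>
struct Traits {
    static int GetSomeProperty(const Type&) { return 0; }  // generic
};

template <>
struct Traits<std::string> {
    static int GetSomeProperty(const std::string&) { return 1; }  // string
};

// decltype(elem) is "const std::string&": the generic Traits is picked.
template <typename Container>
int SumWithRawDecltype(const Container& c) {
    int sum = 0;
    for (const auto& elem : c) {
        sum += Traits<decltype(elem)>::GetSomeProperty(elem);
    }
    return sum;
}

// std::decay strips the const reference: Traits<std::string> is picked.
template <typename Container>
int SumWithDecay(const Container& c) {
    int sum = 0;
    for (const auto& elem : c) {
        typedef typename std::decay<decltype(elem)>::type ElemType;
        sum += Traits<ElemType>::GetSomeProperty(elem);
    }
    return sum;
}

void DemoTraits() {
    const std::vector<std::string> test{ "Hello", "Connie" };
    assert(SumWithRawDecltype(test) == 0);  // "Generic" twice
    assert(SumWithDecay(test) == 2);        // "String" twice
}
```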

I hope this blog post will spare you some headache and debugging time if you have a range-for loop iterating with “const auto&”, and you are stuck getting the wrong (generic) traits invoked instead of the expected specialization.

Some compilable code is available here on GitHub for download and experimentation.

Enabling the StrSafe Locale Functions

The Windows SDK defines some string functions that provide special processing for buffer handling, with the goal of reducing security issues that involve buffer overruns. These functions are defined in the <StrSafe.h> header. If you are unfamiliar with them, a quick introduction can be found here.

Some of these functions include a parameter for locale information. These locale-aware StrSafe functions have an _l suffix, for example: StringCbPrintf_l.

However, if you try to use the aforementioned function in your Windows C++ code after including <StrSafe.h>, the compiler will complain with an error message like:

error C3861: 'StringCbPrintf_l': identifier not found

After some spelunking in the gigantic <StrSafe.h> header with the help of some search tool, you will discover that these locale-aware functions are excluded by default, and you have to explicitly enable them, #defining the preprocessor macro STRSAFE_LOCALE_FUNCTIONS.

StrSafe Locale-Aware Functions Disabled by Default

This doesn’t seem to be mentioned in the MSDN documentation for these functions (at least, I was unable to find a note about that).

I would have preferred the opposite policy: enable these functions by default and, if for some reason they conflict with an existing code base, let that code disable them by defining a macro like STRSAFE_NO_LOCALE_FUNCTIONS (just like the existing approach of explicitly disabling functions with the STRSAFE_NO_CB_FUNCTIONS and STRSAFE_NO_CCH_FUNCTIONS macros).
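With the current header, enabling the locale-aware functions might look like the fragment below. Since the header tests for the macro while it is being processed, the #define has to come before <StrSafe.h> is included (directly or indirectly). Defining the macro project-wide in the compiler settings should work as well, though I am only assuming that here.

```cpp
// Must come before <StrSafe.h> is included, directly or indirectly
#define STRSAFE_LOCALE_FUNCTIONS
#include <StrSafe.h>
```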

 

C++ Tricky Overloads

Function and method overloading is certainly a powerful C++ feature. Overloading can help simplify code in several contexts.

However, there are some “gotchas” and some tricky aspects of function/method overloading that must be considered, to prevent slipping on a banana peel.

For example, let’s consider a class that might represent a container of some data. This class has a couple of overloaded methods to add a new item to the container:

 

#include <iostream>  // for std::cout
#include <string>    // for std::string
using namespace std; // sample demo/test code

class Container
{
public:

  // ...
  
  void AddItem(int index, bool value)
  {
    cout << "AddItem(int, bool)\n";
  }

  void AddItem(int index, const string& value)
  {
    cout << "AddItem(int, const string&)\n";
  }
};

In particular, the AddItem() method is overloaded with a couple of variants: one taking a bool value as the last parameter, and another one taking a string value.

The following code simply adds a new string to an instance of the container class:

 

int main()
{
  Container c;
    
  string str{"Connie"};
  c.AddItem(1, str);

This works fine: the AddItem(int, const string&) overload is picked by the compiler, and – assuming the implementation works correctly – “Connie” goes into the container as a new string item.

We might think of adding another string with some innocent-looking code like this (I mean, we already have a string literal, so why bother creating a std::string instance in the first place?):

  c.AddItem(2, "Sandy");
} // main

And here we have a problem!

In fact, if we compile and run a simple test program containing the above code fragments, we get the following output:

AddItem(int, const string&)
AddItem(int, bool)

So, as we can see, the second AddItem call (i.e. c.AddItem(2, "Sandy")) is actually translated by the compiler into a call to the bool overload, not the string overload (which almost everyone would probably have expected, at least at first glance). Is this a compiler bug? Should we file a bug report?

Well, unfortunately: No! The C++ compiler is doing its job just fine: this behavior is just by design!

So, why does apparently correct-looking code like c.AddItem(2, "Sandy") make the compiler choose the bool overload instead of the string overload?

To understand that, we have to talk a little bit about conversions, in particular, user-defined conversions and built-in conversions.

A std::string instance can be constructed from the string literal "Sandy" using the std::string constructor overload that takes a “const char*”. This is a user-defined conversion, implemented in a converting constructor.

However, a “const char*” (the pointer type that the string literal, an array of const char, decays to) can also be converted to a bool: this is an example of the so-called pointer-to-bool conversion. And this one is a built-in standard conversion, not a user-defined conversion.

Crossroad

So, the C++ compiler has to choose between two possible conversions: a user-defined conversion vs. a built-in conversion. According to the C++ rules, given this choice, the built-in conversion is preferred (the rationale is that C++ considers a standard built-in conversion to be cheaper than calling a converting constructor, which implements a user-defined conversion).

Now, in all honesty, I would find it far more intuitive for code like c.AddItem(2, "Sandy") to pick the string overload. And this was just a small repro: imagine what kind of bugs and confusion can arise in more complex code bases! To avoid these kinds of problems, I’d give up the overloading feature in cases like this, and would simply rename methods to be more type-specific. For example, instead of an overloaded bug-prone AddItem(), I would prefer having several methods like: AddString(), AddBool(), etc.

This may appear more verbose and less elegant at first glance, but it can save a lot of debugging time and headaches later.
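Here is a small sketch of both designs side by side. To make the chosen overload observable without console output, each method returns a label naming itself. The class names OverloadedContainer and RenamedContainer are hypothetical, and the methods don’t store anything.

```cpp
#include <cassert>
#include <string>

// Overloaded design: a string literal silently selects the bool overload.
class OverloadedContainer {
public:
    std::string AddItem(int /*index*/, bool /*value*/) {
        return "AddItem(int, bool)";
    }
    std::string AddItem(int /*index*/, const std::string& /*value*/) {
        return "AddItem(int, const string&)";
    }
};

// Type-specific names: there is no overload set to resolve.
class RenamedContainer {
public:
    std::string AddBool(int /*index*/, bool /*value*/) {
        return "AddBool";
    }
    std::string AddString(int /*index*/, const std::string& /*value*/) {
        return "AddString";
    }
};

void DemoOverloads() {
    OverloadedContainer c;
    assert(c.AddItem(1, std::string("Connie")) == "AddItem(int, const string&)");
    assert(c.AddItem(2, "Sandy") == "AddItem(int, bool)");  // the surprise

    RenamedContainer r;
    assert(r.AddString(2, "Sandy") == "AddString");  // literal converts to string
}
```

With the renamed methods, the string literal can only go through the std::string converting constructor, so the call does what it appears to do.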

Simple or complex

The Sticky Preprocessor-Based TCHAR Model – Part 2: Where’s My Function?!?

In the previous blog post, I briefly introduced the TCHAR model. I did that not because I think that’s a quality model that should be used in modern Windows C++ applications: on the contrary, I dislike it and consider it useless nowadays. The reason why I introduced the TCHAR model is to help you understand what can be a very nasty bug in your C++ Windows projects.

So, suppose that you are building a cross-platform abstraction layer for some C++ application: in particular, you have a function that returns a string containing the path of the directory designated for temporary files, something like this:

// FILE: Utils.h

#pragma once

#include <string>

namespace Utils
{
    std::string GetTempPath();

    // ... Other functions
}

For the Windows implementation, this function is defined in terms of the GetTempPath Win32 API. In order to use that API, inside the corresponding Utils.cpp source, <Windows.h> is included:

// FILE: Utils.cpp

#include <Windows.h>
#include "Utils.h"

// Utils::GetTempPath() implementation ...

Now, suppose that you have another .cpp file, with cross-platform C++ code, that uses Utils::GetTempPath(). Note that, since this is cross-platform C++ code, <Windows.h> is not included in there. Think for example even of something as simple as:

// FILE: Main.cpp

#include <iostream>
#include "Utils.h"  // for Utils::GetTempPath()

int main()
{
    std::cout << Utils::GetTempPath() << '\n';
}

Well, this code won’t build. You’ll get a linker error, something like this:

1>Main.obj : error LNK2019: unresolved external symbol 
"class std::basic_string<char,struct std::char_traits<char>,
class std::allocator<char> > __cdecl Utils::GetTempPath(void)" 
(?GetTempPath@Utils@@YA?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@XZ) 
referenced in function _main

After removing a little bit of “noise” (including some C++ name mangling), basically the error is:

1>Main.obj : error LNK2019: unresolved external symbol “std::string Utils::GetTempPath()” referenced in function main

So, the linker is complaining about the Utils::GetTempPath() function.

Then you may start going crazy, double- and triple-checking the correct spelling of “GetTempPath” inside your Utils.h header, inside Utils.cpp, inside Main.cpp, etc. But there are no typos: GetTempPath is actually spelled correctly in every place.

Then, you try to rebuild the solution inside Visual Studio one more time, but the mysterious linker error shows up again.

What’s going on? Is this a linker bug? Time to file a bug on Connect?

Nope.

It’s just the nasty preprocessor-based TCHAR model that sneaked into our code!

Let’s try to analyze what happened in some detail.

In this case, there are a couple of translation units to focus our attention on: one is from the Utils.cpp source file, containing the definition (implementation) of Utils::GetTempPath. The other is from the Main.cpp source file, calling the Utils::GetTempPath function (which is expected to be implemented in the former translation unit).

In the Utils.cpp’s translation unit, the <Windows.h> header is included. This header brings with it the preprocessor-based TCHAR model, discussed in the previous blog post. So, a preprocessor macro named “GetTempPath” is defined, and it is expanded to “GetTempPathW” in Unicode builds.

Think of it as an automatic search-and-replace process: before the actual compilation of C++ code begins, the preprocessor examines the source code, and automatically replaces all instances of “GetTempPath” with “GetTempPathW”. The Utils::GetTempPath function name is found and replaced as well, just like the other occurrences of “GetTempPath”. So, to the C++ compiler and linker, the actual function name for this translation unit is Utils::GetTempPathW (not Utils::GetTempPath, as written in source code!).

Now, what’s happening at the Main.cpp’s translation unit? Since here <Windows.h> was not included (directly or indirectly), the TCHAR preprocessor model didn’t kick in. So, this translation unit is genuinely expecting a Utils::GetTempPath function, just as specified in the Utils.h header. But since the Utils.cpp’s translation unit produced a Utils::GetTempPathW function (because of the TCHAR model’s preprocessor #define), the linker can’t find any definition (implementation) of Utils::GetTempPath, hence the aforementioned apparently mysterious linker error.
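The renaming can be reproduced without <Windows.h> at all. In the sketch below, a one-line #define stands in for what the TCHAR model does in Unicode builds (a simplification I am assuming for illustration), and __func__ reveals the name the compiler actually sees. Everything lives in a single translation unit here, so the mismatch would show up as a compile error rather than the linker error described above.

```cpp
#include <cassert>
#include <string>

// Stand-in for the TCHAR-model macro that <Windows.h> defines
// in Unicode builds (simplified for this sketch).
#define GetTempPath GetTempPathW

namespace Utils {
    // Spelled GetTempPath in the source, but the preprocessor has already
    // rewritten the name: this defines Utils::GetTempPathW.
    std::string GetTempPath() { return __func__; }
}

// Mimic a translation unit like Main.cpp, where the macro is not defined.
#undef GetTempPath

void DemoMacroRename() {
    // The function exists only under its rewritten name.
    assert(Utils::GetTempPathW() == "GetTempPathW");

    // Utils::GetTempPath();  // would not compile: no such function
}
```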

TCHAR Preprocessor Bug

This can be a time-wasting subtle bug to spot, especially in non-trivial code bases, and especially when you don’t know about the TCHAR preprocessor model.

You should pay attention to functions and methods that have the same name as Win32 APIs, since they can be subjected to this subtle TCHAR preprocessor transformation.

To fix that, an option is to #undef the TCHAR-modified definition of the identifier in Utils.h:

//
// Remove TCHAR preprocessor redefinition 
// of GetTempPath
//
#ifdef GetTempPath
#undef GetTempPath
#endif

A simple repro solution can be downloaded here from GitHub.