The Sticky Preprocessor-Based TCHAR Model – Part 1: Introduction

If you have been doing a fair amount of Win32 programming in C++, chances are good that you have been exposed to some basic APIs like SetWindowText.

Its prototype is very simple:

BOOL SetWindowText(HWND hWnd,
                   LPCTSTR lpString);

The LPCTSTR typedef is equivalent to const TCHAR*: it represents a pointer to a read-only, NUL-terminated input string. The purpose of this API is to change the text of the specified window’s title bar (or, if hWnd refers to a control, the text of the control itself) to the string passed as the second parameter.

But, truth be told, there’s no SetWindowText function implemented and exposed as a Win32 API!

There are actually two slightly different functions: SetWindowTextA and SetWindowTextW.

This can easily be verified by spelunking inside <WinUser.h>:

WINUSERAPI
BOOL
WINAPI
SetWindowTextA(
    _In_ HWND hWnd,
    _In_opt_ LPCSTR lpString);
WINUSERAPI
BOOL
WINAPI
SetWindowTextW(
    _In_ HWND hWnd,
    _In_opt_ LPCWSTR lpString);
#ifdef UNICODE
#define SetWindowText  SetWindowTextW
#else
#define SetWindowText  SetWindowTextA
#endif // !UNICODE

Removing some “noise” from the above code snippet (don’t get me wrong: SAL and calling conventions are important; they are “noise” only from the particular perspective of this blog post), and replacing the LPCSTR and LPCWSTR typedefs with their expanded forms, we get:

// LPCSTR == const char*
BOOL SetWindowTextA(HWND hWnd, 
                    const char* lpString);

// LPCWSTR == const wchar_t*
BOOL SetWindowTextW(HWND hWnd, 
                    const wchar_t* lpString);

So, basically, the main difference between these two functions is the string parameter: the function with the A suffix (SetWindowTextA) expects a char-based string, whereas the function with the W suffix (SetWindowTextW) expects a wchar_t-based string.

These char-based strings are commonly called “ANSI” or “MBCS” (“Multi-Byte Character Set”) strings. The “A” suffix originates from “ANSI”.

Conversely, the wchar_t-based strings are commonly called “wide” strings, or Unicode strings. And, as you can easily imagine, the “W” suffix stems from “wide”.

The ANSI/MBCS form refers to legacy strings, which come with many potential problems, including the mess of mismatched code pages.

The Unicode form is the “modern” one, and should be the preferred form in Windows applications written in C++. Note that, in this context, the particular Unicode encoding used is UTF-16 (in Visual C++, wchar_t is 16 bits wide and holds a single UTF-16 code unit).

Now, let’s have a look at the last part of the aforementioned code snippet:

#ifdef UNICODE
#define SetWindowText  SetWindowTextW
#else
#define SetWindowText  SetWindowTextA
#endif

So, it’s clear that SetWindowText is just a preprocessor #define, expanded to SetWindowTextW in Unicode builds (which have been the default since VS2005!), and to SetWindowTextA in ANSI/MBCS builds (which IMHO should be considered deprecated).

The Unicode vs. ANSI/MBCS mode is controlled by the UNICODE preprocessor macro (the C runtime’s <tchar.h> header relies on the companion _UNICODE macro).

As already mentioned, Unicode builds have been the default since VS2005; however, you can change the build mode in the Visual Studio IDE by following the path: Project Properties | Configuration Properties | General | Character Set (as described, for example, in this StackOverflow answer).

The idea of this legacy TCHAR model is to allow C/C++ Win32 programmers to maintain a single code base that uses a common “generic” character type named TCHAR (instead of explicitly using char or wchar_t) and a single apparent function name (for example: SetWindowText). Depending on the ANSI/MBCS or Unicode build mode setting, TCHAR resolves to either char or wchar_t, and the corresponding A-ending or W-ending function is called.

In this model, string literals should be decorated with TEXT, _TEXT or _T, so that in ANSI/MBCS builds a literal like TEXT("Connie") expands to plain "Connie", whereas in Unicode builds an L prefix is automatically added, making it L"Connie".

Following this TCHAR model, a SetWindowText call would look something like this in C++ code:

SetWindowText(myWindow, TEXT("Connie"));

In ANSI/MBCS builds, SetWindowText is actually expanded to SetWindowTextA, and TEXT("Connie") to "Connie", so the above statement gets transformed to:

SetWindowTextA(myWindow, "Connie");

Instead, in Unicode builds, SetWindowText is expanded to SetWindowTextW, and TEXT("Connie") becomes L"Connie" (with the L prefix denoting a wide string literal, which Visual C++ encodes in UTF-16), so the aforementioned statement becomes:

SetWindowTextW(myWindow, L"Connie");

So, given a single code base, you could switch the build mode between Unicode and ANSI/MBCS, and automatically get two different binary executables: one Unicode-enabled, and the other one using the ANSI/MBCS legacy APIs.

Well, this might have made sense in the old days of Windows, when Unicode-enabled versions of Windows (for example: Windows 2000, XP, etc.) coexisted with older Unicode-unaware versions of the OS, which didn’t implement the “W” version of the Win32 APIs. So you could build software capable of targeting both Unicode-enabled and Unicode-unaware versions of Windows, starting from a common single TCHAR-enabled code base, and just #define’ing/#undef’ing a few preprocessor macros (UNICODE and _UNICODE), more or less…

Anyway, considering that all widespread modern versions of Windows, like Windows 7, are Unicode-enabled, there’s really no reason nowadays to use this messy legacy TCHAR model: just build your Windows C++ applications in Unicode.

(Bonus historical note: to simplify creating Unicode-aware applications for Windows 95 and 98, Microsoft built UNICOWS.DLL or “cows”, a.k.a. “Microsoft Layer for Unicode”, released in July 2001.)

However, this TCHAR preprocessor-based model has some nasty effects still today, as we’ll see in the next blog post.
