Conversion between Unicode UTF-16 and UTF-8 in C++/Win32

For updated, richer information and modern C++ usage, please read my MSDN Magazine article (published in the September 2016 issue):

Unicode Encoding Conversions with STL Strings and Win32 APIs

Updated, modern C++ code can be found here on GitHub.



Reusable C++ code for mixed ATL/STL conversions can be found here on GitHub. Basically, ATL CString(W) stores Unicode text encoded in UTF-16, while std::string is used to store UTF-8-encoded text.
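To make the UTF-16/UTF-8 split concrete, here is a minimal, portable sketch of a hand-written UTF-16 to UTF-8 conversion. It is an illustration under stated assumptions, not the GitHub code and not a replacement for WideCharToMultiByte(): std::u16string stands in for the UTF-16 text a CStringW would hold, the function name Utf16ToUtf8Portable is hypothetical, and throwing std::invalid_argument on unpaired surrogates is only loosely analogous to the WC_ERR_INVALID_CHARS flag.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical portable helper, NOT part of the ATL/Win32 code in this post:
// converts UTF-16 text to UTF-8 by hand, throwing on unpaired surrogates
// (loosely analogous to WideCharToMultiByte with WC_ERR_INVALID_CHARS).
std::string Utf16ToUtf8Portable(const std::u16string& utf16)
{
    std::string utf8;
    utf8.reserve(utf16.size() * 3);  // worst case: 3 UTF-8 bytes per UTF-16 unit

    for (std::size_t i = 0; i < utf16.size(); ++i)
    {
        char32_t cp = utf16[i];

        if (cp >= 0xD800 && cp <= 0xDBFF)       // high (lead) surrogate
        {
            if (i + 1 >= utf16.size())
                throw std::invalid_argument("unpaired high surrogate");
            const char32_t low = utf16[i + 1];
            if (low < 0xDC00 || low > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            ++i;                                 // the low surrogate is consumed
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)  // stray low surrogate
        {
            throw std::invalid_argument("unpaired low surrogate");
        }

        // Encode the code point as 1 to 4 UTF-8 bytes
        if (cp < 0x80)
        {
            utf8 += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            utf8 += static_cast<char>(0xC0 | (cp >> 6));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            utf8 += static_cast<char>(0xE0 | (cp >> 12));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            utf8 += static_cast<char>(0xF0 | (cp >> 18));
            utf8 += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return utf8;
}
```

For example, u"caff\u00E8" (5 UTF-16 code units) becomes the 6 bytes "caff\xC3\xA8", because U+00E8 needs two bytes in UTF-8. Since the two encodings rarely have the same length, the Win32 code below asks for the destination size before doing the actual conversion.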


Code working with ATL’s CStringW/A classes and throwing exceptions via AtlThrow() can be found here on GitHub. For convenience, the core part of that code is copied below:

//////////////////////////////////////////////////////////////////////////////
//
// *** Functions to convert between Unicode UTF-8 and Unicode UTF-16 ***
//                      using ATL CStringA/W classes
//
// By Giovanni Dicanio 
//
//////////////////////////////////////////////////////////////////////////////

#include <atlstr.h>     // ATL CStringA/CStringW, ATLASSERT, AtlThrowLastWin32()


//----------------------------------------------------------------------------
// FUNCTION: Utf8ToUtf16
// DESC:     Converts Unicode UTF-8 text to Unicode UTF-16 (Windows default).
//----------------------------------------------------------------------------
CStringW Utf8ToUtf16(const CStringA& utf8)
{
    // Special case of empty input string
    if (utf8.IsEmpty())
    {
        // Return empty string
        return CStringW();
    }


    // "Code page" value used with MultiByteToWideChar() for UTF-8 conversion 
    const UINT codePageUtf8 = CP_UTF8;

    // Safely fails if an invalid UTF-8 character is encountered
    const DWORD flags = MB_ERR_INVALID_CHARS;

    // Get the length, in WCHARs, of the resulting UTF-16 string
    const int utf16Length = ::MultiByteToWideChar(
            codePageUtf8,       // source string is in UTF-8
            flags,              // conversion flags
            utf8.GetString(),   // source UTF-8 string
            utf8.GetLength(),   // length of source UTF-8 string, in chars
            nullptr,            // unused - no conversion done in this step
            0);                 // request size of destination buffer, in WCHARs
    if (utf16Length == 0)
    {
        // Conversion error
        AtlThrowLastWin32();
    }


    // Allocate destination buffer to store the resulting UTF-16 string
    CStringW utf16;
    WCHAR* const utf16Buffer = utf16.GetBuffer(utf16Length);
    ATLASSERT(utf16Buffer != nullptr);


    // Do the conversion from UTF-8 to UTF-16
    int result = ::MultiByteToWideChar(
            codePageUtf8,       // source string is in UTF-8
            flags,              // conversion flags
            utf8.GetString(),   // source UTF-8 string
            utf8.GetLength(),   // length of source UTF-8 string, in chars
            utf16Buffer,        // pointer to destination buffer
            utf16Length);       // size of destination buffer, in WCHARs  
    if (result == 0)
    {
        // Conversion error
        AtlThrowLastWin32();
    }

    // Don't forget to release internal CString buffer 
    // before returning the string to the caller
    utf16.ReleaseBufferSetLength(utf16Length);

    // Return resulting UTF-16 string
    return utf16;
}



//----------------------------------------------------------------------------
// FUNCTION: Utf16ToUtf8
// DESC:     Converts Unicode UTF-16 (Windows default) text to Unicode UTF-8.
//----------------------------------------------------------------------------
CStringA Utf16ToUtf8(const CStringW& utf16)
{
    // Special case of empty input string
    if (utf16.IsEmpty())
    {
        // Return empty string
        return CStringA();
    }


    // "Code page" value used with WideCharToMultiByte() for UTF-8 conversion 
    const UINT codePageUtf8 = CP_UTF8;

    // Safely fails if an invalid UTF-16 character is encountered
    const DWORD flags = WC_ERR_INVALID_CHARS;

    // Get the length, in chars, of the resulting UTF-8 string
    const int utf8Length = ::WideCharToMultiByte(
            codePageUtf8,       // convert to UTF-8
            flags,              // conversion flags
            utf16.GetString(),  // source UTF-16 string
            utf16.GetLength(),  // length of source UTF-16 string, in WCHARs
            nullptr,            // unused - no conversion required in this step
            0,                  // request size of destination buffer, in chars
            nullptr, nullptr);  // unused
    if (utf8Length == 0)
    {
        // Conversion error
        AtlThrowLastWin32();
    }


    // Allocate destination buffer to store the resulting UTF-8 string
    CStringA utf8;
    char* const utf8Buffer = utf8.GetBuffer(utf8Length);
    ATLASSERT(utf8Buffer != nullptr);


    // Do the conversion from UTF-16 to UTF-8
    int result = ::WideCharToMultiByte(
            codePageUtf8,       // convert to UTF-8
            flags,              // conversion flags
            utf16.GetString(),  // source UTF-16 string
            utf16.GetLength(),  // length of source UTF-16 string, in WCHARs
            utf8Buffer,         // pointer to destination buffer
            utf8Length,         // size of destination buffer, in chars
            nullptr, nullptr);  // unused
    if (result == 0)
    {
        // Conversion error
        AtlThrowLastWin32();
    }


    // Don't forget to release internal CString buffer 
    // before returning the string to the caller
    utf8.ReleaseBufferSetLength(utf8Length);

    // Return resulting UTF-8 string
    return utf8;
}
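For readers who want to see what the UTF-8 to UTF-16 direction involves under the hood, here is a minimal portable sketch of a strict decoder. It is a hand-written illustration, not what MultiByteToWideChar() does internally: the name Utf8ToUtf16Portable is hypothetical, std::u16string plays the role of CStringW, and the set of rejected inputs (bad lead or continuation bytes, truncated sequences, overlong forms, encoded surrogates, values above U+10FFFF) is one reasonable reading of "invalid UTF-8", loosely analogous to MB_ERR_INVALID_CHARS.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical portable helper, NOT part of the ATL/Win32 code above:
// decodes UTF-8 into UTF-16 by hand and throws std::invalid_argument
// on malformed input (loosely analogous to MB_ERR_INVALID_CHARS).
std::u16string Utf8ToUtf16Portable(const std::string& utf8)
{
    std::u16string utf16;
    utf16.reserve(utf8.size());  // never more UTF-16 units than UTF-8 bytes

    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;          // decoded code point
        std::size_t extra = 0;    // number of continuation bytes that follow
        char32_t minValue = 0;    // smallest code point valid for this length

        if (lead < 0x80)
        {
            cp = lead;
        }
        else if ((lead & 0xE0) == 0xC0)
        {
            cp = lead & 0x1F; extra = 1; minValue = 0x80;
        }
        else if ((lead & 0xF0) == 0xE0)
        {
            cp = lead & 0x0F; extra = 2; minValue = 0x800;
        }
        else if ((lead & 0xF8) == 0xF0)
        {
            cp = lead & 0x07; extra = 3; minValue = 0x10000;
        }
        else
        {
            throw std::invalid_argument("invalid UTF-8 lead byte");
        }

        if (i + extra >= utf8.size() && extra > 0)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }

        // Reject overlong forms, encoded surrogates, and values past U+10FFFF
        if (cp < minValue || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::invalid_argument("invalid UTF-8 code point");

        if (cp < 0x10000)
        {
            utf16 += static_cast<char16_t>(cp);
        }
        else
        {
            cp -= 0x10000;  // split into a surrogate pair
            utf16 += static_cast<char16_t>(0xD800 + (cp >> 10));
            utf16 += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += extra + 1;
    }
    return utf16;
}
```

Code points above U+FFFF, such as U+1F600 (bytes F0 9F 98 80), come out as the surrogate pair 0xD83D 0xDE00, so a single character can take two WCHARs in UTF-16. The size-query call in the Win32 code accounts for this automatically.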

Hi all!

My name is Giovanni Dicanio, and I’m a Microsoft Visual C++ MVP.

(I received my first VC++ MVP Award on July 1st, 2007.)

The odds are good that I’ll be writing about C++ and Windows programming (and maybe something more) on this blog.

Feel free to contact me by e-mail at giovanni.dicanio at gmail.com or gdicanio at mvps.org.

Thanks,
Giovanni