Detecting Unicode Space Characters

Some programmers love UTF-8 because they believe they can reuse old “ANSI” APIs with UTF-8-encoded text. UTF-8 does have some advantages (like being endian-neutral), but blindly reusing old “ANSI” APIs for UTF-8 text is not one of them.

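For example, Unicode defines various space characters besides the ASCII space (U+0020), such as the no-break space (U+00A0), the em space (U+2003) and the punctuation space (U+2008).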

You can use the iswspace function to check whether a Unicode wide character (a UTF-16 code unit on Windows) is a whitespace character, e.g.:

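#include <cwctype>      // for std::iswspace (standard header for wide-character classification)
#include <iostream>

int main()
{
    // Let’s do a test with the punctuation space (U+2008)
    const wchar_t wch = 0x2008;

    // The result depends on the platform's wide-character tables and the current locale
    if (std::iswspace(wch)) {
        std::cout << "OK.\n";
    } else {
        std::cout << "Not classified as white space.\n";
    }
}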

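The corresponding old “ANSI” function is isspace. Can you use it with Unicode text encoded in UTF-8? I’m open to being proven wrong, but I think that’s not possible: in UTF-8 the punctuation space U+2008 is encoded as the three-byte sequence 0xE2 0x80 0x88, while isspace examines a single char (byte) value at a time, so it never sees the whole code point.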
