[LC++]How to generate UTF-8 output?

Chris Vine chris at cvine.freeserve.co.uk
Fri Nov 25 08:01:02 UTC 2005


On Thursday 24 November 2005 22:49, Chris Vine wrote:

[snip]

> The interesting question is what wcrtomb() assumed about the wide character
> codeset it encountered - probably it did the sensible thing and assumed a
> UCS-4 codeset if you have a 4 byte wchar_t (Linux has a 4 byte wchar_t and
> Windows a 2 byte wchar_t), and that happened to match the assumption of your
> compiler in setting up the wide character string literal at compilation
> stage.  To that extent, it looks to be a matter of luck that it worked.

Actually, on further thought it wasn't luck, because the compiler was using 
the C library it knew about and both would have agreed on the wide character 
codeset used - the wide character codeset is implementation-defined, but the 
compiler and the library are consistent about it.
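
As a rough illustration of the conversion being discussed, here is a minimal 
sketch that feeds a wide string through wcrtomb() one character at a time.  
The helper name to_multibyte() is made up for this example, and it assumes 
the compiler and the C library agree on the wchar_t codeset as described 
above.

```cpp
#include <climits>   // MB_LEN_MAX
#include <cwchar>    // std::wcrtomb, std::mbstate_t
#include <string>

// Hypothetical helper: convert a wide string to the multibyte encoding of
// the current C locale (LC_CTYPE).  Returns an empty string if some
// character cannot be represented in that locale.
std::string to_multibyte(const std::wstring& wide) {
    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    std::string narrow;
    char buf[MB_LEN_MAX];                     // room for one converted character
    for (wchar_t wc : wide) {
        std::size_t n = std::wcrtomb(buf, wc, &state);
        if (n == static_cast<std::size_t>(-1)) {
            return std::string();             // not representable in this locale
        }
        narrow.append(buf, n);
    }
    return narrow;
}
```

Without a prior call to setlocale() the program runs in the "C" locale, so 
only plain ASCII is certain to convert.  After setlocale(LC_ALL, "") in a 
UTF-8 locale, and assuming wchar_t holds Unicode code points (as it does with 
glibc), a character such as U+00E9 should come out as the two bytes 0xC3 0xA9.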

This use of printf() to convert wide characters to the user's current narrow 
character locale is an interesting one I have not seen before, but it is a 
way of "hardwiring" text into source code so that it is displayed in whatever 
codeset the user happens to use for narrow characters.  
Normally this isn't an issue, as gettext() is used to translate between 
languages and it will choose the correct narrow character representation, 
but where you have a single language application, using printf() is a good 
way of catering for different narrow character codesets.  Did you get the 
example from a textbook or someone else's code?
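
The original poster's code is not quoted here, so the following is only a 
guess at the technique: a wide string handed to narrow printf() through the 
%ls conversion, which converts each wide character to the multibyte encoding 
of the current locale.  The function names are invented for this sketch.

```cpp
#include <clocale>   // std::setlocale
#include <cstdio>    // std::printf

// Hypothetical helper: adopt the user's locale from the environment.
// Returns false if the C library does not support that locale.
bool use_user_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}

// Hypothetical helper: print a wide string on the narrow stdout stream.
// %ls converts each wide character to the current locale's multibyte
// encoding.  Returns the number of bytes written, or a negative value if
// a character cannot be converted.
int print_wide_line(const wchar_t* text) {
    return std::printf("%ls\n", text);
}
```

A program would call use_user_locale() once at startup and then 
print_wide_line() with its wide string literals.  If setlocale() is never 
called, the conversion happens in the "C" locale and non-ASCII characters 
may be rejected.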

To use C++ streams, you would probably have to imbue a code conversion facet 
into your narrow character stream to convert wide characters to narrow 
characters.  Perhaps your compiler already does this - have you tried?
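
For what it is worth, the facet in question is 
std::codecvt<wchar_t, char, std::mbstate_t>, which is the facet wide file 
streams consult when writing to a narrow file.  The sketch below calls it 
directly so the conversion is visible; narrow_with_facet() is a name made up 
for this example, not a library function.

```cpp
#include <cwchar>     // std::mbstate_t
#include <locale>     // std::locale, std::codecvt, std::use_facet
#include <stdexcept>  // std::runtime_error
#include <string>
#include <vector>

// Hypothetical helper: convert a wide string to narrow characters using
// the code conversion facet of the given locale.
std::string narrow_with_facet(const std::wstring& wide, const std::locale& loc) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    const Facet& facet = std::use_facet<Facet>(loc);
    if (wide.empty()) {
        return std::string();
    }
    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    // max_length() is the most narrow chars one wide char can produce.
    std::vector<char> out(wide.size() * facet.max_length());
    const wchar_t* from_next = 0;
    char* to_next = 0;
    std::codecvt_base::result r = facet.out(
        state,
        wide.data(), wide.data() + wide.size(), from_next,
        out.data(), out.data() + out.size(), to_next);
    if (r != std::codecvt_base::ok) {
        throw std::runtime_error("wide to narrow conversion failed");
    }
    return std::string(out.data(), to_next);
}
```

Passing std::locale("") selects the user's environment locale (it throws 
std::runtime_error if that locale is not available), while 
std::locale::classic() only handles plain ASCII reliably.  In ordinary 
code the same effect is usually obtained by imbuing the locale into 
std::wcout or a std::wofstream and writing wide strings to that.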

Chris




