[LC++]How to generate UTF-8 output?

Torsten Rennett Torsten at rennett.de
Tue Nov 29 21:56:01 UTC 2005


Hi Jan,

thank you for reporting your experiences.

On Sunday 27 November 2005 02:26, Jan Pfeifer wrote:
> I've been playing with a morphological analyser and I had many problems
> using the C++ converter facet to convert to and from wstring (UCS-4 in
> linux) and string (UTF-8). Well, not many problems, it just didn't work,
> after a long time reading and trying.

Me too ... 

> My guess is that it's just not implemented correctly yet. 

Yes, I agree! It just doesn't work out of the box. I'm currently working 
on a solution, because I need this functionality.

> I went through 
> many C++ books in safari (http://safari.oreilly.com/) and none gave good
> (or working) descriptions on how to handle these internationalization
> issues :(
>
> So, I'm sticking to the C functions (wcrtomb() and mbrtowc() family of
> functions) for converting. I keep all my internal data in UCS-4 (wstring
> and wchar_t) and convert back to the user's encoding just before
> printing.

Yes, this is the common approach: use wchar_t/wstring (== UCS-4 on Linux) 
internally and convert on input/output. For the conversion I just want to 
set the right facet in a locale and 'imbue()' the iostream with it. The 
facet should use the X/Open iconv(3) function internally, as it is much 
more flexible than the wcrtomb() family of functions (some points on this 
can be found here: 
http://www.gnu.org/software/libc/manual/html_node/Restartable-multibyte-conversion.html).
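
To make the conversion step concrete, here is a minimal sketch of turning a 
wstring into a UTF-8 string with iconv(3). It is not the facet itself, only 
the conversion such a facet could call internally. The helper name 
wide_to_utf8 is hypothetical, and the sketch assumes glibc, where the 
encoding name "WCHAR_T" denotes the native wchar_t representation.

```cpp
#include <iconv.h>
#include <cerrno>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of any library): convert a wide string
// to UTF-8 using iconv(3).
// Assumption: glibc, where "WCHAR_T" names the native wchar_t encoding.
std::string wide_to_utf8(const std::wstring& in)
{
    iconv_t cd = iconv_open("UTF-8", "WCHAR_T");   // (to, from)
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    std::string out;
    // glibc's iconv() takes a non-const char**; the input is not modified.
    char* src = const_cast<char*>(reinterpret_cast<const char*>(in.data()));
    size_t src_left = in.size() * sizeof(wchar_t);
    char buf[256];

    while (src_left > 0) {
        char* dst = buf;
        size_t dst_left = sizeof buf;
        size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        // Keep whatever was converted in this round.
        out.append(buf, sizeof buf - dst_left);
        // E2BIG only means the output buffer was full: loop again.
        if (rc == (size_t)-1 && errno != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv conversion failed");
        }
    }
    iconv_close(cd);
    return out;
}
```

With this, something like std::cout << wide_to_utf8(text) writes UTF-8 
regardless of the current locale. UTF-8 is a stateless encoding, so the 
sketch skips the final flush call that a stateful target encoding would need.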

> If you have better luck, please share with us your experience :)

I think I've found a pragmatic solution and will report on it in a few 
days, as I'm quite busy at the moment -- sorry.

Torsten



