[LC++] How to generate UTF-8 output?
Torsten Rennett
Torsten at rennett.de
Tue Nov 29 21:56:01 UTC 2005
Hi Jan,
thank you for reporting your experiences.
On Sunday 27 November 2005 02:26, Jan Pfeifer wrote:
> I've been playing with a morphological analyser and I had many problems
> using the C++ converter facet to convert to and from wstring (UCS-4 in
> linux) and string (UTF-8). Well, not many problems, it just didn't work,
> after a long time reading and trying.
Me too ...
> My guess is that it's just not implemented correctly yet.
Yes, I agree! It just doesn't work out of the box. I'm currently working
on a solution, because I need this functionality.
> I went through
> many C++ books in safari (http://safari.oreilly.com/) and none gave good
> (or working) descriptions on how to handle these internationalization
> issues :(
>
> So, I'm sticking to the C functions (wcrtomb() and mbrtowc() family of
> functions) for converting. I keep all my internal data in UCS-4 (wstring
> and wchar_t) and convert back to the user's encoding just before
> printing.
Yes, this is the common approach: use wchar_t/wstring (UCS-4 on Linux)
internally and convert on input/output. For the conversion part I just
want to set the right facet in a locale and imbue() the iostream with
it. The facet should use the X/Open iconv(3) function internally, as it is
much more flexible than the wcrtomb() family of functions (some points on
this can be found here:
http://www.gnu.org/software/libc/manual/html_node/Restartable-multibyte-conversion.html).
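To illustrate only the conversion step (not the facet itself), here is a
minimal sketch of turning a wstring into UTF-8 with iconv(3). It assumes
an iconv implementation that accepts the encoding name "WCHAR_T", as glibc
does; the helper name to_utf8 is made up for this example and is not part
of any library.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a wide string to UTF-8 using iconv(3).
// Assumption: iconv_open() knows the source encoding name "WCHAR_T" (glibc).
std::string to_utf8(const std::wstring& in)
{
    iconv_t cd = iconv_open("UTF-8", "WCHAR_T");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    std::string out;
    char buf[256];
    // iconv() wants a non-const char**; it does not modify the input bytes.
    char* src = const_cast<char*>(reinterpret_cast<const char*>(in.data()));
    std::size_t src_left = in.size() * sizeof(wchar_t);

    while (src_left > 0) {
        char* dst = buf;
        std::size_t dst_left = sizeof buf;
        std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        int err = errno;  // save before any other call can change it
        out.append(buf, sizeof buf - dst_left);
        // E2BIG only means the output buffer is full: loop and continue.
        if (rc == (std::size_t)-1 && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed");
        }
    }
    iconv_close(cd);
    return out;
}
```

With this, something like std::cout << to_utf8(my_wstring) writes UTF-8
bytes regardless of the current locale. A facet-based solution would wrap
the same iconv() calls behind the stream's locale instead of calling a
helper by hand.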
> If you have better luck, please share with us your experience :)
I think I've found a pragmatic solution and will report on it in a few
days, as I'm quite busy at the moment -- sorry.
Torsten