[LC++]How to generate UTF-8 output?

Jan Pfeifer pfjan at yahoo.com.br
Sun Nov 27 17:04:02 UTC 2005

hi Torsten,

I've been playing with a morphological analyser, and I had a lot of trouble
using the C++ converter facet (codecvt) to convert between wstring (UCS-4 on
Linux) and string (UTF-8). Actually, "trouble" is an understatement: after a
long time reading and trying, it simply didn't work.

My guess is that it just isn't implemented correctly yet. I went through
many C++ books on Safari (http://safari.oreilly.com/) and none gave a good
(or working) description of how to handle these internationalization
issues :(
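For reference, the converter facet in question is std::codecvt<wchar_t, char, std::mbstate_t>, obtained from a std::locale with std::use_facet and driven through its out() member. The following is a minimal sketch assuming a C++11 compiler and a named locale such as "de_DE.UTF-8" or "C.UTF-8" being installed; the helper name narrow_with_facet is hypothetical, and the sketch says nothing about whether the gcc 3.3 library mentioned below handles this correctly.

```cpp
#include <cwchar>     // std::mbstate_t
#include <locale>     // std::locale, std::codecvt, std::use_facet
#include <stdexcept>  // std::runtime_error
#include <string>
#include <vector>

// Hypothetical helper: convert a wide string to the multibyte encoding
// of the given locale using that locale's codecvt facet.
std::string narrow_with_facet(const std::wstring& ws, const std::locale& loc) {
    using Facet = std::codecvt<wchar_t, char, std::mbstate_t>;
    const Facet& facet = std::use_facet<Facet>(loc);

    std::mbstate_t state{};  // initial conversion state
    // Worst case: every wide character needs max_length() bytes.
    std::vector<char> buf(ws.size() * facet.max_length() + 1);

    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    Facet::result res = facet.out(state,
                                  ws.data(), ws.data() + ws.size(), from_next,
                                  buf.data(), buf.data() + buf.size(), to_next);

    // Treat anything but a complete conversion (error, partial, noconv) as
    // failure. Stateless encodings such as UTF-8 need no unshift() call.
    if (res != Facet::ok)
        throw std::runtime_error("codecvt::out failed to convert the string");
    return std::string(buf.data(), to_next);
}
```

Note that constructing std::locale("de_DE.UTF-8") throws std::runtime_error when that locale is not installed, which is worth ruling out before blaming the facet itself.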

So I'm sticking to the C functions (the wcrtomb() and mbrtowc() family of
functions) for conversion. I keep all my internal data in UCS-4 (wstring
and wchar_t) and convert to the user's encoding just before printing.
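A minimal sketch of that last step with wcrtomb(), assuming the program has called std::setlocale(LC_ALL, "") once at startup so that LC_CTYPE follows $LANG (for example de_DE.UTF-8). The helper name to_multibyte is hypothetical:

```cpp
#include <climits>    // MB_LEN_MAX
#include <clocale>    // std::setlocale (callers must set LC_CTYPE first)
#include <cwchar>     // std::wcrtomb, std::mbstate_t
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a wide (UCS-4 on Linux) string to the
// multibyte encoding selected by the current C locale's LC_CTYPE.
std::string to_multibyte(const std::wstring& ws) {
    std::string out;
    std::mbstate_t state{};   // start in the initial shift state
    char buf[MB_LEN_MAX];     // enough for one character in any locale

    for (wchar_t wc : ws) {
        std::size_t n = std::wcrtomb(buf, wc, &state);
        if (n == static_cast<std::size_t>(-1))
            throw std::runtime_error("character not representable in current locale");
        out.append(buf, n);
    }

    // For stateful encodings, converting L'\0' emits any shift sequence
    // needed to return to the initial state, followed by a null byte we drop.
    std::size_t n = std::wcrtomb(buf, L'\0', &state);
    if (n != static_cast<std::size_t>(-1) && n > 1)
        out.append(buf, n - 1);
    return out;
}
```

The result can then be written with the narrow std::cout, e.g. std::cout << to_multibyte(L"Sch\u00f6n"), so the wide streams are never involved.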

If you have better luck, please share your experience with us :)



ps.: I created a small library based on the following template:

    template <typename Target, typename Source>
    Target string_cast(const Source &src);

If you are interested, let me know and I can send you a copy (probably
under the LGPL; I haven't copyrighted it yet).
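The library itself isn't shown here, but a string_cast along those lines might look like the following sketch. The two specializations are an illustration, not the library's actual code; they assume the current C locale (set with std::setlocale) defines the multibyte encoding and that the encoding is stateless, as UTF-8 is.

```cpp
#include <climits>    // MB_LEN_MAX
#include <clocale>    // std::setlocale (callers must set LC_CTYPE first)
#include <cwchar>     // std::mbrtowc, std::wcrtomb, std::mbstate_t
#include <stdexcept>
#include <string>

// Primary template as declared above; only the specializations
// below are defined in this sketch.
template <typename Target, typename Source>
Target string_cast(const Source& src);

// Multibyte (current locale) -> wide.
template <>
inline std::wstring string_cast<std::wstring, std::string>(const std::string& src) {
    std::wstring out;
    std::mbstate_t state{};
    const char* p = src.data();
    std::size_t left = src.size();
    while (left > 0) {
        wchar_t wc;
        std::size_t n = std::mbrtowc(&wc, p, left, &state);
        if (n == static_cast<std::size_t>(-1) || n == static_cast<std::size_t>(-2))
            throw std::runtime_error("invalid or truncated multibyte sequence");
        if (n == 0)   // embedded '\0': one byte in a stateless encoding
            n = 1;
        out.push_back(wc);
        p += n;
        left -= n;
    }
    return out;
}

// Wide -> multibyte (current locale).
template <>
inline std::string string_cast<std::string, std::wstring>(const std::wstring& src) {
    std::string out;
    std::mbstate_t state{};
    char buf[MB_LEN_MAX];
    for (wchar_t wc : src) {
        std::size_t n = std::wcrtomb(buf, wc, &state);
        if (n == static_cast<std::size_t>(-1))
            throw std::runtime_error("character not representable in current locale");
        out.append(buf, n);
    }
    return out;
}
```

With these, string_cast<std::string>(std::wstring(L"Sch\u00f6n")) deduces Source as std::wstring and picks the second specialization. Passing a bare literal such as L"abc" would deduce an array type for Source, select the undefined primary template, and fail at link time.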

>When I run this program, the output is as follows:
>    torsten at linux3:~$ LANG=de_DE print2_utf8
>    loc1='de_DE'
>    Sch
>As you can see, the output stops at the german Umlaut 'ö'. This is
>independent of the setting of $LANG.
>What's wrong? 
>Who can show me a correct version of the above little C++ program?
>I'm using:
>    - Debian Sarge (stable)
>    - gcc-3.3.5
>    - libc6-2.3.2
>    - Linux Kernel 2.4.24-1-686-smp
>Thanks for any hints,
>This is the Linux C++ Programming List: http://lists.linux.org.au/listinfo/tuxcpprogramming


