Re: Why std::cout stop working when output MBCS string?



Dancefire wrote:
1 locale loc(".936"); // Code page 936
2 locale::global(loc);
3 cout.imbue(loc);
4 string test("\xba\xba\xd7\xd6"); // MBCS string "汉字" in GBK encoding.
5 cout << "Before output MBCS string." << endl;
6 cout << test << endl;
7 cout << "After output MBCS String." << endl;

I think there is a misunderstanding here: the locale, or rather its codepage,
affects the output(!) and not how internal char sequences are interpreted.
IOW, you pass the stream a string and it converts the characters to CP 936 on
output. However, I don't think there is a way to represent the string you want
as an internal char sequence. That is a limitation (or abstraction) of C++
IOStreams, which assume that every internal character consists of exactly
one element; it doesn't apply to other APIs.

Now, what you should do is simply use wchar_t instead of char, i.e.
std::wcout, std::wstring, std::wfstream etc. Note that you then use Unicode
internally (UCS-2 to be precise, at least where wchar_t is 16 bits wide, as
on Windows).
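
To make the "one element per character" point concrete, here is a minimal sketch that only counts storage elements. The function names are made up for illustration, and nothing is written to the console, because actually printing through std::wcout depends on the platform's locale setup.

```cpp
#include <cstddef>
#include <string>

// Illustrative helper: the GBK bytes from the question stored in a narrow
// string. Each of the two characters occupies two char elements.
std::size_t narrow_element_count() {
    const std::string gbk("\xba\xba\xd7\xd6");
    return gbk.size();
}

// Illustrative helper: the same two characters (U+6C49, U+5B57) stored in a
// wide string. Both code points fit into a single wchar_t element, whether
// wchar_t is 16 or 32 bits wide.
std::size_t wide_element_count() {
    const std::wstring wide(L"\u6c49\u5b57");
    return wide.size();
}
```

With the narrow string the stream sees four elements for two characters, while the wide string holds exactly one element per character.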

If I want line 7 to be output, I have to add cout.clear() between
line 6 and line 7.

Yep, the conversion fails, so the output fails and the stream state gets its
fail bit set. Every later output operation is then ignored until the state
is cleared.
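
The effect of a set fail bit can be reproduced without any codepage at all. The sketch below is an assumption-laden stand-in: it uses a std::ostringstream instead of std::cout and sets failbit by hand where the failed conversion would have set it. The function name is made up for illustration.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Illustrative helper: writes one line, simulates a failed output operation,
// optionally clears the error state, then writes a second line.
std::string write_around_failure(bool clear_after_failure) {
    std::ostringstream out;
    out << "Before\n";

    // Stand-in for the failed MBCS conversion: mark the stream as failed.
    out.setstate(std::ios_base::failbit);

    if (clear_after_failure) {
        out.clear();  // reset the state to goodbit, like cout.clear()
    }

    out << "After\n";  // silently skipped while failbit is still set
    return out.str();
}
```

Without the clear() call only "Before" ends up in the buffer; with it, both lines do.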

I tried to modify the code to make it work. I found that if I remove
line 2, the code works properly. But that doesn't make any sense.

I can't quote chapter and verse, but as far as I know a stream copies the
global locale when it is initialized, not on each output operation. That
still doesn't explain it, though: since line 3 imbues the same locale
explicitly, line 2 should make no difference to std::cout.
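
When a stream picks up the global locale can be checked directly. The following is a sketch under assumptions: it uses std::ostringstream rather than std::cout, so it says nothing about console-specific behaviour, and the facet and struct names are hypothetical, introduced only to make the new global locale distinguishable from the old one.

```cpp
#include <locale>
#include <sstream>

// Hypothetical facet: exists only so the custom locale differs from the
// previous global locale.
struct comma_decimal : std::numpunct<char> {
    char do_decimal_point() const override { return ','; }
};

// Hypothetical result type for the check below.
struct locale_pickup {
    bool existing_stream_uses_new_global;
    bool new_stream_uses_new_global;
};

locale_pickup check_global_locale_pickup() {
    const std::locale previous;      // copy of the current global locale
    std::ostringstream existing;     // constructed under the old global locale

    // The locale takes ownership of the facet (reference count starts at 0).
    const std::locale custom(previous, new comma_decimal);
    std::locale::global(custom);

    std::ostringstream fresh;        // constructed under the new global locale

    const locale_pickup result{existing.getloc() == custom,
                               fresh.getloc() == custom};

    std::locale::global(previous);   // restore the old global locale
    return result;
}
```

In this sketch the already constructed stream keeps its old locale, and only the stream constructed after locale::global() sees the new one.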

If I'm wrong, what is the correct way to output MBCS to the console
in the STL way?

Two things here:
1. This has nothing to do with the STL! What you mean is the C++ standard
library; the STL contains neither IOStreams nor localization.
2. How this works may differ from system to system, because the strings
passed to locale() are generally not portable. A portable way would be to
write a codecvt facet that converts internal wchar_t to external CP936, but
even that would have to be adjusted slightly for the different sizes,
encodings and signedness of wchar_t.

BTW: What I personally would rather do is use UTF-8 and convert any output
with a dedicated converter.
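
Which converter that would be depends on the target encoding. As a sketch of the UTF-8 side only, here is a hand-written encoder from Unicode code points to UTF-8 bytes. The function name is made up for illustration, and the step from UTF-8 to CP936 needs a mapping table and is not shown.

```cpp
#include <string>

// Illustrative helper: encodes a sequence of Unicode code points as UTF-8.
// Surrogates and values above U+10FFFF are replaced by U+FFFD.
std::string to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = 0xFFFD;  // replacement character for invalid input
        }
        if (cp < 0x80) {                     // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {             // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {           // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                             // 4 bytes: 11110xxx followed by three 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For the two characters from the question, to_utf8(U"\u6c49\u5b57") yields the six bytes E6 B1 89 E5 AD 97.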

Uli
