Re: Writing Japanese or Chinese strings in a text file


Thanks for your reply, Tony.

> "Boo K.M" is right, but I suspect you need a whole lot more information
> before you can do that. For instance, what locale are you running in? If not
> a Far Eastern locale then your current ANSI code page will not be
> appropriate, and so you'll get "?" when you try to display any Far Eastern
> data.


Well, I am using a French computer running Windows XP, but I enabled an
option in the regional settings so that Far Eastern characters are
displayed correctly, so the characters look right in the Excel file.

>
> Also, what character set is the Excel file stored in. If you're in a
> different locale when reading the data then you may have misread the
> character codes (i.e. what's stored in memory no longer represents the
> original characters). If the source data is in a Far Eastern DBCS (e.g.
> Shift JIS), or UTF-8, then it would be better to read it in binary mode,
> into a Byte array, and then handle the translation explicitly in your code.
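Tony's suggestion can be illustrated with a short sketch. Python is used only for brevity, and the sample text and the decode_explicitly helper are assumptions, not part of any VB API. The same characters become different byte sequences in Shift JIS and in UTF-8, and only decoding with the matching encoding recovers them. Reading the bytes with a Western code page produces the kind of misread data Tony warns about:

```python
def decode_explicitly(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: decode raw file bytes using a named encoding."""
    return raw.decode(encoding)

text = "日本語"  # sample Japanese text (assumed for illustration)

# The same three characters as a Far Eastern DBCS and as UTF-8.
sjis_bytes = text.encode("shift_jis")  # 2 bytes per character here
utf8_bytes = text.encode("utf-8")      # 3 bytes per character here
print(len(sjis_bytes), len(utf8_bytes))  # 6 9

# Decoding with the matching encoding recovers the original text.
print(decode_explicitly(sjis_bytes, "shift_jis") == text)  # True
print(decode_explicitly(utf8_bytes, "utf-8") == text)      # True

# Decoding the Shift JIS bytes with the Western ANSI code page misreads them.
print(decode_explicitly(sjis_bytes, "cp1252") == text)     # False
```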


OK. The Excel file is one of my own: I copied and pasted from a Chinese
web page (more precisely, I used VB to put the string from a textarea on
a Chinese web page into the value of a cell in my own Excel file), and
the characters display correctly on my screen in the Excel file.
Then I tried something like this:

Open myFileName For Output As #myFileNumber
Print #myFileNumber, myCell.Value
Close #myFileNumber

but the resulting file just contains "?????".
I think the Print statement performs some conversion. I think VB
(I am using VB5) automatically converts the string to Unicode
(though I suppose the cell value holds a DBCS string).
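The "?????" symptom is consistent with a lossy Unicode-to-ANSI conversion on output. On a French system the ANSI code page is most likely Windows-1252, which has no slots for Chinese characters, so each one is replaced with "?". Here is a minimal Python sketch of that effect, assuming cp1252 as the ANSI code page and an arbitrary Chinese sample string:

```python
text = "你好世界"  # four Chinese characters (sample data, assumed)

# Convert to the Western European ANSI code page, replacing anything that
# cp1252 cannot represent with '?'.
ansi_bytes = text.encode("cp1252", errors="replace")
print(ansi_bytes)  # b'????'

# Once the characters have become '?', the original text cannot be recovered.
print(ansi_bytes.decode("cp1252") == text)  # False
```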
Indeed, I tried something like this:

Dim myString() As Byte

myString = myCell.Value
(...)
Print #myFileNumber, myString
(...)

but it does not work.
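Two details about that attempt are worth noting. In VB, assigning a String to a Byte array copies the string's internal UTF-16LE buffer, which takes two bytes per character for these Chinese characters. Print # is also a text-output statement rather than a way to write raw bytes; writing bytes untouched is normally done with Put # on a file opened For Binary. The byte layout itself can be sketched in Python (sample text assumed):

```python
text = "中文"  # sample Chinese text (assumed for illustration)

# UTF-16LE is the in-memory layout of a VB String; these are the bytes a
# VB Byte array receives from a String assignment.
utf16_bytes = text.encode("utf-16-le")
print(utf16_bytes.hex(" "))  # 2d 4e 87 65
print(len(utf16_bytes))      # 4

# The bytes are lossless: decoding them gives the original string back.
print(utf16_bytes.decode("utf-16-le") == text)  # True
```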
Since posting, I have found this code, which uses the
WideCharToMultiByte API:

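Private Declare Function WideCharToMultiByte Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, _
    ByVal lpMultiByteStr As String, ByVal cbMultiByte As Long, _
    ByVal lpDefaultChar As Long, ByVal lpUsedDefaultChar As Long) As Long
Private Const CP_UTF8 As Long = 65001   ' UTF-8 code page

Function UTF8Encode(ByVal wText As String) As String
    Dim vNeeded As Long
    Dim vSize As Long
    vSize = Len(wText)
    ' First call: ask how many bytes the UTF-8 result will need
    vNeeded = WideCharToMultiByte(CP_UTF8, 0, StrPtr(wText), vSize, _
        "", 0, 0, 0)
    UTF8Encode = String(vNeeded, 0)
    ' Second call: fill the buffer (VB passes an ANSI copy of this String)
    WideCharToMultiByte CP_UTF8, 0, StrPtr(wText), vSize, UTF8Encode, _
        vNeeded, 0, 0
End Function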

I will try it soon. Do you think it will work?
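With CP_UTF8, WideCharToMultiByte should produce standard UTF-8. The hand-rolled Python version below uses a hypothetical utf8_encode helper. It is written only to show the bit layout, not how Windows implements the conversion, and it is checked against Python's built-in encoder. Python strings hold whole code points, so surrogate pairs (which a VB String stores as two UTF-16 units) are not handled here.

```python
def utf8_encode(text: str) -> bytes:
    """Hand-rolled UTF-8 encoder (illustrative; assumes no lone surrogates)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:                       # 1 byte: 0xxxxxxx
            out.append(cp)
        elif cp < 0x800:                    # 2 bytes: 110xxxxx 10xxxxxx
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:                  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:                               # 4 bytes above U+FFFF
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
    return bytes(out)

sample = "Olivier: 中文"
print(utf8_encode(sample) == sample.encode("utf-8"))  # True
print(len(utf8_encode("中文")))                       # 6
```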

I suppose my trouble comes from a mix-up between ANSI, DBCS, Unicode
and UTF-8. I suppose my Excel cell holds a DBCS string, and that VB
deals with Unicode strings.
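To make these terms concrete, the same two characters take different byte forms under each scheme. The Python sketch below assumes cp936, the Simplified Chinese ANSI (DBCS) code page, as the DBCS example. The actual web page text may instead be Traditional Chinese, whose code page is different.

```python
text = "中文"  # sample string (assumed; the actual web page text may differ)

# Byte forms of the same two characters under each scheme.
forms = {
    "DBCS / Chinese ANSI code page (cp936)": text.encode("cp936"),
    "Unicode as VB stores it (UTF-16LE)": text.encode("utf-16-le"),
    "UTF-8": text.encode("utf-8"),
}
for label, raw in forms.items():
    print(f"{label}: {raw.hex(' ')}")
```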
If I type Chinese characters into Notepad by hand, I have to save the
file in Unicode format to keep them.
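Notepad's "Unicode" option saves UTF-16LE text preceded by the byte-order mark FF FE, which is why the characters survive there. Here is a minimal Python sketch of writing such a file as raw bytes in binary mode. The write_notepad_unicode helper and the temporary path are hypothetical.

```python
import codecs
import os
import tempfile

def write_notepad_unicode(path: str, text: str) -> None:
    """Hypothetical helper: write text as BOM-prefixed UTF-16LE."""
    with open(path, "wb") as f:            # binary mode: no implicit conversion
        f.write(codecs.BOM_UTF16_LE)       # b'\xff\xfe'
        f.write(text.encode("utf-16-le"))

path = os.path.join(tempfile.mkdtemp(), "chinese.txt")
write_notepad_unicode(path, "中文")

with open(path, "rb") as f:
    raw = f.read()
print(raw.hex(" "))                    # ff fe 2d 4e 87 65
print(raw.decode("utf-16") == "中文")  # True: the BOM selects little-endian
```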
I thought it was a good thing that VB automatically converts strings
to Unicode, but it seems it is not that simple!
That is why I now think I have to convert my string to UTF-8, as
Boo K.M. said.
Am I right?

Thanks for your help
Olivier



