Re: Determining if a string is Unicode

From: Bonj (benjtaylor)
Date: 12/02/04


Date: Thu, 2 Dec 2004 06:56:59 -0000

A VB string is a BSTR, which is always Unicode internally. That is why
VarType() returns 8 for both of your strings.
Unicode IS always a mystery, but try this: create a new INF file on the
NT-based system, open it in Notepad, paste the old INF file's text into the
new file, and save it *on the NT-based system*. What happens when you try to
read that file with VB on the NT-based system?
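
If it does turn out that the two files are simply stored in different
encodings, one thing to try is reading the file as raw bytes and checking for
a byte-order mark before deciding how to convert. This is only a minimal
sketch, under two assumptions of mine: that the NT file was saved as UTF-16
little-endian with a leading FF FE mark (which is what Notepad writes when you
save as "Unicode"), and that anything without that mark is ANSI in the current
code page. ReadTextAnyEncoding is a made-up name, not something from your
code, and there is no error handling.

```vb
' Hypothetical helper, not from the original post.
' Reads the file as bytes so VB does no ANSI-to-Unicode conversion,
' then converts according to the byte-order mark (BOM), if one is present.
Public Function ReadTextAnyEncoding(ByVal sFileName As String) As String
    Dim b() As Byte
    Dim s As String
    Dim f As Integer
    Dim n As Long

    f = FreeFile
    Open sFileName For Binary Access Read As #f
    n = LOF(f)
    If n > 0 Then
        ReDim b(0 To n - 1)
        Get #f, 1, b              ' fills the array with the raw file bytes
    End If
    Close #f

    If n = 0 Then Exit Function   ' empty file: returns ""

    If n >= 2 Then
        If b(0) = &HFF And b(1) = &HFE Then
            ' UTF-16 LE: the bytes already have the layout of a VB string,
            ' so a plain assignment copies them in without conversion.
            s = b
            ReadTextAnyEncoding = Mid$(s, 2)   ' drop the BOM character
            Exit Function
        End If
    End If

    ' Assumption: no BOM means ANSI text in the current code page.
    ReadTextAnyEncoding = StrConv(b, vbUnicode)
End Function
```

Reading into a Byte array rather than a String is the point here: Get into a
String converts each file byte to one character, which would explain the
alternating Nulls you see with the NT file.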

"Jerry West" <jw@comcast.net> wrote in message
news:10qs4q2blgnv7db@news.supernews.com...
>I have a strange issue I can't seem to get a handle on. I'm reading in an
>INF file like so:
>
> Public Function mP_GetFileText(sFileName As String, _
>     Optional bNoLock As Boolean = True, _
>     Optional nStart As Long = 1) As String
>
> Dim sText As String
>
> Dim i As Integer
>
> i% = FreeFile
>
> If gFSO.FileExists(sFileName$) Then
>
> If bNoLock Then Open sFileName$ For Binary Access Read Lock Write _
>     As i% Else Open sFileName$ For Binary Access Read As i%
>
> sText$ = String$(LOF(i%), 0)
> Get i%, nStart&, sText$
>
> Close i%
>
> mP_GetFileText = sText$
>
> End If
>
> End Function
>
> If the INF file is read from an NT-based system, it appears to be in
> Unicode: viewing the string in the watch window shows every other char as a
> Null. If I then perform this operation on it, it appears as a
> "normal" (non-Unicode) string:
>
> sString$ = StrConv(sString, vbFromUnicode)
>
> Now, if I read in the INF file from a 9x based computer the string does
> not appear to be in Unicode. Further, if I perform the StrConv operation
> on this string it changes the entire string to all question marks.
>
> This left me with attempting to determine whether or not the INF file read
> is in Unicode or not. First I tried examining the VarType() value. However
> for either type of string returned the value was always an 8 (8 = string).
> Clearly both types of strings are NOT the same so this seems odd to me. I
> clearly cannot perform the StrConv function on the already "normal" string
> w/o changing it to all question marks. I then thought I'd try to read the
> files in using a different method like so:
>
> mP_GetFileText = gFSO.OpenTextFile(sFileName$, ForReading, _
>     TristateFalse).ReadAll
>
> This also failed. No matter what Tristate value I would use the string
> returned was always Null chars IF the INF file being read was on a remote
> 9x or NT system. It would work OK when reading local files. Finally, I
> tried creating a function that would return True if the string was Unicode
> like so:
>
> Dim l As Long
>
> Dim sa() As Byte
>
> On Error GoTo ErrHandler
>
> sa = sString$
>
> If (UBound(sa) > -1) Then
>
> For l& = 1 To UBound(sa) Step 2
>
> If (sa(l&) <> 0) Then Exit For
>
> Next l&
>
> mP_IsStringNonEnglishUnicode = (l& < UBound(sa))
>
> End If
>
> This also would not properly detect that the "normal" string was
> non-Unicode (if that is what it is).
>
> Has anyone else seen this type of issue before? Is there a way to detect
> the difference between these two string types? I'm not even certain that
> the string read from the NT based system is in Unicode at this point. Does
> anyone have any comments to share on this situation?
>
> Thanks!
>
> JW
>
>
>
>


