Re: Byte Array to String



Thanks for your reply,

Yes, for a text file, if we don't use the correct encoding/charset, the
retrieved text will not match the original characters.

For your scenario, I think VBA may use the default system locale to
encode the characters. You can also try passing "Encoding.Default" as the
encoding parameter of the StreamReader constructor; "Encoding.Default"
means the current system ANSI code page. If this still does not work, I
think VBA is producing the file in a binary-like format (one that doesn't
use a consistent encoding for the entire file), and thus using binary read
mode and decoding the bytes individually should be reasonable.
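
A minimal sketch of the Encoding.Default suggestion above, written in VB.NET
to match the code in the original question. The module name, the function
name ReadAnsiText and its path argument are hypothetical. It assumes the
.NET Framework, where Encoding.Default is the system ANSI code page; on
.NET Core and later, Encoding.Default is UTF-8, so this would not help there.

```vbnet
Imports System.IO
Imports System.Text

Module AnsiReadSketch
    ' Hypothetical helper (not from the thread): reads a whole file as text,
    ' decoding with Encoding.Default instead of StreamReader's UTF-8 default.
    Function ReadAnsiText(ByVal path As String) As String
        ' False = skip BOM detection and always decode with Encoding.Default.
        Using sr As New StreamReader(path, Encoding.Default, False)
            Return sr.ReadToEnd()
        End Using
    End Function
End Module
```

The Using block closes the reader even if ReadToEnd throws, which is why it
is used here in place of an explicit Close call.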

Anyway, if you have any further questions on this, feel free to post here.

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead


This posting is provided "AS IS" with no warranties, and confers no rights.



--------------------
Reply-To: "AG" <NOSPAMa-giam@xxxxxxxxxxxxxxxxx>
From: "AG" <NOSPAMa-giam@xxxxxxxxxxxxxxxxx>
References: <eMdm3uLLIHA.4948@xxxxxxxxxxxxxxxxxxxx>
<wUDfEONLIHA.7800@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: Byte Array to String
Date: Thu, 22 Nov 2007 09:25:49 -0500


Thanks for the reply Steven.

I ended up reading the file as bytes and converting them myself because
text reading mode (StreamReader) produced the wrong characters for the
extended ASCII characters.

Perhaps a bit more of an explanation.

The file is created by an Access application using VBA, as a method of
exporting some database data.
Since the data may contain all the usual record and field separators like
crlf, commas, tabs, quotes, etc., the extended ASCII chars are used as
record and field separators.

It is created using the Open For Append method, and data is added via the
Print method, as follows. This method cannot be changed, as it is in use
in too many locations.

Dim strRecord As String
strRecord = "field1data" & Chr(128) & "field2data" & Chr(128) & _
            "field3data" & Chr(129)
Open <thefile> For Append As #1
Print #1, strRecord
Close #1

As you can see, there is no BOM.

The file is easily opened and read in VBA using Open For Binary:
Dim strFileData as String
Open <thefile> For Binary As #1
strFileData = Space(FileLen(<thefile>))
Get #1, , strFileData
Close #1

This all works fine in VBA. Now, I would like to read the file using the
.NET Framework.
While my method of using Chr() on each byte works, it would seem that
there should be a similarly simple method in .NET to get the file contents
without looping through each byte.
According to the help file, Chr uses the Encoding class to return the
appropriate character, so isn't there a method in the Encoding class that
would perform the operation on the entire stream?
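
For reference, the Encoding class does have a whole-buffer call:
Encoding.GetString takes a byte array and decodes all of it at once. The
sketch below is one way to use it in place of the per-byte Chr() loop. The
module and function names are hypothetical, and the right encoding to pass
in is an assumption that depends on the machine that wrote the file.

```vbnet
Imports System.IO
Imports System.Text

Module WholeBufferDecodeSketch
    ' Hypothetical helper: reads the file as bytes and decodes them all in
    ' a single GetString call, instead of calling Chr() on each byte.
    ' enc is the single-byte encoding the caller assumes the file uses.
    Function ReadWithEncoding(ByVal path As String, ByVal enc As Encoding) As String
        Dim bytes() As Byte = File.ReadAllBytes(path)
        Return enc.GetString(bytes)
    End Function
End Module
```

On the .NET Framework, passing Encoding.Default should normally give the
same characters as the Chr() loop, since both go through the ANSI code page
(Chr takes it from the current culture, which is usually the same one).
Passing a fixed code page such as Encoding.GetEncoding(1252) is an
alternative if the files are always written on Western European Windows
systems, so the result does not depend on the reading machine's settings.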

--

AG
Email: discussATadhdataDOTcom
"Steven Cheng[MSFT]" <stcheng@xxxxxxxxxxxxxxxxxxxx> wrote in message
news:wUDfEONLIHA.7800@xxxxxxxxxxxxxxxxxxxxxxxxx
Hi AG,

If the file contains characters that exceed the ASCII character code range
(and those characters are stored correctly), that means the file's content
is not stored as plain ASCII, which only defines codes 0 to 127.

Generally speaking, if you're reading a text file (which means its content
is character text rather than unreadable binary content), you should use
text reading mode to read it (rather than reading the bytes and converting
them yourself).

To read a file in text mode, you need to know the encoding/charset of the
file's content. For example, you can use the "StreamReader" class in .NET
to read a file in text mode as below:

=================
StreamReader sr = new StreamReader("inputfile.txt", Encoding.UTF8);
string content = sr.ReadToEnd();
sr.Close();
=================

Alternatively, you can let the StreamReader determine the encoding
automatically from the file's BOM (byte order mark). Note, however, that
a BOM is not always present in a text file:

=================
StreamReader sr1 = new StreamReader("inputfile.txt", true);
string content1 = sr1.ReadToEnd();
sr1.Close();
=================

In your case, I think the file's encoding is likely not UTF-8, and if you
use UTF-8 to decode the bytes, you'll probably get the wrong characters.

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead



This posting is provided "AS IS" with no warranties, and confers no
rights.

--------------------
Reply-To: "AG" <NOSPAMa-giam@xxxxxxxxxxxxxxxxx>
From: "AG" <NOSPAMa-giam@xxxxxxxxxxxxxxxxx>
Subject: Byte Array to String
Date: Wed, 21 Nov 2007 22:56:55 -0500

I have a file that contains ASCII and Extended ASCII characters.
I need to get the file contents into a string, but the Extended ASCII
characters (dec 128 and 129) are being changed to dec 63.

I have tried several methods, but here is the one I thought would have
worked.

Dim strReturn As String
Dim arBytes() As Byte
arBytes = System.IO.File.ReadAllBytes(<myfile>)
strReturn = System.Text.Encoding.UTF8.GetString(arBytes)

When I examine strReturn, I find that the chars that should be chr(128)
and chr(129) are all chr(63).

The only thing I could get to work is

Dim strReturn As String = String.Empty
Dim arBytes() As Byte
Dim sB As New StringBuilder
Dim byT As Byte

arBytes = System.IO.File.ReadAllBytes(strPathFile)
For Each byT In arBytes
    sB.Append(Chr(byT))
Next
strReturn = sB.ToString

Can anyone offer an explanation, and/or a better method?
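
One likely explanation, offered as a sketch and not a confirmed diagnosis:
bytes 128 and 129 on their own are not valid UTF-8, because values in the
range 0x80 to 0xBF are continuation bytes that must follow a lead byte. The
UTF-8 decoder therefore substitutes the replacement character U+FFFD for
them. That character has no equivalent in the ANSI code page, so inspecting
it with Asc() would plausibly report "?" (63). The module and function names
below are illustrative only.

```vbnet
Imports System.Text

Module Utf8ReplacementSketch
    ' Decodes "A" followed by the two separator bytes from the thread,
    ' treating the bytes as UTF-8 the way the first attempt did.
    Function DecodeSampleAsUtf8() As String
        Dim raw() As Byte = {65, 128, 129}
        Return Encoding.UTF8.GetString(raw)
    End Function

    ' True when the character is U+FFFD, the Unicode replacement character
    ' that the decoder emits for bytes it cannot interpret.
    Function IsReplacementChar(ByVal c As Char) As Boolean
        Return AscW(c) = &HFFFD
    End Function
End Module
```

Checking the decoded string with AscW, which returns the Unicode code point,
avoids the second conversion through the ANSI code page that Asc performs.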

--

AG
Email: discussATadhdataDOTcom
