Re: replacing text data in a binary file


From: Jay B. Harlow [MVP - Outlook] (Jay_Harlow_MVP_at_msn.com)
Date: 06/09/04


Date: Wed, 9 Jun 2004 09:30:15 -0500

Adam,
For a binary file, it really depends on the encoding, which I would try to
avoid! If you were using UTF8 and it worked, then there is 1 byte per char
(true for plain ASCII text; UTF8 needs more bytes for other characters).
If you need UTF16, then you have (16 bit) Unicode itself & there are 2 bytes
per char (in the file); there is also a UTF32 ;-). In other words, do not
confuse how a string is represented in memory with how text is encoded in a
file. Binary files could actually have multiple encodings of text. Consider
a file that stores the code page & the length of a string, followed by the
code page encoded value of the string... Speaking of which, if your binary
file stores the length of the string, removing the "g:\" may cause problems
;-)
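
As a rough illustration of the bytes-per-char point, here is a minimal
sketch that encodes the same three-character string with three of the
framework's built-in encodings. The module name is made up for the
example, and the string is assumed to be plain ASCII text.

```vbnet
Imports System
Imports System.Text

Module EncodingSizes
    Sub Main()
        ' Three chars in memory; the size on disk depends on the encoding.
        Dim path As String = "g:\"

        ' GetBytes does not emit a byte order mark, so these are payload sizes.
        Console.WriteLine(Encoding.UTF8.GetBytes(path).Length)     ' 3
        Console.WriteLine(Encoding.Unicode.GetBytes(path).Length)  ' 6  (UTF16)
        Console.WriteLine(Encoding.UTF32.GetBytes(path).Length)    ' 12
    End Sub
End Module
```

So a search for "g:\" has to look for 3 bytes in a UTF8 or ANSI file but
6 bytes in a UTF16 one.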

If you start using ReadChar & WriteChar you are back to translating to Text.
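
To see why translating to text and back is risky, here is a small sketch
that pushes a few arbitrary bytes through an encoding and back again. The
helper name ThroughText and the sample bytes are invented for the
illustration; they are not from your file.

```vbnet
Imports System
Imports System.Text

Module RoundTrip
    ' Decodes raw bytes to a String, then encodes that String back to bytes.
    Public Function ThroughText(ByVal raw As Byte(), ByVal enc As Encoding) As Byte()
        Return enc.GetBytes(enc.GetString(raw))
    End Function

    Sub Main()
        ' &HFF and a lone &H80 are not valid UTF8, and neither is 7 bit ASCII.
        Dim raw As Byte() = New Byte() {&H66, &HFF, &H0, &H80}

        Console.WriteLine(BitConverter.ToString(raw))
        ' Invalid bytes are swapped for replacement characters on decoding,
        ' so neither result matches the original four bytes.
        Console.WriteLine(BitConverter.ToString(ThroughText(raw, Encoding.UTF8)))
        Console.WriteLine(BitConverter.ToString(ThroughText(raw, Encoding.ASCII)))
    End Sub
End Module
```

Working on the bytes directly avoids this problem because nothing is ever
decoded.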

Assuming your file is not EBCDIC & you are not using any extended characters
(an a with an umlaut, "ä", for example), a single byte in your file will
contain a single character. With EBCDIC you still have 1 byte per character,
however its bit representations are different. Extended characters (ANSI
code pages) are also a single character per byte, but what bytes 128 to 255
represent differs from one code page to another...
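
A quick sketch of the extended character case, assuming ISO-8859-1 as a
stand-in for a single-byte ANSI style code page (your file may well use a
different one). The module name is made up for the example.

```vbnet
Imports System
Imports System.Text

Module ExtendedChars
    Sub Main()
        ' U+00E4 is "ä"; built from its code so the source file's own
        ' encoding does not matter.
        Dim aUmlaut As String = Convert.ToChar(&HE4).ToString()

        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")
        Console.WriteLine(BitConverter.ToString(latin1.GetBytes(aUmlaut)))         ' E4
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(aUmlaut)))  ' C3-A4
        ' ASCII has no such character, so the encoder substitutes "?".
        Console.WriteLine(BitConverter.ToString(Encoding.ASCII.GetBytes(aUmlaut))) ' 3F
    End Sub
End Module
```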

You can then use Asc or AscW to convert a char into an integer/byte.

> > value = input.ReadByte()
> > Do Until value = -1
                If value = AscW("G"c) Then
                    ' found a G, look for a :
                End If
> > output.WriteByte(CByte(value))
> > value = input.ReadByte()
> > Loop
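
One way the loop above might be fleshed out is sketched below. This is
only a sketch under some assumptions: the paths are stored one byte per
character, the comparison is case-sensitive (so "g:\" and "G:\" are
different sequences), and the first byte of the sequence does not occur
again later in it, which lets the code skip any backtracking. The names
ByteReplace and CopyReplacing are made up, and MemoryStream stands in for
the two FileStreams so the example runs on its own.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ByteReplace
    ' Copies input to output, writing "replacement" in place of each
    ' occurrence of "find". Assumes find(0) appears nowhere else in "find".
    Public Sub CopyReplacing(ByVal input As Stream, ByVal output As Stream, _
                             ByVal find As Byte(), ByVal replacement As Byte())
        Dim matched As Integer = 0              ' bytes of "find" matched so far
        Dim value As Integer = input.ReadByte()
        Do Until value = -1
            Dim b As Byte = CByte(value)
            If b = find(matched) Then
                matched += 1
                If matched = find.Length Then   ' whole sequence found
                    output.Write(replacement, 0, replacement.Length)
                    matched = 0
                End If
            Else
                output.Write(find, 0, matched)  ' partial match failed: emit held bytes
                If b = find(0) Then
                    matched = 1                 ' this byte may start a new match
                Else
                    output.WriteByte(b)
                    matched = 0
                End If
            End If
            value = input.ReadByte()
        Loop
        output.Write(find, 0, matched)          ' flush a partial match left at EOF
    End Sub

    Sub Main()
        Dim source As Byte() = Encoding.ASCII.GetBytes("xxfile//g:\a.mp3yyfile//g:\b.mp3")
        Dim input As New MemoryStream(source)
        Dim output As New MemoryStream()
        ' An empty replacement simply removes the sequence.
        CopyReplacing(input, output, Encoding.ASCII.GetBytes("g:\"), New Byte() {})
        Console.WriteLine(Encoding.ASCII.GetString(output.ToArray()))
        ' prints: xxfile//a.mp3yyfile//b.mp3
    End Sub
End Module
```

To only touch drive letters that follow the prefix, the same routine could
be given the prefix plus "g:\" as the sequence to find and the prefix alone
as the replacement. The earlier warning still applies: if the file stores
string lengths or offsets, removing bytes this way may corrupt it.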

For information on Unicode and other character sets see:

http://www.yoda.arachsys.com/csharp/unicode.html

A couple other articles that may help:
http://www.yoda.arachsys.com/csharp/debuggingunicode.html

Hope this helps
Jay

"Adam J. Schaff" <aschaff@cascocdev.com> wrote in message
news:evjzbWiTEHA.1472@TK2MSFTNGP12.phx.gbl...
> Jay,
>
> Thanks for the code. I looked at using the binary reader, but couldn't
> figure out how to tell when it reaches EOF. Had no idea about the -1
> trick. That said, I'm still not sure what the code would look like to
> test one byte at a time for a sequence of 6 characters, particularly if
> those characters are stored as 2 bytes each. Hmm, I suppose there is a
> ReadChar method, but if I use that instead, I couldn't use the -1 trick.
> Also, how would I write it back to the output file if I use ReadChar
> instead of ReadByte? Is there a WriteChar, and will it cause problems if
> some of the data I'm reading with ReadChar isn't really a char? Ack! I'm
> just too new to all this. I don't even understand the use of CByte in
> your code (isn't it a byte already?). Well, at least I have food for
> thought. Thanks again for the advice. I can see I have a lot to learn.
>
> Even if I do stay with the stream reader, I'll take your worries to heart
> and will add code to back up the file before I play with it. That's just
> good common sense.
>
>
> "Jay B. Harlow [MVP - Outlook]" <Jay_Harlow_MVP@msn.com> wrote in message
> news:uAuSP4hTEHA.1984@TK2MSFTNGP12.phx.gbl...
> > Adam,
> > If you are editing a binary file I would recommend opening the file
> > with a BinaryReader or even just FileStream directly.
> >
> > > Dim sr As New StreamReader(fs, System.Text.Encoding.UTF8)
> >
> > If you use a StreamReader, you are actually converting the file into
> > Text (8 bit bytes to 16 bit chars) then converting the file back into
> > Binary.
> >
> > My two concerns with using a StreamReader are:
> > 1. Depending on what the binary file actually contains (an exe for
> > example), removing or adding characters may actually corrupt the file,
> > as the file contains length & offset fields...
> > 2. Losing certain bytes, ones that are not translated nicely from
> > arbitrary bytes back & forth to Unicode. For example using
> > Encoding.ASCII will trash your file as ASCII is 7 bit, you will lose
> > all the high order bits. I suspect with UTF8 you will be OK, however it
> > just does not feel right...
> >
> > Here is a simple program that will copy a file one byte at a time. The
> > trick is going to be to modify it so it looks for "g:\" one byte at a
> > time and skips those bytes, only if all three are found... Especially
> > if you want to ensure they are prefaced with "file//".
> >
> > Dim input As New FileStream("Only Fools.fpl", FileMode.Open)
> > Dim output As New FileStream("Only Fools2.fpl", FileMode.Create)
> > Dim value As Integer
> > value = input.ReadByte()
> > Do Until value = -1
> > output.WriteByte(CByte(value))
> > value = input.ReadByte()
> > Loop
> > input.Close()
> > output.Close()
> >
> > Hope this helps
> > Jay
> >
> > "Adam J. Schaff" <abjunk@adelphia.net> wrote in message
> > news:OkxC3hdTEHA.1508@TK2MSFTNGP11.phx.gbl...
> > > I am writing a quick program to edit a binary file that contains file
> > > paths (amongst other things). If I look at the files in notepad, they
> > > look like:
> > >
> > > <gibberish>file//g:\pathtofile1<gibberish>file//g:\pathtofile2<gibberish>
> > > etc.
> > >
> > > I want to remove the "g:\" from the file paths. I wrote a console app
> > > that successfully reads the file and writes a duplicate of it, but
> > > fails for some reason to do the "replacing" of the "g:\". The code
> > > follows with a note showing the line that is not working. When I look
> > > at the "s" variable in break mode, I see that VB does not show the
> > > entire file contents, even though when I write "s" to the second file
> > > stream, the entire original file is duplicated. I suppose this is
> > > because the file content isn't intended to be interpreted as a string
> > > (it's binary after all). It is probably hitting some unfriendly bytes
> > > that it can't interpret for string operations like Replace. If that's
> > > the case, maybe I need to interact with it a character at a time,
> > > although I'm not sure how I would do a replace that way. Any ideas or
> > > code would be greatly appreciated. As you can tell, I don't do much
> > > binary file I/O.
> > >
> > > Sub Main()
> > > 'read the binary file into a string var
> > > Dim fs As New FileStream( _
> > >     "C:\Source\Sample and Demo\Foobar\EditFpl\Only Fools.fpl", _
> > >     FileMode.Open, FileAccess.Read)
> > > Dim sr As New StreamReader(fs, System.Text.Encoding.UTF8)
> > > Dim s As String = sr.ReadToEnd
> > > sr.Close()
> > > fs.Close()
> > >
> > > 'remove the hard-coded drive letter specification
> > > s.Replace("file://G:\", "file://") 'THIS LINE DOES NOT WORK
> > >
> > > 'write a new binary file with my changes
> > > Dim fs2 As New FileStream( _
> > >     "C:\Source\Sample and Demo\Foobar\EditFpl\Only Fools2.fpl", _
> > >     FileMode.CreateNew, FileAccess.Write)
> > > Dim sw As New StreamWriter(fs2, System.Text.Encoding.UTF8)
> > > sw.Write(s)
> > >
> > > sw.Close()
> > > fs2.Close()
> > > End Sub
> > >
> > >
> >
> >
>
>


