Re: Reading text file characters
From: Don (dsarvas_at_yahoo.com)
Date: 12/28/04
- Next message: Andy Kasotia: "View Text File using VB6"
- Previous message: Kou Vang: "Re: Array Reference"
- In reply to: Jim Mack: "Re: Reading text file characters"
- Next in thread: Bob O`Bob: "Re: Reading text file characters"
- Reply: Bob O`Bob: "Re: Reading text file characters"
- Reply: Jim Mack: "Re: Reading text file characters"
- Messages sorted by: [ date ] [ thread ]
Date: Tue, 28 Dec 2004 19:32:35 GMT
On Tue, 28 Dec 2004 12:47:45 -0500, "Jim Mack" <jmack@mdxi.nospam.com>
wrote:
>Don wrote:
>> If anyone is still with this thread, I did a hex dump suggested by
>> Duane and got the following that appears AFTER what should be the end
>> of the file - or at least what appears as the end of the file when the
>> file is displayed in a text editor or word processor.
>>
>> What doesn't show up in the numbers below is the last line of the hex
>> dump which is just a double underscore, but that seems to show up with
>> all files so I'm assuming it just marks the end of the dump?
>>
>> Any idea what these hex numbers represent in the text file that seem
>> to occur right at the point where the ^@ character appears which seems
>> to be the source of problems?
>>
>> 00 0d 0a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a
>> 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a
>Hex 1A (Ctrl-Z, ASCII SUB) is the DOS/CP/M end-of-file marker, often
>just called EOF. Older text editors (really old) would sometimes
>fill their file buffer with EOF characters, then overwrite with text as
>needed. When they closed the file they wrote the entire buffer, which
>left a trail of EOFs in the file. Since the first EOF marks the end, the
>extras were ignored. What created this file?
The file is created by a specialized program used by court reporters
to prepare transcripts - basically a word processing program with
specialized features to cater to their needs. This particular
problem occurs with only one of the programs. The file is fine when
viewed and printed within the program itself, but when a text file (or
ASCII file as we call it in the industry) is created, several control
codes are scattered throughout the text file that should not be there.
Using the Asc() function to get each character's decimal value, I was
able to identify all of them except the ^@.
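The ^ names are caret notation: a control code is displayed as ^ followed by the character you get by flipping bit 0x40 of the code, so ^A is 1 and ^@ is 0 (a null byte). A small Python sketch with hypothetical function names. It lists every control character in a string the way checking Asc() on each character would:

```python
def caret_notation(code: int) -> str:
    """Render an ASCII control code in caret notation: ^@ for 0, ^A for 1,
    ... ^_ for 31, and ^? for 127 (DEL)."""
    if not (0 <= code < 32 or code == 127):
        raise ValueError(f"{code} is not an ASCII control code")
    # Flipping bit 0x40 maps 0..31 onto '@', 'A'..'Z' and the punctuation
    # '[' through '_', and maps 127 onto '?'.
    return "^" + chr(code ^ 0x40)


def list_control_codes(text: str) -> dict[str, int]:
    """Count every ASCII control character in text, keyed by caret notation
    plus its decimal value (the number VB's Asc() would report)."""
    counts: dict[str, int] = {}
    for ch in text:
        code = ord(ch)
        if code < 32 or code == 127:
            key = f"{caret_notation(code)} (Asc {code})"
            counts[key] = counts.get(key, 0) + 1
    return counts


print(caret_notation(0))   # ^@
print(caret_notation(26))  # ^Z
print(list_control_codes("Q.\x01 A.\x00\r\n"))
```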
>The ^@ is the null byte you show as the first byte in the dump. It
>shouldn't cause any problems, but whether null bytes are an issue
>depends on how you're opening and reading the file. The usual strategy
>would be to open as binary, Get the file into a byte array, then replace
>all 0 bytes with spaces.
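Jim's suggestion, sketched in Python rather than VB. Reading the raw bytes stands in for opening in binary mode and using Get into a byte array. Every 0x00 is then replaced with a space. The function name and the file names in the demo are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def replace_nulls(src: str, dst: str) -> int:
    """Read src as raw bytes, replace every 0x00 byte with a space (0x20),
    write the result to dst, and return how many bytes were replaced."""
    data = Path(src).read_bytes()  # binary read: no text decoding, nulls preserved
    count = data.count(b"\x00")
    Path(dst).write_bytes(data.replace(b"\x00", b" "))
    return count


# Demo on a throwaway file (hypothetical names).
with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "transcript.txt")
    dst = os.path.join(workdir, "transcript_clean.txt")
    Path(src).write_bytes(b"Q. Where were you?\x00\r\n")
    print(replace_nulls(src, dst))  # 1
    print(Path(dst).read_bytes())   # b'Q. Where were you? \r\n'
```

Replacing rather than deleting keeps the file length and line lengths unchanged. Replacing with `b""` instead would delete the nulls outright.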
Actually, I can read the file fine, not only with the app I wrote but
with any word processor or text editor. But the control characters
are present, and if the text file is printed, they print too (e.g. ^A,
^@, etc.). My app filters out all the control characters. It also
eliminates the ^@ character, although I have to admit that's not by my
doing. I filter out all the other characters myself, but simply writing
the file again (using the Print statement) drops the ^@, so the final
output is clean. I suspect that character is never actually read in,
but I'm not sure.
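Rather than relying on a side effect of rewriting the file, a filter can drop the null explicitly along with the other control codes. Here is a minimal Python sketch. It assumes, hypothetically, that tab, LF and CR are the only control bytes worth keeping in a plain-text transcript. Note that it also drops any 0x1A bytes.

```python
# Control bytes to keep (hypothetical policy): tab, line feed, carriage return.
KEEP = {0x09, 0x0A, 0x0D}


def strip_control_bytes(data: bytes) -> bytes:
    """Remove ASCII control bytes (0x00-0x1F and 0x7F), including ^@ (0x00)
    and ^Z (0x1A), while keeping tab, LF and CR. Bytes >= 0x80 are left alone."""
    return bytes(b for b in data if (b >= 0x20 and b != 0x7F) or b in KEEP)


raw = b"Q.\x01 Yes.\x00\r\n" + b"\x1a" * 5
print(strip_control_bytes(raw))  # b'Q. Yes.\r\n'
```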
We have run into one program, ironically another court reporter
application, that will not accept the text file at all unless that ^@
is eliminated.
I had suspected this was the EOF or at least the result of an EOF
since it always occurred at the end of a file - well, almost always.
You bring up an interesting point regarding the overwriting of excess
EOF markers. Where I have seen the ^@ midway through a file, it would
usually be where there was some copying and pasting. But if that's
the case, I didn't think an EOF could actually appear midway in a file
and still allow the user to move beyond that point, which I can do.
On the other hand, we have had problems in the past where text
suddenly ends, but by using a variety of maneuvers (e.g. PgDn or
CTRL/END or whatever) to get past that point, I have sometimes
actually reached text that at first was not visible. Now I'm
wondering, based on your description of trailing EOFs, whether I'm
running into text that has somehow ended up between EOFs, which
should not happen if I understand the purpose of the EOF correctly.
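One way to test that theory is to look for anything other than more 0x1A bytes after the first 0x1A. Here is a minimal Python sketch with a hypothetical function name. An empty result means the tail is pure Ctrl-Z padding. Anything else is content that a program honoring the first Ctrl-Z as end-of-file would never show.

```python
def text_after_first_eof(data: bytes) -> bytes:
    """Return every non-0x1A byte that follows the first 0x1A (Ctrl-Z) in data.
    b'' means the data either has no Ctrl-Z or ends in pure Ctrl-Z padding."""
    first = data.find(b"\x1a")
    if first == -1:
        return b""
    return data[first:].replace(b"\x1a", b"")


print(text_after_first_eof(b"Page 1\r\n\x1a\x1a\x1a"))            # b''
print(text_after_first_eof(b"Page 1\r\n\x1a\x1aPage 2\r\n\x1a"))  # b'Page 2\r\n'
```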
But since you identified the ^@ as a null byte, maybe when I'm
deleting it, I'm really deleting more than that - some of those EOFs,
if that's possible?
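^@ (0x00) and the EOF marker (0x1A) are different byte values, so removing only the nulls cannot remove any EOF markers. Whether a particular app also drops 0x1A depends on its own code. A quick Python check with a hypothetical function name compares byte counts before and after a cleanup pass. It would show whether that pass removes more than intended.

```python
def cleanup_report(before: bytes, after: bytes) -> dict[str, int]:
    """Compare null (0x00) and Ctrl-Z (0x1A) counts before and after a cleanup pass."""
    return {
        "nulls_before": before.count(b"\x00"),
        "nulls_after": after.count(b"\x00"),
        "ctrl_z_before": before.count(b"\x1a"),
        "ctrl_z_after": after.count(b"\x1a"),
    }


original = b"Last line.\x00\r\n" + b"\x1a" * 4
cleaned = original.replace(b"\x00", b"")  # delete nulls only
print(cleanup_report(original, cleaned))
# {'nulls_before': 1, 'nulls_after': 0, 'ctrl_z_before': 4, 'ctrl_z_after': 4}
```

In practice, `before` and `after` would be the raw bytes of the original export and of the app's output.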
As long as I can fix these problems with my app, that may be good
enough, but I'd sure like to be able to tell the programmers who
produce this program where they can concentrate in their program to
prevent the problem from even happening in the first place.
Thanks,
Don
>
>--
>
> Jim Mack
> MicroDexterity Inc
> www.microdexterity.com
>
>