Re: Need help with FSO reading/writing ASCII that might contain Unicode

"Anthony Jones" <AnthonyWJones@xxxxxxxxxxxxxxxx> wrote in message
news:eNm%23z%231SJHA.5268@xxxxxxxxxxxxxxxxxxxxxxx

"Paul Randall" <paulr90@xxxxxxx> wrote in message
news:%23JRvDI0SJHA.3932@xxxxxxxxxxxxxxxxxxxxxxx
Hi,
I am fairly proficient in VBScript, but find JScript to be confusing and
often difficult to understand. I would like help modifying two JScript
routines.

I have a simple HTML editor which I downloaded from msdn.microsoft.com a
few years ago. It consists of an HTA and some other files. It builds a
custom toolbar and menu structure in the HTA window and uses JScript and
the IE DOM to allow copying text and graphics from IE windows, pasting
into the editor, and doing various editing tasks. It was a sample of what
HTAs and DHTML could do. It uses the Microsoft Common Dialog as a file
browser/picker to open and save files, as well as a means to select fonts
and colors. It uses the file system object to do the actual loading and
saving of files.

The load and save functions assume that only ASCII text is involved. I
wish to change those two routines to allow Unicode files to be loaded and
saved.

Here are the original routines:

//SaveDocument uses the common dialog box object to display the Save As
//dialog, then writes a TextStream object from the value of the div's
//innerHTML property
function SaveDocument(){
  //Setting CancelError to true and using try/catch allows the user to
  //click Cancel on the Save As dialog without causing a script error
  cDialog.CancelError=true;
  try{
    cDialog.Filter="HTM Files (*.htm)|*.htm|Text Files (*.txt)|*.txt";
    cDialog.ShowSave();
    var fso = new ActiveXObject("Scripting.FileSystemObject");
    var f = fso.CreateTextFile(cDialog.filename, true);
    f.write(oDiv.innerHTML);
    f.Close();
    sPersistValue=oDiv.innerHTML;
  }
  catch(e){
    var sCancel="true";
    return sCancel;
  }
  oDiv.focus();
}

//LoadDocument uses the common dialog box object to display the Open
//dialog box, then reads the file and displays its contents in the div
function LoadDocument(){
  //Setting CancelError to true and using try/catch allows the user to
  //click Cancel on the Open dialog without causing a script error
  cDialog.CancelError=true;
  try{
    var answer = checkForSave();
    //The user has clicked Yes in the modal dialog box called in the
    //checkForSave function
    if (answer) {
      var sCancel = SaveDocument();
      //The user has clicked Cancel in the Save As dialog box; exit function
      if (sCancel) return;
      cDialog.Filter="HTM Files (*.htm)|*.htm|Text Files (*.txt)|*.txt";
      cDialog.ShowOpen();
      var ForReading = 1;
      var fso = new ActiveXObject("Scripting.FileSystemObject");
      var f = fso.OpenTextFile(cDialog.filename, ForReading);
      var r = f.ReadAll();
      f.Close();
      oDiv.innerHTML=r;
      //This variable is used in the checkForSave function to see if there
      //is new content in the div
      sPersistValue=oDiv.innerHTML;
    }
    //The user has clicked No in the modal dialog box called in the
    //checkForSave function
    if (answer==false) {
      cDialog.Filter="HTM Files (*.htm)|*.htm|Text Files (*.txt)|*.txt";
      cDialog.ShowOpen();
      var ForReading = 1;
      var fso = new ActiveXObject("Scripting.FileSystemObject");
      var f = fso.OpenTextFile(cDialog.filename, ForReading);
      var r = f.ReadAll();
      f.Close();
      oDiv.innerHTML=r;
      sPersistValue=oDiv.innerHTML;
    }
    oDiv.focus();
  }
  catch(e){
    var sCancel="true";
    return sCancel;
  }
}

I believe the problem with writing Unicode is in these two lines:

var f = fso.CreateTextFile(cDialog.filename, true);
f.write(oDiv.innerHTML);

Perhaps the text of oDiv.innerHTML could be checked with a regular
expression such as /[^\u0000-\u00ff]/, which matches any character above
U+00FF, and if it finds one, the text stream object could be created in
Unicode mode instead:
var f = fso.CreateTextFile(cDialog.filename, true, true);
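A minimal sketch of that write-side check, in JScript-compatible JavaScript. The names needsUnicode and saveText are hypothetical, and the FileSystemObject is passed in as a parameter so the logic can be tried without ActiveX. It assumes every character up to U+00FF survives ASCII mode, which is roughly true on a Western (Windows-1252) system but not on every code page.

```javascript
// Matches any character outside U+0000..U+00FF.
// Assumption: characters inside that range can be written in ASCII (ANSI) mode.
var reNonLatin1 = /[^\u0000-\u00ff]/;

function needsUnicode(text) {
  return reNonLatin1.test(text);
}

// Hypothetical helper. fso is a Scripting.FileSystemObject, e.g.
// new ActiveXObject("Scripting.FileSystemObject") inside the HTA.
// Returns true if the file was written as Unicode.
function saveText(fso, path, text) {
  var unicode = needsUnicode(text);
  // CreateTextFile(filename, overwrite, unicode)
  var f = fso.CreateTextFile(path, true, unicode);
  try {
    f.Write(text);
  } finally {
    f.Close();
  }
  return unicode;
}
```

In SaveDocument, the CreateTextFile and write lines could then be replaced by saveText(fso, cDialog.filename, oDiv.innerHTML); so only documents that actually contain such characters are stored at two bytes per character.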

I believe the problem with reading Unicode is in these two lines:

var f = fso.OpenTextFile(cDialog.filename, ForReading);
var r = f.ReadAll();

Unicode (UTF-16) files usually start with a two-byte byte order mark
(BOM): the bytes FF FE for little-endian files, or FE FF for big-endian
files.

If either of these patterns is present, the file is most likely Unicode,
and some of its bytes will have the value 0x00. For an ASCII text stream,
the ReadAll method ignores everything from the first 0x00 byte onward: it
does not return the Unicode text, usually does not return the whole file,
and does not report that there was a problem.

After the var r = f.ReadAll(); statement, perhaps logic could check
whether at least two bytes were read, and if so, whether the first two
bytes were the 0xFFFE or 0xFEFF BOM, and if so, close the text stream,
get another text stream object for reading in Unicode mode, and do the
ReadAll method on the Unicode text stream.
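A minimal sketch of that read-side logic, again JScript-compatible, with hypothetical helper names and the FileSystemObject passed in. It only looks for FF FE, the little-endian mark written by Notepad's "Unicode" option, and it assumes the ANSI code page turns bytes 0xFF and 0xFE into U+00FF and U+00FE, as Windows-1252 does; on other code pages the comparison would need different characters.

```javascript
var ForReading = 1;
var TristateFalse = 0;   // open the file as ASCII
var TristateTrue = -1;   // open the file as Unicode

// Assumption: in ASCII mode the bytes FF FE arrive as U+00FF U+00FE (Windows-1252).
function startsWithUtf16LeBom(s) {
  return s.length >= 2 && s.charCodeAt(0) === 0xFF && s.charCodeAt(1) === 0xFE;
}

function readAllText(fso, path, format) {
  // OpenTextFile(filename, iomode, create, format)
  var f = fso.OpenTextFile(path, ForReading, false, format);
  try {
    // ReadAll raises an error on an empty file, so check first.
    return f.AtEndOfStream ? "" : f.ReadAll();
  } finally {
    f.Close();
  }
}

// Hypothetical helper: read as ASCII, and reread as Unicode if a BOM is seen.
function loadText(fso, path) {
  var r = readAllText(fso, path, TristateFalse);
  if (startsWithUtf16LeBom(r)) {
    r = readAllText(fso, path, TristateTrue);
  }
  return r;
}
```

In LoadDocument, each OpenTextFile/ReadAll/close sequence could then become var r = loadText(fso, cDialog.filename);.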

I'm hoping someone has the time and will look at this and if feasible,
modify the two routines so I can test the changes.


The FileSystemObject does not automatically determine the encoding of a
file; it has no understanding of the BOM sequence that other tools, such
as Notepad, recognize.

Use CreateTextFile(s, true, true) so that you always create a Unicode
file.

Use OpenTextFile(cDialog.filename, ForReading, false, -1)

(The -1 means open the file as Unicode.)
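For comparison, a sketch of this always-Unicode approach with the numeric arguments given names. saveUnicode and loadUnicode are hypothetical helpers, the FileSystemObject is passed in, and TristateTrue is the conventional name for the -1 format value.

```javascript
var ForReading = 1;
var TristateTrue = -1; // open the file as Unicode

function saveUnicode(fso, path, text) {
  // CreateTextFile(filename, overwrite, unicode): always write Unicode
  var f = fso.CreateTextFile(path, true, true);
  try {
    f.Write(text);
  } finally {
    f.Close();
  }
}

function loadUnicode(fso, path) {
  // OpenTextFile(filename, iomode, create, format)
  var f = fso.OpenTextFile(path, ForReading, false, TristateTrue);
  try {
    // ReadAll raises an error on an empty file, so check first.
    return f.AtEndOfStream ? "" : f.ReadAll();
  } finally {
    f.Close();
  }
}
```

Because the FSO does not examine the BOM, loadUnicode would misread a file that was saved earlier in ASCII mode, so this only works if every file the editor opens was written this way.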


--
Anthony Jones - MVP ASP/ASP.NET

Thanks for the suggestion.

I agree that the parameters used to instantiate the text stream object
determine how the FSO reads and writes files. In Unicode mode the FSO is
smart enough to write the BOM automatically when the text stream creates or
overwrites a file, and to leave the BOM out of the data it returns when it
reads the file.

I understand that your solution would work and even I could easily implement
it in JScript, but it would be wasteful of disk space. It would mean that
every file I save with this program would use two bytes per character, even
though 95 percent of the files hold only single-byte characters.
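A rough way to put numbers on that tradeoff, assuming a single-byte ANSI code page in ASCII mode, and the 2-byte BOM plus 2 bytes per UTF-16 code unit that Unicode mode writes. The function names are illustrative only.

```javascript
// ASCII mode: one byte per character (assumes a single-byte code page).
function asciiBytes(text) {
  return text.length;
}

// Unicode mode: 2-byte BOM plus 2 bytes per UTF-16 code unit.
function unicodeBytes(text) {
  return 2 + 2 * text.length;
}

var page = new Array(10001).join("a"); // a 10,000-character document
// asciiBytes(page) is 10000; unicodeBytes(page) is 20002
```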

-Paul Randall

