Re: Need help with FSO reading/writing ASCII that might contain Unicode

"Paul Randall" <paulr90@xxxxxxx> wrote in message news:OP7vLw4SJHA.5364@xxxxxxxxxxxxxxxxxxxxxxx

"Anthony Jones" <AnthonyWJones@xxxxxxxxxxxxxxxx> wrote in message news:eNm%23z%231SJHA.5268@xxxxxxxxxxxxxxxxxxxxxxx

"Paul Randall" <paulr90@xxxxxxx> wrote in message news:%23JRvDI0SJHA.3932@xxxxxxxxxxxxxxxxxxxxxxx
Hi,
I am fairly proficient in VBScript, but find JScript to be confusing and often difficult to understand. I would like help modifying two JScript routines.

I have a simple HTML editor which I downloaded from msdn.microsoft.com a few years ago. It consists of an HTA plus a few other files. It builds a custom toolbar and menu structure in the HTA window and uses JScript and the IE DOM to let you copy text and graphics from IE windows, paste them into the editor, and perform various editing tasks. It was written as a sample of what HTAs and DHTML can do. It uses the Microsoft Common Dialog control as a file browser/picker to open and save files, as well as to select fonts and colors, and it uses the FileSystemObject to do the actual loading and saving of files.

The load and save functions assume that only ASCII text is involved. I wish to change those two routines to allow Unicode files to be loaded and saved.

Here are the original routines (there will be line wrap):

//SaveDocument uses the common dialog box object to display the save as dialog, then writes the value of the div's innerHTML property to a file through a textstream object
function SaveDocument(){
    //Setting CancelError to true and using try/catch allows the user to click cancel on the save as dialog without causing a script error
    cDialog.CancelError=true;
    try{
        cDialog.Filter="HTM Files (*.htm)|*.htm|Text Files (*.txt)|*.txt";
        cDialog.ShowSave();
        var fso = new ActiveXObject("Scripting.FileSystemObject");
        var f = fso.CreateTextFile(cDialog.filename, true);
        f.write(oDiv.innerHTML);
        f.Close();
        sPersistValue=oDiv.innerHTML;
    }
    catch(e){
        var sCancel="true";
        return sCancel;
    }
    oDiv.focus();
}

//LoadDocument uses the common dialog box object to display the open dialog box, then reads the file and displays its contents in the div
function LoadDocument(){
    //Setting CancelError to true and using try/catch allows the user to click cancel on the open dialog without causing a script error
    cDialog.CancelError=true;
    try{
        var answer = checkForSave();
        //The user has clicked yes in the modal dialog box called in the checkForSave function
        if (answer) {
            var sCancel = SaveDocument();
            //The user has clicked cancel in the save as dialog box; exit function
            if (sCancel) return;
            cDialog.Filter="HTM Files (*.htm)|*.htm|Text Files (*.txt)|*.txt";
            cDialog.ShowOpen();
            var ForReading = 1;
            var fso = new ActiveXObject("Scripting.FileSystemObject");
            var f = fso.OpenTextFile(cDialog.filename, ForReading);
            var r = f.ReadAll();
            f.close();
            oDiv.innerHTML=r;
            //This variable is used in the checkForSave function to see if there is new content in the div
            sPersistValue=oDiv.innerHTML;
        }
        //The user has clicked no in the modal dialog box called in the checkForSave function
        if (answer==false){
            cDialog.Filter="HTM Files (*.htm)|*.htm|Text Files (*.txt)|*.txt";
            cDialog.ShowOpen();
            var ForReading = 1;
            var fso = new ActiveXObject("Scripting.FileSystemObject");
            var f = fso.OpenTextFile(cDialog.filename, ForReading);
            var r = f.ReadAll();
            f.close();
            oDiv.innerHTML=r;
            sPersistValue=oDiv.innerHTML;
        }
        oDiv.focus();
    }
    catch(e){
        var sCancel="true";
        return sCancel;
    }
}

I believe the problem with writing Unicode is in these two lines:

var f = fso.CreateTextFile(cDialog.filename, true);
f.write(oDiv.innerHTML);

Perhaps the text of oDiv.innerHTML could be checked for any characters outside the single-byte range with a regular expression like [^\u0000-\u00ff], and if it contains any, the text stream could be created in Unicode mode:
var f = fso.CreateTextFile(cDialog.filename, true, true);
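A minimal JScript sketch of that idea, assuming the same FileSystemObject calls the original routines use. `needsUnicode` and `writeTextFile` are hypothetical helper names. Treating any character above U+00FF as "needs Unicode" is only a heuristic: what an ANSI text stream can really store depends on the system code page.

```javascript
// Hypothetical helper: true if the string holds any character above U+00FF,
// i.e. something a single-byte (ANSI) text stream is unlikely to store correctly.
function needsUnicode(text) {
  return /[^\u0000-\u00ff]/.test(text);
}

// Hypothetical helper: save as ANSI when possible, Unicode only when needed.
// CreateTextFile(filename, overwrite, unicode)
function writeTextFile(fileName, text) {
  var fso = new ActiveXObject("Scripting.FileSystemObject");
  var f = fso.CreateTextFile(fileName, true, needsUnicode(text));
  f.Write(text);
  f.Close();
}
```

Inside SaveDocument, the two lines in question would then be replaced by a single call such as `writeTextFile(cDialog.filename, oDiv.innerHTML);`.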

I believe the problem with reading Unicode is in these two lines:

var f = fso.OpenTextFile(cDialog.filename, ForReading);
var r = f.ReadAll();

Unicode (UTF-16) files usually start with a two-byte Byte Order Mark: the bytes FF FE for little-endian files, or the reverse, FE FF, for big-endian files.

If either of these patterns is present, the file is likely Unicode, and many of its bytes will be 0x00 (in UTF-16, every character in the ASCII range is stored as its ASCII byte plus a 0x00 byte). The text stream ReadAll method ignores everything in an ASCII text stream from the first 0x00 it finds onward: it doesn't return the Unicode text, and it likely does not return the full contents of the file, yet it does not report that there was a problem.

After the var r = f.ReadAll(); statement, perhaps the logic could check whether at least two bytes were read and, if so, whether those first two bytes are the FF FE or FE FF BOM. If they are, it could close the text stream, open a new text stream for reading in Unicode mode, and call ReadAll on that stream.
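A rough JScript sketch of that BOM check. Instead of reading the whole file in ASCII mode first, it peeks at the first two characters, closes the stream, and reopens the file in Unicode mode only when a little-endian BOM is found. `detectBom` and `readTextFile` are hypothetical names. It assumes the system ANSI code page is Windows-1252, so that bytes FF and FE come back as U+00FF and U+00FE. It only switches for FF FE because the FSO's Unicode mode reads little-endian UTF-16; a big-endian file is recognized by `detectBom` but still read as ANSI here.

```javascript
// Hypothetical helper: classify the first two characters of a file opened in
// ASCII mode. Assumes a Windows-1252 ANSI code page (byte FF -> U+00FF, FE -> U+00FE).
function detectBom(firstTwo) {
  if (firstTwo === "\u00ff\u00fe") return "utf-16le"; // bytes FF FE
  if (firstTwo === "\u00fe\u00ff") return "utf-16be"; // bytes FE FF
  return "ansi";
}

// Hypothetical helper: peek at the BOM, then reopen in the matching mode.
function readTextFile(fileName) {
  var ForReading = 1, TristateFalse = 0, TristateTrue = -1;
  var fso = new ActiveXObject("Scripting.FileSystemObject");

  // Peek at up to two characters in ASCII mode.
  var f = fso.OpenTextFile(fileName, ForReading, false, TristateFalse);
  var head = f.AtEndOfStream ? "" : f.Read(2);
  f.Close();

  // Reopen as Unicode only for a little-endian BOM.
  var format = detectBom(head) === "utf-16le" ? TristateTrue : TristateFalse;
  f = fso.OpenTextFile(fileName, ForReading, false, format);
  // ReadAll raises an error on an empty file, so check AtEndOfStream first.
  var text = f.AtEndOfStream ? "" : f.ReadAll();
  f.Close();
  return text;
}
```

In LoadDocument, each OpenTextFile/ReadAll pair could then become `var r = readTextFile(cDialog.filename);`.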

I'm hoping someone has the time and will look at this and if feasible, modify the two routines so I can test the changes.


The FileSystemObject does not detect a file's encoding automatically; it has no understanding of the BOM sequence that other tools, such as Notepad, recognize.

Use CreateTextFile(s, true, true) so that you always create a Unicode file.

Use OpenTextFile(cDialog.filename, ForReading, false, -1) to read it.

(The -1, TristateTrue, means open the file as Unicode.)
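The same suggestion spelled out as a JScript sketch with named constants. `saveAsUnicode` and `loadAsUnicode` are hypothetical helper names. One caveat: a file saved earlier as ANSI will be misread if it is opened with format -1, so existing files would need converting first, or a BOM check along the lines Paul describes.

```javascript
var ForReading = 1;
var TristateTrue = -1; // FSO "format" value meaning Unicode (UTF-16LE)

// Hypothetical helper: always write a Unicode file.
function saveAsUnicode(fileName, text) {
  var fso = new ActiveXObject("Scripting.FileSystemObject");
  // CreateTextFile(filename, overwrite, unicode)
  var f = fso.CreateTextFile(fileName, true, true);
  f.Write(text);
  f.Close();
}

// Hypothetical helper: always read the file as Unicode.
function loadAsUnicode(fileName) {
  var fso = new ActiveXObject("Scripting.FileSystemObject");
  // OpenTextFile(filename, iomode, create, format)
  var f = fso.OpenTextFile(fileName, ForReading, false, TristateTrue);
  // ReadAll raises an error on an empty file, so check AtEndOfStream first.
  var text = f.AtEndOfStream ? "" : f.ReadAll();
  f.Close();
  return text;
}
```

In the editor these would be called as `saveAsUnicode(cDialog.filename, oDiv.innerHTML);` and `oDiv.innerHTML = loadAsUnicode(cDialog.filename);`.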


--
Anthony Jones - MVP ASP/ASP.NET

Thanks for the suggestion.

I agree that the parameters used to instantiate the text stream object determine how the FSO reads and writes files. They do make the FSO smart enough to write the BOM automatically when the text stream creates or overwrites a file, and to leave the BOM out of the data it returns when it reads the file.

I understand that your solution would work and even I could easily implement it in JScript, but it would be wasteful of disk space. It would mean that every file I save with this program would use two bytes per character, even though 95 percent of the files hold only single-byte characters.
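To put rough numbers on the disk-space concern: in ASCII mode the FSO writes one byte per character, assuming every character fits the system code page. In Unicode mode it writes a 2-byte BOM plus two bytes per UTF-16 code unit. The helper names below are hypothetical.

```javascript
// Hypothetical helper: bytes on disk for an ANSI (ASCII-mode) text stream,
// assuming every character fits the system code page.
function ansiBytes(text) {
  return text.length;
}

// Hypothetical helper: bytes on disk for a Unicode (UTF-16LE) text stream,
// counting the 2-byte BOM plus 2 bytes per UTF-16 code unit.
function utf16Bytes(text) {
  return 2 + 2 * text.length;
}
```

For example, a 10,000-character page comes to 10,000 bytes as ANSI and 20,002 bytes as Unicode.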


Another alternative is UTF-8 encoding. FSO doesn't handle that, but ADODB.Stream does, if it's available to you, which I suspect it will be.
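A JScript sketch of the UTF-8 route, assuming ADO's ADODB.Stream object is available on the machine. `saveUtf8`, `loadUtf8`, and `utf8ByteLength` are hypothetical names. `utf8ByteLength` shows why UTF-8 answers the size concern: ASCII characters still take one byte each, and only other characters take two to four. ADODB.Stream normally writes a UTF-8 BOM (EF BB BF) at the start of the file when `Charset` is "utf-8".

```javascript
// Hypothetical helper: number of bytes the string occupies in UTF-8 (no BOM).
function utf8ByteLength(text) {
  var bytes = 0;
  for (var i = 0; i < text.length; i++) {
    var c = text.charCodeAt(i);
    if (c < 0x80) {
      bytes += 1; // ASCII
    } else if (c < 0x800) {
      bytes += 2; // e.g. accented Latin letters
    } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < text.length &&
               text.charCodeAt(i + 1) >= 0xDC00 && text.charCodeAt(i + 1) <= 0xDFFF) {
      bytes += 4; // surrogate pair: one supplementary character
      i++;
    } else {
      bytes += 3; // rest of the Basic Multilingual Plane
    }
  }
  return bytes;
}

var adTypeText = 2, adSaveCreateOverWrite = 2;

// Hypothetical helper: save text as UTF-8 through ADODB.Stream.
function saveUtf8(fileName, text) {
  var stm = new ActiveXObject("ADODB.Stream");
  stm.Type = adTypeText;
  stm.Charset = "utf-8";
  stm.Open();
  stm.WriteText(text);
  stm.SaveToFile(fileName, adSaveCreateOverWrite);
  stm.Close();
}

// Hypothetical helper: load a UTF-8 file through ADODB.Stream.
function loadUtf8(fileName) {
  var stm = new ActiveXObject("ADODB.Stream");
  stm.Type = adTypeText;
  stm.Charset = "utf-8";
  stm.Open();
  stm.LoadFromFile(fileName);
  var text = stm.ReadText(); // reads the whole stream by default
  stm.Close();
  return text;
}
```

With this approach, a page that is 95 percent plain ASCII stays close to its ANSI size, while any Unicode characters are still preserved.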

However, are you sure the size of the files is really worth worrying about? Just how big will a typical file be?


--
Anthony Jones - MVP ASP/ASP.NET
