Re: (repost): Very tough question, about HTML manipulation


From: Julian Turner (JulianTurner_at_discussions.microsoft.com)
Date: 07/13/04


Date: Tue, 13 Jul 2004 08:48:02 -0700

Hello Welman

I can see the point you are making.

You need to be able to apply a regular expression to chunks of text, so I can see why my individual character method would not work.

Nevertheless "Mission Impossible" just elicits inspiration, so I will try out some ideas on you, which may help you go forward.

I will not comment on your RegExps of course.

Important facts/assumptions:-

1. I am assuming beneath all of this that you want to work in IE. Mozilla and other browsers that conform to the W3C standards have a different (and perhaps more helpful) DOM Range model; see www.w3.org. IE has its own unique model for Text Ranges.

2. You can get the parentElement() of a range. There may be a little complexity here for inter-element boundaries, but I shall ignore this (apologies).

3. That it would be acceptable to you to apply the RegExp separately to the "Bit of text in bold" and "Bit of text in LI".

<P><B>Bold [SELECTION_BEGIN]Bit of text in bold</B></P>
<UL><LI>Bit of text in LI [SELECTION_END]things</LI></UL>

This is the key issue really. If you want the RegExp to apply across the whole selected text in one chunk, then it is certainly a more general problem, which I will not try to answer now.
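To illustrate assumption 3, here is a minimal sketch of a RegExp that could be applied to each chunk separately. It follows the punctuation rule Welman describes in his quoted message. The names convertPunctuation, FULL_WIDTH and AFTER_CJK are hypothetical, and the character class U+4E00 to U+9FFF is only an assumption about what "characters in my language" covers.

```javascript
// Western punctuation mapped to full-width forms:
// "," -> U+FF0C, "." -> U+3002, "?" -> U+FF1F, "!" -> U+FF01.
var FULL_WIDTH = { ",": "\uFF0C", ".": "\u3002", "?": "\uFF1F", "!": "\uFF01" };

// Assumption: "characters in my language" means CJK Unified Ideographs
// (U+4E00 to U+9FFF). Adjust the class for the real character set.
// The second group takes a whole run of marks so that "?!" converts as a unit.
var AFTER_CJK = /([\u4E00-\u9FFF])([,.?!]+)/g;

function convertPunctuation(text) {
  return text.replace(AFTER_CJK, function (match, cjkChar, marks) {
    // Convert each mark in the run; the preceding character is kept as-is.
    var converted = marks.replace(/[,.?!]/g, function (mark) {
      return FULL_WIDTH[mark];
    });
    return cjkChar + converted;
  });
}
```

Punctuation that follows /[a-zA-Z0-9]/ never matches the first group, so it is left untouched. Each chunk ("Bit of text in bold", "Bit of text in LI") would be passed through convertPunctuation on its own.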

If my more "special" case in 3 is acceptable to you, then I suggest the following.

The idea is to find sub-sets of the range that fall within different elements.

1. Again create a Cursor and Reference Range.

2. Collapse the Cursor to the start.

3. Get the parentElement() [the brackets are important as it is a method on the range object] of the Cursor. Remember this.

4. Expand the Cursor by a character (e.g. with moveEnd("character", 1), which moves only the end point).

5. Get the parentElement() again, and compare with the first. If it is the same element, go to 6. Else go to 7 (i.e. you are outside bold).

6. Also test whether the Cursor is still inside the Reference with compareEndPoints. If it is, go to 4; if not, go to 7.

7. Reduce the Cursor by a character (i.e. the extra character that triggered the condition). But only if you came from 5.

8. Test if text is empty or whitespace, and go to 10 if so. I.e. ignore it.

9. Get the text, apply RegExp, and set the text on the Cursor range.

10. Collapse the Cursor to the end, expand by one character, and go to 3. Stop once the Cursor has reached the end of the Reference.
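The steps above can be sketched on a simplified model. This is not IE TextRange code: each selected character is represented as { ch, parent }, where parent stands in for whatever parentElement() would report at that position. The names splitByParent, transformRuns and tagged are hypothetical helpers made up for illustration.

```javascript
// Model only: group consecutive characters that share the same parent.
function splitByParent(chars) {
  var runs = [];
  var current = null;
  for (var i = 0; i < chars.length; i++) {
    var c = chars[i];
    if (current === null || current.parent !== c.parent) {
      // Steps 3 and 5: a different parent starts a new run.
      current = { parent: c.parent, text: "" };
      runs.push(current);
    }
    current.text += c.ch; // Step 4: grow the run by one character.
  }
  return runs;
}

// Apply a transform (e.g. a RegExp replace) to each run separately.
function transformRuns(chars, transform) {
  var runs = splitByParent(chars);
  for (var i = 0; i < runs.length; i++) {
    // Step 8: leave empty or whitespace-only runs alone.
    if (/\S/.test(runs[i].text)) {
      runs[i].text = transform(runs[i].text); // Step 9.
    }
  }
  return runs;
}

// Helper to build the model: every character of text gets the same parent.
function tagged(parent, text) {
  var out = [];
  for (var i = 0; i < text.length; i++) {
    out.push({ ch: text.charAt(i), parent: parent });
  }
  return out;
}

var selection = tagged("B", "Bit of text in bold")
  .concat(tagged("LI", "Bit of text in LI "));
var result = transformRuns(selection, function (s) {
  return s.toUpperCase();
});
```

Here result holds two runs, one for the B element and one for the LI element, each transformed on its own. In a real page the runs would be TextRange objects and step 9 would write back through the text property.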

"Welman Jordan" wrote:

> Hello Julian,
>
> Thank you very much for your valuable suggestion!
> It somewhat helps....but.....
>
> Now I am posting my WHOLE tough question. The one that
> I posted before is a "simplified" one. I am gonna modify the
> text within the document very very heavily with the Regular
> Expressions, as a result, character by character modification
> can help little in this particular scenario.
>
> Actually, I am dealing with some Unicode strings within
> the formatted text. For example, I would like to convert
> some punctuations from Western forms (",.?!") into the
> form of my language (",.?!" -> "，。？！" (some UNICODE
> PUNCTUATIONS)), only if they are behind the characters
> in my language, If the ",.?!" are behind /[a-zA-Z0-9]/, they
> will be left untouched.
>
> One more thing, my mother tongue is a Far East language,
> I am not even able to use /\b/ in Regular Expressions
> to detect word boundaries....
>
> I am afraid that it is really a "Mission Impossible" now.
> Nevertheless, Julian, I still, must thank you again and
> express my appreciation to your kind help!
>
> Best Regards,
> W. Jordan
>
> "Julian Turner" <Julian Turner@discussions.microsoft.com> wrote in message
> news:ADCB8363-FEF1-4FD1-861C-B2B4049D11F4@microsoft.com...
> : I have some thoughts for you, but this is not a complete answer. It is based on something I am working on myself along similar lines.
> :
> : I assume that all you want to do is change to upper case.
> :
> : Important facts:-
> :
> : - A TextRange can be as small as 1 character, or even nothing (i.e. a cursor insertion point).
> :
> : - You can have multiple text ranges over the same text.
> :
> : - Setting just one character on the "text" property of the range, I believe, would not affect the mark-up.
> :
> : Suggested method
> :
> : 1. Get the TextRange from the selection object. Let us call this the "Cursor", which we will use to go character by character.
> :
> : 2. Create a duplicate of the TextRange using the duplicate() method. You will use the duplicate to keep a "Reference" of the start and end points of the range.
> :
> : 3. Collapse the Cursor range, using the collapse() method, to the start of the Reference range. This collapses it so the start and end points are the same.
> :
> : 4. Expand the Cursor by one character, using the expand() method, and get the character, i.e. Cursor.text.
> :
> : 5. Is the character an a-z character? E.g. in Regular Expression terms: /[a-z]/i.test(Cursor.text).
> :
> : 6. If so, get the character, convert it to upper case (toUpperCase()), and save it back in Cursor.text. This is the key point: setting just one character on the text property, I believe, would not affect the mark-up.
> :
> : 7. Collapse the Cursor to the end of the character, using collapse().
> :
> : 8. Compare the end point of the Cursor to the end point of the Reference using the compareEndPoints() method, which allows you to compare the start and end points of two ranges.
> :
> : 9. If the Cursor is not at the end of the Reference, go to 4.
> :
> : It is cumbersome, but its granularity should help.
> :
> :
> :
> :
> : "Welman Jordan" wrote:
> :
> : > Hello,
> : >
> : > I've posted this thread on IE.scripting for a month, but
> : > nobody answered me, thus I reposted it here, I hope
> : > somebody can help.
> : >
> : >
> : > I have got a very darn tough question that I can not solve
> : > by myself. Any JavaScript gurus please help!
> : >
> : > Here's the problem:
> : >
> : > I wanna replace some user selected text in a web page,
> : > however, after the replacement, the format of the selection
> : > should be preserved if possible.
> : > -------------------------------------------
> : > For example, here's a piece of code:
> : >
> : > <P><B>Bold [SELECTION_BEGIN]text</B></P>
> : > <UL><LI>list item: item [SELECTION_END]things</LI></UL>
> : >
> : > If it is rendered in a browser, it should look like this:
> : > a paragraph, bold text,
> : > and then a bulleted list, an item.
> : >
> : > The selection point begins in the middle of the bold text,
> : > and spans into the list item.
> : > If I copy the selected text, I will get:
> : > text
> : > list item: item
> : >
> : > Now, I wanna do something with the script, replace the
> : > selected text from "text", and "list item: item" to uppercase
> : > "TEXT", and "LIST ITEM: ITEM". However, the "text"
> : > should still be bold, and the "list item: item" should still
> : > stay in the bulleted list. How to do so?
> : >
> : > I know that on IE5, document.selection.createRange()
> : > might help. However, when I finish modifying the selected
> : > text, I don't know how to put them back. Using pasteHTML
> : > won't work, since there'll be an extra <P> before "TEXT"
> : > and an extra </LI> after "ITEM". Thus the "Bold TEXT" would
> : > be broken into two lines.
> : >
> : > Somebody can think of another way else?
> : >
> : > W. Jordan
> : >
> : >
> : >
> : >
>
>
>
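
The character-by-character method quoted above can be sketched roughly as follows. It is written against the IE TextRange members duplicate(), collapse(), moveEnd(), compareEndPoints() and text, and it is an untested sketch rather than a definitive implementation. The function name upperCaseRange is hypothetical. Note that it uses moveEnd("character", 1) to grow the Cursor instead of the expand() call named in the quoted steps, and it assumes the Reference range keeps its end point while single characters are replaced.

```javascript
// Sketch only: upper-case a selection one character at a time so that
// the surrounding mark-up is (hopefully) left alone.
function upperCaseRange(selectionRange) {
  // "Reference" remembers the start and end points of the selection.
  var reference = selectionRange.duplicate();
  // "Cursor" walks through the selection character by character.
  var cursor = selectionRange.duplicate();
  cursor.collapse(true); // collapse to the start of the selection

  // Continue while the Cursor's end is before the Reference's end.
  while (cursor.compareEndPoints("EndToEnd", reference) < 0) {
    // Grow the Cursor by one character; give up if it cannot move.
    if (cursor.moveEnd("character", 1) === 0) {
      break;
    }
    var ch = cursor.text;
    if (/[a-z]/.test(ch)) {
      cursor.text = ch.toUpperCase(); // replace exactly one character
    }
    cursor.collapse(false); // collapse to the end of that character
  }
}
```

In IE this would be called as upperCaseRange(document.selection.createRange()). Element boundaries may report more than one character (for example a line break) in cursor.text, which this sketch simply skips over.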


