Re: 2nd Try: Extracting data from Word 2003 using XSLT



G'day "vasdeep" <vasdeep@xxxxxxxxxxxxxxxxxxxxxxxxx>,

You should be using the schema-defined tags for hidden text. However,
the user can choose to view hidden text, which will then expose it.
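
A rough sketch of the round trip, in Python rather than XSLT so it can be run
standalone. It assumes the hidden-text run property in Word 2003 WordML is
w:vanish, and that each marker sits in its own hidden run. The helper names
(q, run, build_paragraph, extract) are made up for illustration; the
"vasuclause" marker and the sample text come from the question. The markers
are stored as ordinary text, so the serializer escapes them on the way out,
and the extraction step turns them back into real elements instead of copying
them through as text.

```python
import xml.etree.ElementTree as ET

# Word 2003 WordprocessingML namespace
W = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W)


def q(tag):
    """Qualify a local name with the WordML namespace (hypothetical helper)."""
    return "{%s}%s" % (W, tag)


def run(parent, text, hidden=False, bold=False):
    """Append a w:r run; hidden runs carry w:vanish in their run properties."""
    r = ET.SubElement(parent, q("r"))
    if hidden or bold:
        rpr = ET.SubElement(r, q("rPr"))
        if bold:
            ET.SubElement(rpr, q("b"))
        if hidden:
            ET.SubElement(rpr, q("vanish"))
    ET.SubElement(r, q("t")).text = text
    return r


def build_paragraph():
    """Download side: wrap the visible text in hidden marker runs."""
    p = ET.Element(q("p"))
    run(p, "<vasuclause>", hidden=True)
    run(p, "This is the first paragraph with ")
    run(p, "bold", bold=True)
    run(p, "</vasuclause>", hidden=True)
    return p


def extract(p):
    """Upload side: rebuild real elements from the hidden marker runs."""
    root = ET.Element("doc")
    stack = [root]
    for r in p.iter(q("r")):
        rpr = r.find(q("rPr"))
        text = r.findtext(q("t")) or ""
        hidden = rpr is not None and rpr.find(q("vanish")) is not None
        bold = rpr is not None and rpr.find(q("b")) is not None
        if hidden:
            name = text.strip().strip("<>")
            if name.startswith("/"):
                if len(stack) > 1:
                    stack.pop()          # closing marker
            else:
                stack.append(ET.SubElement(stack[-1], name))  # opening marker
        elif bold:
            ET.SubElement(stack[-1], "b").text = text
        else:
            cur = stack[-1]
            kids = list(cur)
            if kids:                     # text after a child goes in its tail
                kids[-1].tail = (kids[-1].tail or "") + text
            else:
                cur.text = (cur.text or "") + text
    return root


if __name__ == "__main__":
    p = build_paragraph()
    print(ET.tostring(p, encoding="unicode"))           # markers appear escaped
    print(ET.tostring(extract(p), encoding="unicode"))  # markers are elements
```

The same idea should carry over to the XSLT: match on runs whose properties
contain the hidden-text element and emit an element for each marker. Bear in
mind that a user who shows hidden text can edit or delete the markers, so the
upload side should cope with unbalanced ones.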

Steve Hudson - Word Heretic

steve from wordheretic.com (Email replies require payment)
Without prejudice


vasdeep reckoned:

>Could someone please respond?
>
>"vasdeep" wrote:
>
>> Sorry, my earlier email was incomplete.
>>
>> I meant to ask...
>> Could someone please help me with these issues?
>>
>> Thanks in Advance.
>>
>> vasdeep
>>
>>
>> "vasdeep" wrote:
>>
>> > I have the following requirement.
>> > 1. The user chooses to download the contents captured in our system as a
>> > Word document. I have written an XSLT that converts the system data [in
>> > XML format] into WordML. To identify the contents of this document, I
>> > also create hidden tags as part of this transformation.
>> > Using the Word property "w:visible", I add these tags. For example, it
>> > could be something like <vasuclause> </vasuclause>
>> > These tags will not be visible to the end user when viewed in MS
>> > Word 2003. The contents between these tags will be the text that the user
>> > will see, and it represents an entity in our system. So there will be many
>> > occurrences of this pattern, corresponding to the entities in the system.
>> > But when Word 2003 tries to open the WordML created this way, it will
>> > try to process the tag "<vasuclause>" and error out. To avoid this, as part
>> > of my transformation I escape the "<" and ">" characters, resulting in
>> > "&lt;vasuclause&gt;" and "&lt;/vasuclause&gt;"
>> >
>> > Question 1: Is this approach right?
>> >
>> > The user can now open the document in Word 2003.
>> >
>> > 2. The user can make changes to the document and upload it back to the
>> > system.
>> > I have written another XSLT that reads the contents of the document
>> > and extracts the text including the formatting,
>> > i.e. if the user has used bold, italics, underline, lists, paragraphs,
>> > etc., all that information is captured as part of this transformation. The
>> > XSLT has relevant templates to convert the run properties for "bold",
>> > "italics", "lists", etc. to their HTML equivalents. The XSLT outputs an XML
>> > document containing the text with relevant formatting instructions.
>> > As part of this, I need to read the "vasuclause" tags that I had inserted
>> > and use them to identify each entity.
>> > Question 2: The XML output should be something like
>> > <vasuclause>This is the first paragraph with <b> bold </b>
>> > </vasuclause>
>> > The second stage of this process is to use a SAX parser to read
>> > the XML and insert the data into the database. But the XML output is not
>> > correct: it has the tags escaped. Is there a way to resolve this?
>> >
>> >
>> > The examples available on MSDN talk about using an XSD to define the data
>> > definition for the Word document and defining blocks where the user can
>> > provide input. The user can then save the file by choosing the "Save Data
>> > Only" option. But this does not work for my requirement. Doing so saves
>> > only the data (text) and loses the formatting. I need to capture both the
>> > data and the formatting.
>> >
>> > Also, these examples talk about the user providing data in specific input
>> > blocks. In my case, the user can add new paragraphs of text. Hence, I have
>> > not been able to use the suggested solution. Is there something that I'm
>> > missing?
>> >
>> > Is there a way to achieve what I wish to accomplish? Will the procedure I
>> > have explained above work? Is there a better way?
