Re: mail merge http-sourced data




"Peter Jamieson" <pjj@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:eYO5Wb90JHA.1096@xxxxxxxxxxxxxxxxxxxxxxx
An aside, the open data source dialog includes "*.htm, *.html, *.asp" as
a supported type -- any idea what format it's expecting for that? I
tried plain text and HTML table, but got no love.

Word can indeed use a table in an HTML format file as a data source. By
default it uses the Jet/ACE OLE DB provider and the HTML IISAM, but it may
also try its internal text converter in certain cicumstances and you may
be able to use the same syntax as you were using with the .csv to do that
too.

Ah... to clarify I only tried to make this work via Word's UI (thinking that
it *had* to be a supported data source over HTTP, and hoping the good old UI
could be used to construct an example.) CSV is a tiny bit easier to
generate, not to mention considerably less bloated,

I can't say I've explored this in depth but the only experiments I have
done are with very plain handcrafted HTML, with nothing much except
<table>, <tr> and <td> tags (the <html> and <body> tags appear to be
optional). I've never tried with a .asp generating the HTML and cannot do
so right now. Nor have I tried .mht (which as far as I can tell is
actually some sort of MIME-encoded HTML)

..MHT is just HTML, extended to permit embedding (using MIME multi-part
convention) contents of files that are ordinarily rendered by resolving
references to external files, like images, scripts and CSS -- a single-file
way to store a document that is ordinarily rendered from multiple files (and
a less-quirky alternative to the magically named directory scheme used by IE
Save As -> html.)

That extension seems a little out of place in its subset of extensions, IIRC
the browser is only supposed to load MHT files from the local file system...
somehow the prospect of using one to store merge data defies my imagination.


some stuff from one of my previous messages...

<<
I'm not sure you can use the text file provider + SCHEMA.INI for that -
MS's documentation suggests that the text file provider and the HTML
provider are the same thing, but I think you have to specify different
extended properties in yor connection string, i.e.

Extended Properties=""HTML Import;HDR=YES;"";"
or something like
Extended Properties=&quot;HTML Import;HDR=YES;&quot;
in a .odc file.

(The engine type, 112, is different from the text file engine type, 96,
but in this case it seems that you need the extended properties rather
than the engine type).


If the file is also encoded usig a particular character set, I don't
know how you would specify that. I was never sure how international
character set stuff was handled in earlier HTML versions anyway.

We explicitly use UTF-8 to work-around some quirks that cropped-up circa
Server 2003, but the content is all US English.


FWIW
1. the header row can be defined as
<tr><th>field1</th><th>field2</th><th>etc.</th></tr>
or
<tr><td>field1</td><td>field2</td><td>etc.</td></tr>

2. If you have more than one table, the provider uses the name in any
<caption> tag in the table if there is one, but for "uncaptioned" tables
it seems to name the tables

table
table1
table2

etc.

You may need to name the table in the .odc or in the SQLStatement
parameter of the OpenDataSource call

(e.g.
SELECT * FROM [table]
)


As far as I can see, this all works with simple examples but I've never
used it "for real". You appear to need a .odc when the data gets more
complex (perhaps if there is more than one table or you named a table
with a caption), but you may be able to specify the .htm file as the
Name parameter and put everything else in the Connection parameter (and
possibly the SQLStatement paramter) in simpler cases.

Nope, the end output resolves in SQL Server as a single derived table-valued
rowset. (Complex joins across many tables often underlie it, but the end
result is easily represented as CSV.)


Addendum: [I probably should just delete this paragraph but I can't
resist...] Ya know, if Word opened mail merge data connections in a way
that was read-only, dangerous SQL would be a non-issue -- oh wait, I
forgot, current MS data clients are only capable of restricting
writability at the cursor level, "read-only" connections are still free
to execute DML that writes data... But still, the inherent and absolute
non-writability of CSV text delivered via HTTP (no freakin less) ought to
factor in... Ah screw, clearly they have a vested interest in "security
via beating the user over the head with incessant popup warnings," who am
I to question that? :-)

Yeah, I try to resist those sort of observations because unless you can
connect to the right person in MS at the right moment, nothing will ever
happen. That said, MS seems to respond to all reports of security flaws by
"fixing" them, but
a. in many cases, these appear to be "lawyers' fixes", which IMO allow
someone to claim that any problems that arise are someone else's fault,
but actually encourage an environment where users routinely dismiss
warning messages

That is EXACTLY what I've been on about since my first hands-on with Vista!
Users become desensitized by warnings that are effectively injected into
their work flow. Having dismissed such warnings by the dozens [or more] on
a daily basis without visible consequences, users only want to understand
such warnings well enough to know which button closes them the fastest at a
quick glance. The nuisance factor quickly erodes any end-user concern for
security awareness; result: the system becomes *more* vulnerable to dire
risks, because users can't distinguish them from all the minimal-to-nil
risks that are forcibly brought to their attention.

btw, "lawyer fixes" is a great term -- in fact I used it in a meeting
yesterday. (Let me know where to send the royalties.) :-) It dove-tails
nicely with my own coinage, "resolved via disclaimer."


b. I have never understood it is Word's responsibility to prevent access
to other data files. It seems to me that it is the responsibility of the
operating system and applications such as SQL Server to provide facilities
to allow data providers to define access to data, and for users and system
buiders to use them. And of course both Windows and SQL Server do provide
such facilities - it's just that it seems that Word cannot be left to rely
solely on them.

I didn't mean to imply that Word should be ultimately responsible... take
2...

To me that warning triggered by use of the SELECT * construct infers worry
over a potential SQL Injection vector -- specifically regarding which I had
two distinct points (that got mangled together, while tripping over unstated
baseline assumptions.)

The first was that some data sources that might be referenced via SELECT
statement are so absolutely and incontravertably read-only, that any data
access client callers should be able to recognize this, and forego all
concerns about SQL statement safety, accordingly. CSV over HTTP was my case
in point -- there is no SQL construction that could possibly threaten that
data source, ever! Warnings in this case are pointless. Data sources like
SQLOLEDB or SQLNCLI are of course another matter entirely, which leads to my
second point:

Originally I was fingering all the various and sundry data client access
layers, and their seemingly universal inability to connect to DBMS in a
truly/definitively read-only fashion. Take for instance an ADODB.Connection
object, with its Mode property set to adReadOnly, that will happily execute
UPDATE and DELETE statements all day long, limited only by user permissions.

But yesterday, as I thought through the grand scheme at some length (had to
drive for 4 hours, lots of time to ponder) it occurred to me that the
missing piece would need to reside in the DB Server. What's missing is a
way for a client to connect with all permissions other than SELECT
masked-out/negated by the server. Currentlly the only way to open a truly
read-only connection is to connect using a db login context that has no
write permissions -- but that is hardly a convenient scheme, and it
completely undermines using a trusted connection to avoid exposing
credentials in connection strings.

If a data access client caller could be *guaranteed* that its strictly
read-only intentions would be respected/enforced by the server, regardless
of any write permissions that may be held by the current login context...
then all of those callers would cease to incur risk of damage to data, no
matter what SQL they may execute, intentionally or otherwise.

In short, just because my SQL login can write data in any table, does not
mean I want every data-enabled app to connect with my full set of
permissions. Because that makes each of those apps separately responsible
for preventing any abuse of anything my login is permitted to do. How
unfortunate, here in the year 2009: precisely that is currently the defacto
standard.


c. I can just about understand why things such as DATABASE and INCLUDETEXT
fields - which that could suck lots of data into a document without the
user even realising - were nobbled a few years ago to prevent the
resulting data from being distributed by botnets etc., but
- even in those cases, it's not completely obvious why it's /Word's/
responsibility to prevent the data being accessed.
- that would also seem to apply to Word 2007 documents which could
potentially contain loads of Custom XML data that the user never sees
either. No-one seems to have spent much time worrying about that.

I guess in the end, Microsoft had to assume that there are simply too many
systems out there with data that has not been properly secured and decided
to impose a "nanny state" solution. But I really don't know.

I don't know either... but it's difficult to escape how increasingly
comfortable MS has become with implementing "safety" mechanisms that are as
intensely annoying -- by design, no less -- as they are inescapable; that
are often neither well-understood, not highly effective.


In any case, thanks again for the help and your perspective. I enjoyed the
exchange! :-)


-Mark McGinty

http://www.deprecatethis.com
http://www.thinkset.com





--
Peter Jamieson
http://tips.pjmsn.me.uk

"Mark McGinty" <mmcginty@xxxxxxxxxxxxxxx> wrote in message
news:eFXTXI80JHA.3780@xxxxxxxxxxxxxxxxxxxxxxx
Peter,

Thank you very much for the informative response (you far-exceeded my
expectations!) :-) That registry option does indeed dispense with all
the prompts! Very cool! :-)

Oddly enough we do have the capacity to implement registry changes -- we
currently establish IE trust and adjust settings for the trusted zone on
client systems by asking them to run an executable that directly writes
the registry. (The client-specific domain name is incorporated into the
name of the .exe as a sort-of chicken parameter, to make a single version
of the code able to handle any client domain, by merely copying it to a
different file name.) Without that our current Office automation + FSO
scheme would be doomed from the start.

Ironically that little trust coersion mechanism was one of the things we
were hoping to forego with this change -- but this option is surely the
lesser of two evils. If we could do away with permitting Script ActiveX
not marked as safe [for trusted zone], and write-access to the client
file system as requirements I'd call it major progress. More, even
without the registry change, it does still function (as opposed to our
current scheme.) Dispensing with the annoying prompts would be icing on
the cake, rather than lack of registry changes being a show-stopper by
default. More still, the change is configuration-agnostic, a simple .reg
file would do it.

An aside, the open data source dialog includes "*.htm, *.html, *.asp" as
a supported type -- any idea what format it's expecting for that? I
tried plain text and HTML table, but got no love. I figured, sure an
html file could be local, but .asp, for all practical purposes, would be
a useless option if not via HTTP. I assumed this option almost insured
workability of HTTP as a transport... but needless to say, I was
disappointed.

In any case, your reply was very helpful, thanks again!!


-Mark


Addendum: [I probably should just delete this paragraph but I can't
resist...] Ya know, if Word opened mail merge data connections in a way
that was read-only, dangerous SQL would be a non-issue -- oh wait, I
forgot, current MS data clients are only capable of restricting
writability at the cursor level, "read-only" connections are still free
to execute DML that writes data... But still, the inherent and absolute
non-writability of CSV text delivered via HTTP (no freakin less) ought to
factor in... Ah screw, clearly they have a vested interest in "security
via beating the user over the head with incessant popup warnings," who am
I to question that? :-)


"Peter Jamieson" <pjj@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:%23sVKI960JHA.3328@xxxxxxxxxxxxxxxxxxxxxxx
You did well to connect to a data source that's an HTTP URL - FWIW, I
had never found a way to do it within the Word UI or using VBA
OpenDataSource, but specifying the URL in the XML works here too.

Unfortunately, that seems to be about as good as it gets. The following
may tell you something, but it's hard to see how you could actually
achieve your objective.

At least one of the prompts is probably the SQL-related prompt that
requires a registry fix on the user's machine:

http://support.microsoft.com/kb/825765

So that's not something you can modify on behalf of your site users
(uness they are all actually on your own corporate network, in which
case I suppose it is at least feasible).

You used to be able to avoid that prompt using RTF and removing the SQL,
but
a. there was a time when Word would discard any mailmerge info. in RTF
format files
b. Microsoft seemed to close off that possibility in an update a year
or so ago.
c. I don't know whether the equivalent approach ever worked in XML.

However, I don't know what other messages you are seeing. Here, the most
significant difference in behaviour is to do with whether the URL where
the .csv is located is trusted by Internet Explorer. If it isn't, Word
won't connect. If it is, Word will connect, but my experience with Word
2007/Vista suggests that it is always actually dealing with a downloaded
copy "behind the scenes" as you suggest. And again, unless you have
control over what URLs your users' system trust, there is no simple fix
for that.

Even the new Word 2007 approach of using content controls and putting
all the data in a custom XML data store isn't workable unless you can
guarantee that your users can download and execute a complete VSTO-type
Word document with the .NET code to point to each record in turn and
produce tear-off copies of the document.

Best I can do, sorry.

Peter Jamieson

http://tips.pjmsn.me.uk

Mark McGinty wrote:
env: Word 2003+

Greets,

I'm trying to do something that, on the surface, seemed pretty basic: I
need to serve-up a Word mail-merge document (as XML) over HTTP, that is
already linked to data that is also served over HTTP.

The closest I have gotten is to link .csv data by setting the
attributes (defined using XPath)
w:wordDocument/w:docPr/w:mailMerge/w:query@w:val
and w:wordDocument/w:docPr/w:mailMerge/w:dataSource@w:val to a URL.
(w:query@w:val must be prefixed with "SELECT * FROM"; oddly, both must
be set.)

This actually does work, except that it prompts annoyingly, 2 to 3
times. The first time I could accept (the prompt references the URL)
but the "extra" prompts make the functionality look dumb (they
apparently reference some locally cached alias -- from a user
standpoint, it looks really lame.)

To summarize, we want to provide a Word mail merge all pre-prepared, no
user intervention required, as a feature of a web site. We don't want
to expose any SQL Servers to the public Internet (for obvious reasons)
and we absolutely do NOT want to use Jet (not explicitly, anyway.) Our
current solution (which had to support Word 2000) involves use of
Office automation objects and the scripting file system object --
difficult to support to say the least.

At one point it looked to me like there was a way to embed the merge
data inside the [xml] document itself, though I can't find a way to
make that happen via Word's UI. (Otoh, I couldn't get Word to persist
a URL-based data source via UI either, and that nominally works,
soooo...)

Any suggestions appreciated.


-Mark McGinty





.



Relevant Pages

  • Re: Page Inheritance
    ... BTW, I tried specifying the class name of a page (ASPX file), then put that same name as the Inherits value in another page, but I got an error something about it could not load that class. ... I guess in your case the output is the same but depends on SQL statement. ... see how would your HTML mix if let say Page1 has HTML and Page2 has HTML and Page2 inherits from Page1. ...
    (microsoft.public.dotnet.framework.aspnet)
  • IBM Informix Web DataBlade: SQL injection
    ... SQL code is executed under the uid that webdriver connects as. ... This query is vulnerable to SQL ... processed HTML. ...
    (Bugtraq)
  • Re: Converting Perl Web Report to Python
    ... by allowing the user to pass in an SQL query parameter to the sql ...
    (comp.lang.python)
  • IBM Informix Web DataBlade: Auto-decoding HTML entities
    ... IBM Informix Web DataBlade: Auto-decoding HTML entities ... IDS and WDB versions seem to be irrelevant). ... Malicious user may insert SQL code in form input and have it ... $'ed it should thus be safe to use it in an SQL query, ...
    (Bugtraq)
  • Re: [OT] Re: Asking if elements in struct arre zero
    ... >> connection back to somewhere as it was attempting to render the ... >> Why would you even use a client application that wouldn't allow ... You are apparently ignoring the inevitability of a USENET on which HTML ...
    (comp.lang.c)