Re: Timing Rescordset




"hgeron" <hgeron@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:334657FC-1B84-4D32-AB39-34720A030E2D@xxxxxxxxxxxxxxxx

[Since the conclusion of my post is far mor relevant than my methodology
that lead to it, I've re-ordered the paragraphs

[...] I suppose it is just the Access basic slowness that's the problem.

If it's a performance flaw in Access I'd dump that environment in a freakin
heartbeat! :-) But I doubt that's it, VBA calling COM... <gasp> Holy
sheep sh1t man, it *IS* Access! 4+ seconds instead of 120 ms, good grief!
Add 1 to the number of reasons I don't use it!

That's a performance factor of 33x, literaly turning days into months. If,
for some incredibly lame reason you're stuck with Access, I'd write a COM
object in VB -- you're practically done with it already, the code from VB
ran almost unchanged in Access VBA... (Hey, what the hell is up with that
"can't access control properties unless control has focus" crap? In the
immortal words of Metallica, "it's so useless, heh heh hey")

Anyways... color my mind blown. Do yourself a favor: port your code to SQL
Server, and leave Access/Jet in the dust, where it belongs.

You didn't comment on my concerns about your 7-step plans to dynamically
create tables... I'll assume you read it... and call it a night.

[more info included in-line below]


Thanks Mark, but I am not concerned anymore about the text node coming as
a
child of the P node. I just read data into arrays, and deincrement the
array
pointer
for those "exposed nodes". I even tried Altova's "XML reader to Access
tables." and it skipped over the node.text. (it would read only one...
and
think it ended with the P text).

The problem was that it took hours to read a big xml file. I would read
1000 nodes, then use a recordset to store them. Bob suggested using a
parameter query,
but the problem is reading the XML. It might take 5 minutes to read 1000
lines of XML, and only a secord to process the same 1000 lines with a
recordset.

[analysis and methodology, left in for the hell of it...]

Five minutes? Yikes! Just for my own sanity, I played around with some XML
files that happen to be on my system. For this test, I isolated the node
processing code in its own function, to make it easy to benchmark *just* the
job of recursing through the XML object tree. All that my ProcessNode
function did at first was count nodes, attributes and track max depth in
globals. Needless to say, ProcessNode accepts a Node object, so the
overhead time includes acquiring a reference to each node in the XML, and
acquiring a reference to the attributes collection of each node.

I tested several files, the largest was about 10MB, some 195,330 nodes
containing 114,904 attributes, max depth of 8; it took about 3 seconds to
process.

I further tested a smaller XML, 974 nodes containing 26146 attributes, Max
Depth: 4

Count nodes and atts: 16.010892122113 ms
+ node.text to variable: 43.1438772010062 ms
+ enumerate attributes: 107.79337983078 ms
+ att.value to variable: 120.599243940087 ms

So XML i/o overhead is surely incurred for every property that's copied, but
it does not seem like it could be the major time-consumer, and I'm not sure
how retrieving the same amount of data from a recordset would be appreciably
faster...

[see top of post for conclusion]

-Mark




Harrell






--
hgeron


"Mark J. McGinty" wrote:


"hgeron" <hgeron@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:95B813D8-F312-49E0-8893-8A2D2565995B@xxxxxxxxxxxxxxxx
Yes, I remember you said to check the node type, but I didn't see how
how
that would help. Once again the problem is one type, like...

<P id="51" desc="109" name="PT">5403.84 3798.21 94.27</P>

The recursive function I am using first check to see if it has
something,
if
so it gets each attribute. Then it calls itself again and gets another
node,
it has no attributes but it has node text, so it gets that, then calls
itself
again.

You said this is two nodes... Ok... I see that the ">" seems to end the
node, but
the text part gives me no node name

I'm guessing that Bob found that node_text is exposed as a child node the
same way I did: by setting a break point and browsing through the XML
object
in the debugger. If you're of open mind, pure heart and free spirit, you
can learn more in a 30-60 minute debugging session than you would from a
9
week class! :-)

Anyway, here's a potential strategy (that I'm going to leave to you to
test):

Instead of checking to see if the current node's type is node_text, check
to
see if childNodes.length happens to be 1; if it is, exanine the
first/only
child node; if it's node_text, deal with it at the current level. Good
rule
of thumb: avoid levels of recursion that lack substantial advantage.

As for your database storage plan... well let's just say that if I was
still
in my '24/7 readiness to argue the virtues of 5NF' phase, it would likely
cause a few nightmares. :-)

You will end up with a severe number of tables -- what will you do if two
XML authors use the same node name in a different heirarchy? Plus, if
you
create tables on the fly, to represent XML construction, you will end-up
needing to rediscover all of it from your schema, since your tables'
schema
will be completely unpredictable.

Give this alternative some thought:

Two tables:

[XMLDocuments]
[id] int IDENTITY(1,1) NOT NULL
, [FileName] varchar(255) NOT NULL
, [SourceDomain] varchar(255) NULL,
, [DateAcqired] datetime
, [HasChildren] bit NOT NULL
, [documentElementID] int NOT NULL
-- id of row in Nodes that represents this top-level node

--(maybe some other strategic things from the document object, like
DTD)

-- if you want to make it easy to verify when it comes time to
reconstruct...
-- to really do it right
, [XMLSource] text NULL

[Elements]
[id] int IDENTITY(1,1) NOT NULL
, [OwnerID] int NULL -- id of owner document
, [ParentID] int NULL -- id of parent, null if documentElement
, [Type] varchar((32) -- obj type (node, attribute, etc)
, [NodeName] varchar(255) NOT NULL
, [DateAcquired] datetime
, [HasChildNodes] bit NOT NULL
, [HasAttributes] bit NOT NULL
-- etc
, [xml] text NULL -- only if you have terabytes of HDD to squander

-- you get the picture, yeah?

That way your schema would be predictable, you could query for things
like
number of documents on file, node and attribute counts, averages,
maximims... a dizzying array of esoteric [though likely less than useful]
stats will be at your finger tips. :-)

To reconstruct, you'll have to query the db for each level, in much the
same
fashion as the recursion used to de-construct it, but little tricks like
dumping all of a document's elements to a separate temp table, and/or
caching them on the client will help performance.


-Mark



but I know that it is the 3d coordinate
of a
surveyed point, and the attributes just collected were given as this
point's
ID, Description, and "name". So I put them as one record. I excpect I
will
have problems when I save changes and updates back to XML, but I am not
at
that point yet. Currently I am placing everything in one table, but I
know I
will need many other tables, each Node Name will need it's own table.
If
a
parameter query will not accept the node name as a parameter as a table
name,
then I will have to use
a case select for the table name, but I am discovering new node names
with
new survey types. It seems that should use a recordsets, and create
tables
(as needed), and add attributes and text as I find them.

I think this is going to very time consuming...
(1) Read the XML node name, if any
(2) If I have attributes ,create a table if table of node name does not
exist.
(3) Read each attribute
(4) Store attributes in table, go back to step (1)
(5) If I have no attributes, read text
(6) Assume the text when with last node name, and store text.
(7) Go back to step (1)

Is this logic ok? How would I check the node type, other than this what
I
am
doing already?

Function RecurseXMLs(Node As IXMLDOMNode, Level As Long) As Boolean
Set NodeList = Node.childNodes
If Not Node.Attributes Is Nothing Then
nodeName = Node.baseName
For Each Att In Node.Attributes
.... 'get each attribute
Next
End If
If Node.nodeType = NODE_TEXT Then
...'get node.text ...
Else
If Not NodeList Is Nothing Then
For Each ChildNode In NodeList
If RecurseXMLs(ChildNode, Level + 1) = False Then Exit
Function
Next
End If
End If
RecurseXMLs = True
End Function








.



Relevant Pages

  • Re: Timing Rescordset
    ... of reading an xml file and storing the data into tables. ... not storing the data in a recordset. ... I don't see how a parameter query would help. ...
    (microsoft.public.data.ado)
  • Re: Timing Rescordset
    ... The recursive function I am using first check to see if it has something, ... avoid levels of recursion that lack substantial advantage. ... XML authors use the same node name in a different heirarchy? ... To reconstruct, you'll have to query the db for each level, in much the same ...
    (microsoft.public.data.ado)
  • Re: Date Format Conversion?
    ... I didn't say customer sales and contact information, ... >And I won't have to go throw hundreds of spreadsheets to find the ... >much with XML; i prefer CSV or TSV or something else; even binary ... it what you need to do is recursion. ...
    (microsoft.public.excel)
  • Vista IE7 Recussion Error
    ... When you call this xml file in XP or FireFox using IE6 you get the ... template recursion. ... When you call this xml file in VISTA using IE7 on a linovo X60s Intel ... the stack overflow occurred starting again at Level 1076. ...
    (comp.text.xml)
  • Re: Timing Rescordset
    ... I just read data into arrays, ... The problem was that it took hours to read a big xml file. ... I'm guessing that Bob found that node_text is exposed as a child node the ... avoid levels of recursion that lack substantial advantage. ...
    (microsoft.public.data.ado)

Loading