Re: Timing Rescordset
- From: hgeron <hgeron@xxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Mon, 4 Sep 2006 19:34:01 -0700
Thanks Mark, but I am no longer concerned about the text node coming in as a
child of the P node. I just read the data into arrays and decrement the array
pointer for those "exposed nodes". I even tried Altova's "XML reader to Access
tables," and it skipped over the node text (it would read only one, and
seemed to think it ended with the P text).
The problem was that it took hours to read a big XML file. I would read
1000 nodes, then use a recordset to store them. Bob suggested using a
parameter query, but the problem is reading the XML. It might take 5 minutes
to read 1000 lines of XML, and only a second to process the same 1000 lines
with a recordset.
Although the recursive XML function works great, it is painfully slow. (The
levels don't get any deeper than 6.) I don't see how the function could be
more efficient; I suppose it is just the slowness of Access Basic that's the
problem.
Harrell
--
hgeron
"Mark J. McGinty" wrote:
"hgeron" <hgeron@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:95B813D8-F312-49E0-8893-8A2D2565995B@xxxxxxxxxxxxxxxx
Yes, I remember you said to check the node type, but I didn't see how that
would help. Once again the problem is one type, like...
<P id="51" desc="109" name="PT">5403.84 3798.21 94.27</P>
The recursive function I am using first checks whether the node has
attributes; if so, it gets each attribute. Then it calls itself again and
gets another node; it has no attributes but it has node text, so it gets
that, then calls itself again.
You said this is two nodes... Ok... I see that the ">" seems to end the
node, but the text part gives me no node name
I'm guessing that Bob found that node_text is exposed as a child node the
same way I did: by setting a break point and browsing through the XML object
in the debugger. If you're of open mind, pure heart and free spirit, you
can learn more in a 30-60 minute debugging session than you would from a 9
week class! :-)
Anyway, here's a potential strategy (that I'm going to leave to you to
test):
Instead of checking to see if the current node's type is node_text, check to
see if childNodes.length happens to be 1; if it is, examine the first/only
child node; if it's node_text, deal with it at the current level. Good rule
of thumb: avoid levels of recursion that lack substantial advantage.
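Here is a rough sketch of that check, not tested against the original code.
It is written in Python, with xml.dom.minidom standing in for the MSXML DOM
(childNodes.length and the text node type work the same way there). The
function name flatten, the dictionary layout, and the <Pnts> wrapper element
are made up for illustration.

```python
from xml.dom import minidom, Node

def flatten(node):
    """Return a dict for an element; a lone text child is read at this level."""
    record = {"name": node.nodeName, "attrs": {}, "text": None, "children": []}
    # Collect attributes (node.attributes is None for non-element nodes).
    if node.attributes is not None:
        for i in range(node.attributes.length):
            att = node.attributes.item(i)
            record["attrs"][att.name] = att.value
    kids = node.childNodes
    if kids.length == 1 and kids[0].nodeType == Node.TEXT_NODE:
        # Exactly one child and it is text: handle it here, no recursive call.
        record["text"] = kids[0].nodeValue
    else:
        # Recurse into child elements only; text mixed in between child
        # elements is ignored in this sketch.
        for child in kids:
            if child.nodeType == Node.ELEMENT_NODE:
                record["children"].append(flatten(child))
    return record

# <Pnts> is a hypothetical wrapper; the <P> element is the one from the thread.
doc = minidom.parseString(
    '<Pnts><P id="51" desc="109" name="PT">5403.84 3798.21 94.27</P></Pnts>'
)
point = flatten(doc.documentElement)["children"][0]
print(point["attrs"]["id"], point["text"])
```

The attributes and the text of <P> land in the same record in a single call,
which is one call per point fewer than recursing into the text node.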
As for your database storage plan... well let's just say that if I was still
in my '24/7 readiness to argue the virtues of 5NF' phase, it would likely
cause a few nightmares. :-)
You will end up with a severe number of tables -- what will you do if two
XML authors use the same node name in a different hierarchy? Plus, if you
create tables on the fly, to represent XML construction, you will end up
needing to rediscover all of it from your schema, since your tables' schema
will be completely unpredictable.
Give this alternative some thought:
Two tables:
[XMLDocuments]
  [id] int IDENTITY(1,1) NOT NULL
, [FileName] varchar(255) NOT NULL
, [SourceDomain] varchar(255) NULL
, [DateAcquired] datetime
, [HasChildren] bit NOT NULL
, [documentElementID] int NOT NULL
    -- id of row in Elements that represents this top-level node
-- (maybe some other strategic things from the document object, like DTD)
-- if you want to make it easy to verify when it comes time to reconstruct...
-- to really do it right
, [XMLSource] text NULL

[Elements]
  [id] int IDENTITY(1,1) NOT NULL
, [OwnerID] int NULL -- id of owner document
, [ParentID] int NULL -- id of parent, null if documentElement
, [Type] varchar(32) -- obj type (node, attribute, etc)
, [NodeName] varchar(255) NOT NULL
, [DateAcquired] datetime
, [HasChildNodes] bit NOT NULL
, [HasAttributes] bit NOT NULL
-- etc
, [xml] text NULL -- only if you have terabytes of HDD to squander
-- you get the picture, yeah?
That way your schema would be predictable, you could query for things like
number of documents on file, node and attribute counts, averages,
maximums... a dizzying array of esoteric [though likely less than useful]
stats will be at your fingertips. :-)
To reconstruct, you'll have to query the db for each level, in much the same
fashion as the recursion used to de-construct it, but little tricks like
dumping all of a document's elements to a separate temp table, and/or
caching them on the client will help performance.
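A rough sketch of the client-side caching idea, in Python with an in-memory
SQLite database standing in for the real one. The table is a cut-down
version of [Elements]; the NodeText column, the sample rows, and the
function names load_children and rebuild are assumptions made up for the
example, and attributes are left out.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Elements ("
    " id INTEGER PRIMARY KEY, OwnerID INTEGER, ParentID INTEGER,"
    " NodeName TEXT NOT NULL, NodeText TEXT)"  # NodeText is an assumed column
)
# Hypothetical rows for document 1: (id, OwnerID, ParentID, NodeName, NodeText)
conn.executemany(
    "INSERT INTO Elements VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, None, "Survey", None),
        (2, 1, 1, "Pnts", None),
        (3, 1, 2, "P", "5403.84 3798.21 94.27"),
        (4, 1, 2, "P", "5410.02 3801.77 95.10"),
    ],
)

def load_children(conn, owner_id):
    """One query per document: group every row under its ParentID."""
    by_parent = defaultdict(list)
    rows = conn.execute(
        "SELECT id, ParentID, NodeName, NodeText FROM Elements"
        " WHERE OwnerID = ? ORDER BY id",
        (owner_id,),
    )
    for node_id, parent_id, name, text in rows:
        by_parent[parent_id].append((node_id, name, text))
    return by_parent

def rebuild(by_parent, parent_id=None):
    """Rebuild the XML text from the cached rows, without touching the db.
    Text is not XML-escaped in this sketch."""
    out = []
    for node_id, name, text in by_parent.get(parent_id, []):
        inner = (text or "") + rebuild(by_parent, node_id)
        out.append(f"<{name}>{inner}</{name}>")
    return "".join(out)

xml_text = rebuild(load_children(conn, 1))
print(xml_text)
```

The recursion still walks level by level, but it reads from the cached
dictionary, so the database is hit once per document rather than once per
node.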
-Mark
but I know that it is the 3D coordinate of a surveyed point, and the
attributes just collected were given as this point's ID, Description, and
"name". So I put them in one record. I expect I will have problems when I
save changes and updates back to XML, but I am not at that point yet.
Currently I am placing everything in one table, but I know I will need many
other tables; each node name will need its own table. If a parameter query
will not accept the node name as a parameter for the table name, then I will
have to use a Select Case for the table name, but I am discovering new node
names with new survey types. It seems that I should use recordsets, create
tables (as needed), and add attributes and text as I find them.
I think this is going to be very time consuming...
(1) Read the XML node name, if any
(2) If I have attributes, create a table named for the node if one does not
    exist.
(3) Read each attribute
(4) Store attributes in table, go back to step (1)
(5) If I have no attributes, read text
(6) Assume the text went with the last node name, and store text.
(7) Go back to step (1)
Is this logic ok? How would I check the node type, other than the way I am
doing it already?
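Function RecurseXMLs(Node As IXMLDOMNode, Level As Long) As Boolean
    ' nodeName is assumed to be declared at module level, so the text
    ' branch can still see the name collected on the previous call.
    Dim NodeList As IXMLDOMNodeList
    Dim ChildNode As IXMLDOMNode
    Dim Att As IXMLDOMNode

    Set NodeList = Node.childNodes
    ' Attributes is Nothing for node types that cannot carry attributes
    If Not Node.Attributes Is Nothing Then
        nodeName = Node.baseName
        For Each Att In Node.Attributes
            ' ... get each attribute
        Next
    End If
    If Node.nodeType = NODE_TEXT Then
        ' ... get Node.Text
    Else
        If Not NodeList Is Nothing Then
            For Each ChildNode In NodeList
                ' Bail out (returning False) as soon as any child fails
                If RecurseXMLs(ChildNode, Level + 1) = False Then Exit Function
            Next
        End If
    End If
    RecurseXMLs = True
End Function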