Re: Please advise on approach

From: Scott Morris (bogus_at_bogus.com)
Date: 09/23/04


Date: Thu, 23 Sep 2004 08:46:32 -0400

It's very difficult to answer general questions in a NG. Context usually
has direct bearing on the issues and the context is difficult to communicate
(as well as lengthy). From the short discussion, it looks like you need to
conduct a thorough analysis of the data (a good bit you've already done).
The characteristics of the data will lead (or force!) you to the appropriate
design. Just be aware of the datatype issues (some of which I mentioned)
and their potential effects on your system. One last comment - you might
want to conduct some testing involving a typically loaded machine / network
to verify that it can handle the expected volume of inserts. Better to know
this ahead of time (where you can design around it) then discover it
afterwards.

Good luck.

"Paul" <paulsmith5@hotmail.com> wrote in message
news:ca236fb1.0409230256.6a4876b0@posting.google.com...
> Hi Scott,
>
> Thanks for the reply. Apologies for any confusion arising from my use
> of the term timestamp - it does not refer to the sql data type in this
> case, rather to the datetime at which the source produced the data -
> which is provided by the source itself. Therefore the limitations in
> terms of sql server accuracy you refer to do not apply in this case.
> Limitations in terms of accuracy emanate from the sources themselves
> and I don't wish to concern myself with them at this point as much as
> to say that action has been taken to ensure the are synchronised as
> much as possible. I did take the precaution (whether right or wrong)
> of recording however the millisecond portion of the 'timestamp' to a
> separate field to maintain its integrity. I note your point on the
> clustered index/ordering thanks. I think that the order in which the
> data was produced by the source can be determined from the data itself
> - notably the timestamp. The physical insertion order does not concern
> me but perhaps if needed I could use an identity field - but if a
> single thread is involved as is the case this is a mute point (I
> think). Regarding identifying chronological order you are correct when
> you ask where can the database assist in differentiating between two
> records with the same timestamp/datetimes but I'm thinking it doesn't
> have to. I'm only interested in it differentiating between records
> with different datetimes i.e. a record with a datetime preceding
> another one will be higher up in the order. When I replay the data in
> sequence and I encounter two records with the same datetime the delay
> in displaying them will be down to how quickly they can be processed
> and when you are talking about portions of a second the result will
> often be invisible to the naked eye - not perfect but not critical
> either. As far as using a non-relational system that's out of the
> question at this point.
>
> Again thanks for the reply, I found it useful and appreciate your
> time. Perhaps I will refrain from asking such general questions in
> future - I think they can raise more questions then they answer - a
> problem hard to address through this medium.
>
> Paul
>
>
> Assume that source A
> > has data with timestamp 10:22:01 (ignoring the date portion for
simplicity)
> > and source B has data with the same timestamp (you said this was
possible).
> > They both get inserted into the DB (regardless of table) at roughly the
same
> > time. Based on data alone, it is not possible to determine which is
first.
> > What does the DB have to assist?
>
> "Scott Morris" <bogus@bogus.com> wrote in message
news:<uNMwmBNoEHA.3324@TK2MSFTNGP12.phx.gbl>...
> > First, timestamp has no relationship to date or time; the synonym is
> > rowversion. It is unique for a db and appears to be currently be
> > implemented in such a way as to be predictable. Based on the current
> > implementation, one can make comparisons (involving greater than and
less
> > than) that give correct results. However, I don't think the current
> > implementation can be safely relied on and, IIRC, MS only supports equal
/
> > not equal comparisons. Note - since you are posting in a db-related NG,
I
> > assume that your reference to timestamp is actually a reference to the
sql
> > server datatype. If this assumption is incorrect - well that's your
fault
> > for using an ambiguous term outside of the implied context.
> >
> > Second, datetime values are accurate to 3 milliseconds. If your term
> > "timestamp" refers to a datatype of this nature, is this accuracy
> > sufficient? Note that there are two aspects to accuracy. First is the
> > relative difference in accuracy between your datasource timestamp values
and
> > this datatype. The second aspect is relative to a difference in
accuracy.
> > Your timestamp data might be more accurate (e.g., 1 ms) but occurs at
> > intervals that are significantly less accurate (e.g., > 3 ms). Of
course,
> > you could use a different datatype, losing the ability to use the
builtin
> > datetime functions and creating potential ordering problems.
> >
> > Third, a table is, be definition, unordered. Your narrative assumes and
> > implies otherwise. Clustering does affect physical ordering. However,
> > there is nothing that can guarantee you can access rows in physical
> > insertion order unless that order can be determined by the data itself.
In
> > other words, you must provide a way to include the appropriate
information
> > in an order by clause. Even with a heap and a cursor, I'm not certain
that
> > there is any way to guarantee the desired order.
> >
> > Given this information, how do you intend to store the data (regardless
of
> > number of tables) in such a way that one can reconstruct the events in
> > chrono. order? There are two implications in your narrative that might
be
> > problematic. First, is that "reconstruction" implies a single thread of
> > execution. Is this valid? If not, then you might also need to record
> > additional information to differentiate the multiple streams of data
(which
> > just might be adding data at the same instant in time). The other
potential
> > pitfall is that reconstruction often implies "re-running" (e.g.,
> > reconstruction of the data in the same order/timeframe). Fear this!
> >
> > So now we're down to identifying chronological order. Assume that
source A
> > has data with timestamp 10:22:01 (ignoring the date portion for
simplicity)
> > and source B has data with the same timestamp (you said this was
possible).
> > They both get inserted into the DB (regardless of table) at roughly the
same
> > time. Based on data alone, it is not possible to determine which is
first.
> > What does the DB have to assist? You could use a timestamp for the
table
> > (or tables) and rely on the current implementation to derive order.
Somone
> > recently posted that this type of solution was working. You could use a
> > datetime column to mark each row with the datetime of insert. This
approach
> > may suffer due to accuracy limitations. Lastly, you could use a single
> > table with an identity. This should logically work (and it is what you
> > considered) but may be problematic due to contention / locking.
Hotspots
> > will definitely be something to avoid. Note that this approach is also
> > predicated on the immediate insertion of data, implying that every data
> > "event" is recorded on a one-by-one basis. This is not the best
approach in
> > terms of network usage (and does not scale well).
> >
> > You might want to investigate any potential client-side approaches as
well.
> > In a single-threaded application, you could also impose your own serial
> > numbering on the data before insertion.
> >
> > Lastly, you might want to indicate what your ultimate goal is.
Relational
> > databases are often slower than file-based approaches for data of this
type
> > (there is a lot of overhead that you may not need for recording data).
> > There are other ways to record "data" in fifo order that might later to
> > imported into a real database for analysis or reporting. If there are
no
> > real relationships in your data, then perhaps you don't need a rdbms.
> > Without knowing what you intend to do with the information, this is the
> > approach I would investigate first.
> >
> > "Paul" <paulsmith5@hotmail.com> wrote in message
> > news:ca236fb1.0409220850.2058abc9@posting.google.com...
> > > Hi,
> > >
> > > Apologies for the vague subject title but I couldn't think of a snappy
> > > title to describe what I am doing. Please bear with me as I explain...
> > >
> > > I am developing an application that records data from several
> > > different sources operating at relatively high rates (e.g. >5 times
> > > per sec). The data from the various sources is diverse, in terms of
> > > the type (e.g. data from source A may comprise of 5 parameters in
> > > binary format, whereas data from source B may comprise of a single
> > > parameter in ascii format). The one thing they have in common is a
> > > timestamp. The timestamp although unique for each source is not unique
> > > amongst all sources. For example although there will be only ever be a
> > > single row in TableA with a particular timestamp representing a
> > > particular point in time, there may also be a row in TableB with the
> > > same timestamp. In an attempt to have the writes as efficient as
> > > possible and to prevent the tables growing too big I record data from
> > > each source to different tables. These tables are created at runtime
> > > before I begin to record the data. Following a period of recording
> > > some tables may still have in excess of 500,000 rows. I now wish to
> > > reconstruct the sequence of events in chronological order, i.e.
> > > retrieve the earliest data recorded from all the sources, followed by
> > > the next and so on. I'm wondering how to approach this. I was thinking
> > > of creating a single large table into which I would insert the
> > > timestamp and ID of every row from each source as well as the table
> > > name, then ceating a clustered index on the timestamp field, so that
> > > the rows would be physically sorted in chronlogical order. This table
> > > may look like the following when complete
> > >
> > > 1 09/22/2004 16:00:00 Table1
> > > 2 09/22/2004 16:00:01 Table1
> > > 1 09/22/2004 16:00:02 Table2
> > > 3 09/22/2004 16:00:03 Table1
> > > 2 09/22/2004 16:00:03 Table2
> > > etc.
> > >
> > > This would only ever have to happen once (I appreciate it may take
> > > some time to construct) - then each time I need to I could simply move
> > > through the table row by row retrieving the ID and table name which I
> > > could then use as parameters in a query that would retrieve the data.
> > > I'd appreciate any feedback on this approach i.e. is it madness! I was
> > > wondering about views etc. but I don't know if these could help at all
> > >
> > > Thanks,
> > >
> > > Paul



Relevant Pages

  • Re: Please advise on approach
    ... of the term timestamp - it does not refer to the sql data type in this ... rather to the datetime at which the source produced the data - ... I'm only interested in it differentiating between records ... > insertion order unless that order can be determined by the data itself. ...
    (microsoft.public.sqlserver.server)
  • Re: Convert help needed desperately
    ... overflow error converting expression to data type datetime. ... Is my sql server hosed???? ... > the timestamp data type. ...
    (microsoft.public.sqlserver.programming)
  • Re: trigger updating timestamp
    ... you were scared to use a datetime because it was not accurate....how the ... heck are you going to convert the TIMESTAMP column into something that even ... > datetime wouldn't provide the desired accuracy (like not able to ... > UPDATE TablesLastUpdate ...
    (microsoft.public.sqlserver.programming)
  • Re: Please advise on approach
    ... timestamp has no relationship to date or time; ... datetime values are accurate to 3 milliseconds. ... insertion order unless that order can be determined by the data itself. ... pitfall is that reconstruction often implies "re-running" (e.g., ...
    (microsoft.public.sqlserver.server)