Re: Newbie help
From: Michael D. Long (michael.d.long-nospam_at_comcast.net)
Date: 09/22/04
- Next message: Tim: "Re: Record count"
- Previous message: Adam Machanic: "Re: UPDATE ISSUE"
- In reply to: Steve: "Re: Newbie help"
- Next in thread: Joe Celko: "Re: Newbie help"
- Reply: Joe Celko: "Re: Newbie help"
- Reply: Steve: "Re: Newbie help"
- Messages sorted by: [ date ] [ thread ]
Date: Tue, 21 Sep 2004 21:58:30 -0400
This is ultimately a case where you are getting recommendations for a hammer
to turn a screw. Do a little research on time series data archival
products, such as OSI PI and the Intellution Data Historian. Industrial
data collection poses the problem of storing massive amounts of time-series
data (just like your application) that must be analyzed in order to be
useful. The collected data must be stored efficiently in order to conserve
resources and keep manufacturing costs low. A traditional general purpose
DBMS is not the optimal solution to the problem, though such products can be
used.
To efficiently store and retrieve data for your needs, the approach used by
time series repositories is best. You only store changed values (as you
originall proposed), updating the timestamp of the relevant entry when the
data remains unchanged between intervals. This minimizes seek times for
retrieval, but does pose a problem that must be addressed during data
retrieval. When you request a value for a specific time, it is just as
likely not to be found as it is to exist. In the case where the entry
doesn't exist, you must interpolate the value. The correct value for your
application would be the record having the appropriate symbol with the next
earliest timestamp.
You can achieve much better performance for such applications with a custom
engine than a commercial DBMS, but you need a solid background in data
storage and retrieval techniques. Based on experience in this area, unique
thought patterns also help a lot.
-- Michael D. Long "Steve" <sedmyer@indy.rr.com> wrote in message news:f_K3d.76291$787.48515@fe2.columbus.rr.com... > Well here is a little bit more back ground on why I do what I do :) > > I am writing a program to gather "real-time" market data on 3000+ symbols. > I quoted the word real-time because in this case real-time means 1 tick > every 15 seconds. For each of these ticks I pull thirty different values. > Once I have gathered a sufficient amount of data (I would like to keep a > six month rolling window of data) I hope to be able to find data that act > as indicators of activity. The next step is obvious, use those indicators > to guide future trading. > > I originally thought I would simply store the data in text files to > conserve space. Further to prevent the file from being to large (and since > I am only interested in trends lasting a day or less) I thought each day I > would create a new file. However, it turns out that a single day results > in +/- 4 million rows of data which is appx. 1 gig on disk. Once I > realized the size of the data I was gathering, and the difficulty of > locating data in that file, it became obvious that I would not be able to > quickly do any type of analysis on a flat file of this size. For that > reason I decided to store the data in a SQL Server DB. Further I decided > that since I am only looking for short term trends, I would continue to > break the data into daily tables inorder to minimize query times. So the > programs first task in the morning (on trading days) is to create a table > named for the day (MM-DD-YYYY) with one column for each data element being > gathered. Then as the day goes on all tick data is simply inserted into > the days table in the DB. I was still hoping to minimize the space > required by this data. I have heard (but I do not know if it is true or > not) that in SQL Server a column of type varchar only takes up as much > space as is actually used. Since much of the data I gather does not > change as rapidly as I pull it I felt I could save a considerable amount > of space by not saving any data which has not changed and simply leave > unchanged columns null for that tick. So if I have duplicate data from 1 > tick to the next I would save to the DB only those columns which actually > changed. This presented me with a problem though, since most (if not all) > of the records are incomplete (due to the saving of null values in place > of duplicate data) how do I get a complete record for any given time to > use in the analysis. It is for this purpose that I created the procedure > that I posted. > > This DB has no other tables (at this time) than these daily data tables. > There are no relationships in this DB and there are no users (other than > myself and my application). Therefore data integrity and worries about > what users might be able to do to the DB are complete non-issues. > > "Joe Celko" <jcelko212@earthlink.net> wrote in message > news:eqnUp81nEHA.3520@TK2MSFTNGP15.phx.gbl... >>>> I was hoping that I might get some feedback on it. << >> >> Always a bad thing to ask for in this newsgroup :) >> >>>> The purpose of the procedure is to populate a single record [sic] >> with the most recent (as compared to the time parameter) non-null values >> for each column in the specified table. << >> >> What you are doing is writing metadata tools in SQL and you are not >> supposed to do that in an application. >> >> The whole idea of this procedure is a violation of basic software >> engineering principles. This is **much** more fundamental than SQL >> programming. Don't you remember all that stuff about cohesion and >> coupling? >> >> Did you mean "as determined by the time parameter"? I do not see such a >> parameter; if it were done properly it would be a temporal data type and >> not a string. Dynamic SQL is a sign of failure on the part of the DB >> programmer, so he has to let the end user build the system he was not >> able to. >> >> The right answer is never pass a table name as a parameter. You need to >> understand the basic idea of a data model and what a table means in >> implementing a data model. Go back to basics. What is a table? A model >> of a set of entities or relationships. EACH TABLE SHOULD BE A DIFFERENT >> KIND OF ENTITY. What having a generic procedure works equally on >> automobiles, octopi or Britney Spear's discology is saying that your >> applications a disaster of design. >> >> 1) This is dangerous because some user can insert pretty much whatever >> they wish -- consider the string 'Foobar; DELETE FROM Foobar; SELECT * >> FROM Floob' in your statement string. >> >> 2) It says that you have no idea what you are doing, so you are giving >> control of the application to any user, present or future. Remember the >> basics of Software Engineering? Modules need weak coupling and strong >> cohesion, etc. >> >> 3) If you have tables with the same structure which represent the same >> kind of entities, then your schema is not orthogonal. Look up what >> Chris Date has to say about this design flaw. >> >> 4) You might have failed to tell the difference between data and >> meta-data. The SQL engine has routines for that stuff and applications >> do not work at that level, if you want to have any data integrity. >> >> Yes, you can write a program with dynamic SQL to kludge something like >> this. it will last about a year in production and then your data >> integrity is shot. >> >> You use square brackets, global temporary table, TOP and other >> proprietary syntax edven when standard syntax is allowed. Your names >> violate ISO-11179 naming rules with those silly 1960's BASIC prefixes. >> >> --CELKO-- >> Please post DDL, so that people do not have to guess what the keys, >> constraints, Declarative Referential Integrity, datatypes, etc. in your >> schema are. Sample data is also a good idea, along with clear >> specifications. >> >> >> *** Sent via Developersdex http://www.developersdex.com *** >> Don't just participate in USENET...get rewarded for it! > >
- Next message: Tim: "Re: Record count"
- Previous message: Adam Machanic: "Re: UPDATE ISSUE"
- In reply to: Steve: "Re: Newbie help"
- Next in thread: Joe Celko: "Re: Newbie help"
- Reply: Joe Celko: "Re: Newbie help"
- Reply: Steve: "Re: Newbie help"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|