Re: Performance and Parallelism in SSIS
- From: Amit_S <s.amit.kumar@xxxxxxxxx>
- Date: Tue, 18 Sep 2007 07:45:36 -0000
In the system design docs, I had documented a Control table on similar
lines as suggested by you. This table will store the start and end
date timestamps for each of the 10 processes.
You're correct, it will take more than 5 mins to process million
records. This means the job may not run as expeceted in it's next
scheduled timestamp. But this isn't an issue coz I won't be losing any
newly arrived data (since the stored procs will be dependent on the
control table to reterive the data).
Thanks a lot for your ideas and time.
Amit
On Sep 17, 11:30 pm, Todd C <To...@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Amit:
Sorry that I don't have any performance tips for you on SSIS. Just know that
it does most of the data flow work in memory, and as always, disk I/O is
probably going to be your most resource intensive portion.
Now, for your package:
If you need to design so that you can play catch up on hundreds of thousands
or even millions of rows, then it will take a little extra work:
This is a classic Notification Services scenario
1)Create a table with one DateTimefield.
2)In the Package: Initiate two DateTime variables, Start and End.
3)Populate the StartTime variable with the value from the table. Populate
the EndTime variable from a Now() or GetDate() function. Then save the
EndTime value back to the table.
4) Pull all records between the two date variables.
Suppose you have 1M records spread over 8 hours. When the package kicks off,
it will pull those records up to the time it started. Any records added AFTER
the DateTime values have been populated will go on the NEXT run. With 1M rows
to process, there is a good chance that it won't finish in 5 minutes.
HTH
--
Todd C
[If this response was helpful, please indicate by clicking the appropriate
answer at the bottom]
"Amit_S" wrote:
Todd,
Its around 1000 rows in total for every five minutes. In Production
there can be a situation where the SQL job may be intentionally
disabled for 6-8 hrs. In such a case, huge data will have to be loaded
when the Job gets executed after its enabled. Please let me know if
this can't be handled by SSIS.
Also, will be great if you can share SSIS performance tips.
Currently the project is in design phase and I don't have the code for
all stored procs and tables. But I have started to work on a prototype
using some sample data, stored procs and few tables. I will go with
the approach of 10 parallel Data Flows. Will let you know how it goes.
Thanks,
Amit- Hide quoted text -
- Show quoted text -
.
- Follow-Ups:
- Re: Performance and Parallelism in SSIS
- From: jhofmeyr
- Re: Performance and Parallelism in SSIS
- References:
- Performance and Parallelism in SSIS
- From: Amit_S
- Re: Performance and Parallelism in SSIS
- From: Amit_S
- Re: Performance and Parallelism in SSIS
- From: Todd C
- Re: Performance and Parallelism in SSIS
- From: Amit_S
- Performance and Parallelism in SSIS
- Prev by Date: Error when running DTS package as job
- Next by Date: RE: SSIS use loop to set dynamic global variables
- Previous by thread: Re: Performance and Parallelism in SSIS
- Next by thread: Re: Performance and Parallelism in SSIS
- Index(es):
Relevant Pages
|