Re: SSIS: ET flat text file to multi-level hierachal XML.



It works! The Script Component saves the day! :) I'm glad it's fast too.

"bchi49@xxxxxxxxx" wrote:

I am testing another solution. I am using the script component more
intensively for flat file parsing and extraction and have multiple
output columns (derived from the script component). It seems to be
working well.

Before I wanted to use the derived components, split components, etc so
it would be easier to view the visual design. But the script is not so
bad either since it works like the way I wanted! :)

Since, the data is so denormalized, after the script component, I might
use the unpivot component to normalize it for loading.

I'll keep this thread posted as more information comes.

Benjamin wrote:
The reason I had to drop into temp tables was the structure of the formatted
report flat file. The report has multiple headers, multiple details per
header, multiple types of details having different detail columns, and some
detail has sub-detail rows etc. So to parse out the data and join them
together I had to first, add line numbers to the flat file using script
component; the the line number is used for relating/join header to detail and
sub-details. Each line or lines (header, detail, sub-details) I use split
component to split the headers and attach to derive column component to
extract the columns for each type.

SSIS seems to execute tasks completely in sequence. E.g. if there are 100
lines, it'll pass all lines to the header extraction component first to get
all headers. Then it'll go to the next step to extract all detail lines. I
was hoping it wouldn't do that so I can join them more easily; e.g. when it
finishes the first header it'll get it's detail line, then join them. Then
goto the next header and detail line and join them again. But that wasn't the
case. Is it possible to configure SSIS to do that in sequence for line by
line is there something like (asynchronize/synchronize process)?

So, for now it was extracting all sections, then I dump them to temp tables.
Then the next flow was to join them together using lookup components. e.g.
for each detail, lookup the header adjacent to this detail line; e.g. header
line is 5 and detail line is 8 then header line 5 is the closes header, so it
means header line 5 is related to detail line 8. In the lookup I had to use
SQL query to sort and return only the closes header.

That was the reason I had to use temp table; to sort and join the header and
details. SSIS doesn't provide that type of sorting and joining capability.
The Merge Join component only join the keys that are identical (not the
relative join). And I couldn't join them in SSIS could I? maybe union them
together and then use script component? hmm....

If the provider can provide us the data only flat files, then it'll be much
easier. Then we probably won't even need XML. XML has a benefit where it's
passed to stored procedure (SP) and SP would do all the ETL tasks. It would
improve performance, and most likely faster than SSIS.

Any suggestion or recommandation is appreciated. Thanks :)

Thanks,
Benjamin

"Allan Mitchell" wrote:

Hello Benjamin,

I don't know whether MS intend to release an XML destination adapter. As
for your solution.

Why did you need to drop into temp tables?

You are right though in that it may be easier for you to ask the feed providers
to drop to you multiple csv files, load them into the pipeline, use a merge
join perhaps then eventually use a Script component as a destination and
craft the XML yourself.


Regards

Allan Mitchell
Konesans Ltd
T +44 7966 476 572
F +44 2071 008 479
http://www.konesans.com

SSIS seems to lack a powerful feature that Informatica PowerCenter and
GXS AI software provides. That is the ability to extract and group
data in a semi-structured flat text file to XML with multi-level
hierachal XML intutively without hand coding.

With SSIS, I had to extract the formatted report flat text files in to
SQL Server temp tables, then use lookups and OLE DB SQL commands to
update the foreign key relationships across temp tables. Then
manipulate the temp tables and load them to the working tables.

That is a lot of work, and the performance is bad for this type of
work.

Using other products such PowerCenter, it can transform flat text file
into XML. Then pass the XML to a stored procedure for additional
extraction and process into multiple tables. The performance wise
seems better.

One solution now is to have the provider of flat text file provide us
the unformatted report file in multiple grouping in CSV file, or
provide us XML files so we can easily do the ETL part.

Has anyone experience the same issues I'm having?

Will SSIS have XML Destination capability like PowerCenter in the
future?
Thanks,
Benjamin





.



Relevant Pages

  • Re: SSIS: ET flat text file to multi-level hierachal XML.
    ... intensively for flat file parsing and extraction and have multiple ... The report has multiple headers, ... header, multiple types of details having different detail columns, and some ... Then it'll go to the next step to extract all detail lines. ...
    (microsoft.public.sqlserver.dts)
  • Re: Flat File assembler with header
    ... >> header and body have not same number of field. ... >> how can i use the custom send pipeline with a flat file assembler to ... > is that the header schema has a set of promoted properties. ... Yoann ...
    (microsoft.public.biztalk.general)
  • Re: Flat File Problem
    ... Rather than modifying the flat file why don't you promote the property ... This would be much easier than modifying the input data, using a header ... schema and relying on the FFDasm to promote the property for you. ... If you anyway use a custom pipeline you can add a node with a record ID to ...
    (microsoft.public.biztalk.general)
  • RE: Issues mapping from XML to Flat file
    ... The situation is the way you describe: many header recs and many order recs ... header records is the timestamp. ... I am mapping from an xml file to a positional flat file. ... hvalue (element) ...
    (microsoft.public.biztalk.general)
  • Problem with parsing email message with extraneous MIME information
    ... component and extract message attachments. ... The email has some additional encapulated header ... This message is in MIME format. ...
    (comp.lang.python)