Recently I had to process a type of flat file with a rather tricky format that looked something like this (simplified for clarity):
H,INV_201008.CSV,2010-09-01
G,1011,CODE_001
D,23562,100.00,32,2010-05-05
D,73522,110.00,33,2010-05-06
D,82643,104.00,11,2010-05-07
G,1012,CODE_002
D,23553,99.00,32,2010-05-05
D,73463,156.60,11,2010-05-06
D,96473,122.10,88,2010-05-07
G,1013,CODE_003
D,25435,30.00,14,2010-05-05
D,13435,80.00,8,2010-05-06
D,97563,42.00,22,2010-05-07
A data file consists of one header (H) row, multiple group (G) rows and multiple data (D) rows for each G row.
After using BizTalk's flat file schema wizard, I had to make some manual modifications to the schema to accommodate the unbounded repeating set of G and D rows by treating G as a record,
D as a repeating record and then wrapping the G and D elements within an <xs:sequence maxOccurs="unbounded"> tag.
BizTalk will generate an xml representation of the flat file that looks like this:
<Header>
<FileName>INV_201008.CSV</FileName>
<CreationDate>2010-09-01</CreationDate>
</Header>
<GroupRow>
<Id>1011</Id><Code>CODE_001</Code>
</GroupRow>
<DataRow>
<Id>23562</Id><Price>100.00</Price><Qty>32</Qty><Date>2010-05-05</Date>
</DataRow>
<DataRow>
<Id>73522</Id><Price>110.00</Price><Qty>33</Qty><Date>2010-05-06</Date>
</DataRow>
<DataRow>
<Id>82643</Id><Price>104.00</Price><Qty>11</Qty><Date>2010-05-07</Date>
</DataRow>
<GroupRow>
<Id>1012</Id><Code>CODE_002</Code>
</GroupRow>
<DataRow>
<Id>23553</Id><Price>99.00</Price><Qty>32</Qty><Date>2010-05-05</Date>
</DataRow>
<DataRow>
<Id>73463</Id><Price>156.60</Price><Qty>11</Qty><Date>2010-05-06</Date>
</DataRow>
<DataRow>
<Id>96473</Id><Price>122.10</Price><Qty>88</Qty><Date>2010-05-07</Date>
</DataRow>
<GroupRow>
<Id>1013</Id><Code>CODE_003</Code>
</GroupRow>
<DataRow>
<Id>25435</Id><Price>30.00</Price><Qty>14</Qty><Date>2010-05-05</Date>
</DataRow>
<DataRow>
<Id>13435</Id><Price>80.00</Price><Qty>8</Qty><Date>2010-05-06</Date>
</DataRow>
<DataRow>
<Id>97563</Id><Price>42.00</Price><Qty>22</Qty><Date>2010-05-07</Date>
</DataRow>
As can be seen, when BizTalk disassembles the flat file, the xml data is in a flat structure. The challenge is to transform it to a hierarchical structure like this:
<Header>
<FileName>INV_201008.CSV</FileName>
<CreationDate>2010-09-01</CreationDate>
</Header>
<Group>
<Id>1011</Id><Code>CODE_001</Code>
<Data>
<Id>23562</Id><Price>100.00</Price><Qty>32</Qty><Date>2010-05-05</Date>
</Data>
<Data>
<Id>73522</Id><Price>110.00</Price><Qty>33</Qty><Date>2010-05-06</Date>
</Data>
<Data>
<Id>82643</Id><Price>104.00</Price><Qty>11</Qty><Date>2010-05-07</Date>
</Data>
</Group>
<Group>
<Id>1012</Id><Code>CODE_002</Code>
<Data>
<Id>23553</Id><Price>99.00</Price><Qty>32</Qty><Date>2010-05-05</Date>
</Data>
<Data>
<Id>73463</Id><Price>156.60</Price><Qty>11</Qty><Date>2010-05-06</Date>
</Data>
<Data>
<Id>96473</Id><Price>122.10</Price><Qty>88</Qty><Date>2010-05-07</Date>
</Data>
</Group>
<Group>
<Id>1013</Id><Code>CODE_003</Code>
<Data>
<Id>25435</Id><Price>30.00</Price><Qty>14</Qty><Date>2010-05-05</Date>
</Data>
<Data>
<Id>13435</Id><Price>80.00</Price><Qty>8</Qty><Date>2010-05-06</Date>
</Data>
<Data>
<Id>97563</Id><Price>42.00</Price><Qty>22</Qty><Date>2010-05-07</Date>
</Data>
</Group>
I tried several approaches but couldn't seem to get BizTalk's mapper to generate what I wanted. I then tried the custom XSL route and after a lot of experimentation, I got it working.
Creating the <Group> nodes was the easy bit (using an <xsl:for-each> and selecting the <GroupRow> nodes). Creating the filter expression for <Data> nodes within each group node iteration was the hardest part. In a nutshell, the <xsl:for-each> select statement was:
following-sibling::*[local-name()='DataRow' and (count(preceding-sibling::*[local-name()='GroupRow'])) = $marker]
where $marker is a variable that I set at the start of each group node iteration like this:
<xsl:variable name="marker" select="count(preceding-sibling::*[local-name()='GroupRow'])+1" />
Hope this tip will save someone some time.