Re: IO function in Vb.Net slower than in Vb6.0
- From: "Stephany Young" <noone@localhost>
- Date: Thu, 31 Mar 2005 14:05:02 +1200
What was said was that the fields were not delimited.
The fact that there is a record delimiter is a given because of the use of
the ReadLine method.
Remember that the code in VB6 works and the 'ReadLine' method is a straight
conversion of the 'Line Input' statement which does, ostensibly, the same
thing.
Anyway the detectives have been at work.
Parsing a 100,000-record file of 422 characters per record in a line by line
read using regex on my workstation takes approx 73 seconds.
Parsing the same file in a line by line read using the Trim and Mid
functions takes approx 15 seconds.
Parsing the same file in a line by line read using the String.SubString and
String.Trim methods takes approx 17 seconds.
The VB6 equivalent takes approx 40 seconds.
As I said in my first post, it is highly likely that the '15 second'
difference was due to one of the other methods that is executed on a per
record basis, rather than the reading and parsing of the file, and these
results bear that out.
Although it has been an interesting exercise, I don't think that regex is
the way to go in this case.
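
For anyone who wants to try the String.SubString and String.Trim variant, here
is a rough sketch of the idea, not the code I actually timed. The field names
and widths are copied from the regular expression quoted further down; the
module name, the ParseRecord helper, and taking the file path from the command
line are all made up for illustration. It assumes a compiler recent enough for
Using, IsNot, and line continuation inside array initialisers.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module FixedWidthSketch

    ' Field names and widths taken from the regular expression quoted below.
    Private ReadOnly FieldNames() As String = {
        "ActionCode", "CarrierID", "LastName", "FirstName", "MiddleName",
        "Addr1", "Addr2", "City", "State", "Zip", "BenefitOption",
        "EmployerGroup", "OptionEffDate", "HPEffDate", "TermDate", "Sex",
        "DOB", "SSN", "Phone", "EmployerGroupAnivDate", "HeadOfHouse",
        "PrimaryStatus", "MaritalStatus"}

    Private ReadOnly FieldWidths() As Integer = {
        1, 25, 60, 30, 15, 60, 60, 30, 2, 10, 60,
        15, 8, 8, 8, 1, 8, 9, 12, 8, 9, 1, 1}

    ' Hypothetical helper: slices one line into trimmed fields by position.
    ' A line shorter than expected yields empty strings for the missing
    ' fields instead of throwing.
    Function ParseRecord(ByVal line As String) As Dictionary(Of String, String)
        Dim fields As New Dictionary(Of String, String)
        Dim offset As Integer = 0
        For i As Integer = 0 To FieldNames.Length - 1
            Dim width As Integer = FieldWidths(i)
            Dim value As String = String.Empty
            If offset < line.Length Then
                Dim available As Integer = Math.Min(width, line.Length - offset)
                value = line.Substring(offset, available).Trim()
            End If
            fields(FieldNames(i)) = value
            offset += width
        Next
        Return fields
    End Function

    Sub Main(ByVal args() As String)
        If args.Length = 0 Then
            Console.WriteLine("Usage: FixedWidthSketch <path to data file>")
            Return
        End If

        Dim count As Integer = 0
        Using reader As New StreamReader(args(0))
            ' ReadLine strips the LF or CR/LF and returns Nothing at end of file.
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                Dim record As Dictionary(Of String, String) = ParseRecord(line)
                count += 1
                line = reader.ReadLine()
            End While
        End Using
        Console.WriteLine("Records parsed: " & count.ToString())
    End Sub

End Module
```

Because each line is read on its own, the record terminator never enters the
offsets, so there is no cumulative drift from one record to the next.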
"MeltingPoint" <none@xxxxxxx> wrote in message
news:68ydnYrAS_Xa1dbfRVn-3A@xxxxxxxxxxxxx
> "Stephany Young" <noone@localhost> wrote in
> news:OFvYnVYNFHA.3844@xxxxxxxxxxxxxxxxxxxx:
>
>> I'm just a little concerned that you might have missed a critical
>> point here, MeltingPoint.
>>
>> Each record in the file is terminated by a LF or a CR/LF pair. This is
>> shown by the use of the ReadLine method in the original code fragment.
>>
>> A line is defined as a sequence of characters followed by a line feed
>> or a carriage return immediately followed by a line feed. The string
>> that is returned does not contain the terminating carriage return or
>> line feed. The returned value is a null reference (Nothing in Visual
>> Basic) if the end of the input stream is reached.
>>
>> If you use the ReadToEnd method then you have to identify what the
>> record delimiter is and split the input into 'records' based on that
>> delimiter before you can apply the RegEx anyway. Unless, of course, the
>> RegEx is preceded by a '^' (with the Multiline option set) to indicate
>> start at the beginning of each line.
>>
>> If you don't handle this then each record, subsequent to the first,
>> will be off by 1 or 2 characters, compounding.
>>
>>
>>
>> "MeltingPoint" <none@xxxxxxx> wrote in message
>> news:FOqdnRmyx9xgqdbfRVn-ug@xxxxxxxxxxxxx
>>> "hillcountry74" <shruthibg@xxxxxxxxx> wrote in
>>> news:1112223769.480509.272010@xxxxxxxxxxxxxxxxxxxxxxxxxxxx:
>>>
>>>> Thanks a lot MP. Really appreciate your help.
>>>>
>>>> Can you please paste the regular expression for this? Can't find it
>>>> in the code.
>>>>
>>>> Also, on the headofhouse, it could be alphanumeric. And yes, some
>>>> fields would be blank.
>>>>
>>>> There could be files of size 400MB. In such a case, reading till
>>>> endoffile might not work. Instead, if it is changed to reading one
>>>> line at a time, do you think the speed will reduce?
>>>>
>>>> Thanks again.
>>>>
>>>
>>> This is the Regular Expression:
>>> Dim _exp As String = "((?<ActionCode>.{1})" & _
>>> "(?<CarrierID>.{25})" & _
>>> "(?<LastName>.{60})" & _
>>> "(?<FirstName>.{30})" & _
>>> "(?<MiddleName>.{15})" & _
>>> "(?<Addr1>.{60})" & _
>>> "(?<Addr2>.{60})" & _
>>> "(?<City>.{30})" & _
>>> "(?<State>.{2})" & _
>>> "(?<Zip>.{10})" & _
>>> "(?<BenefitOption>.{60})" & _
>>> "(?<EmployerGroup>.{15})" & _
>>> "(?<OptionEffDate>.{8})" & _
>>> "(?<HPEffDate>.{8})" & _
>>> "(?<TermDate>.{8})" & _
>>> "(?<Sex>.{1})" & _
>>> "(?<DOB>.{8})" & _
>>> "(?<SSN>.{9})" & _
>>> "(?<Phone>.{12})" & _
>>> "(?<EmployerGroupAnivDate>.{8})" & _
>>> "(?<HeadOfHouse>.{9})" & _
>>> "(?<PrimaryStatus>.{1})" & _
>>> "(?<MaritalStatus>.{1}))"
>>>
>>> A pretty good definition can be found on MSDN. Search For RegEx or
>>> Regular Expressions. :)
>>>
>>> I'll try to "simulate"(wink wink:) a 400MB file and check
>>> performance. Reading one line at a time is out of the question for
>>> this experiment, as it would require a couple of million
>>> reads (guessing); reading in 442 bytes * nRecords would be better if
>>> not the best way to do it. BUT this and ReadToEnd both REQUIRE every
>>> record to be 442 bytes (or whatever it is). Off by one byte, and kiss
>>> your records goodbye.
>>>
>>> MP
>>
>>
>>
>
> I figured it out. Either way, if it is a fixed record then the cr would
> be included in the record size. So the above would be 443*nRecords. No
> harm no foul. The point is the file can evenly be divided by the number
> of bytes in a record plus the delimiter (which is the first thing I
> asked him 10 posts ago and was told there was no delimiter).
>
> MP