Re: Memory Management extremely poor in C# when manipulating strin
- From: "Segfahlt" <Segfahlt@xxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Mon, 22 Aug 2005 14:58:21 -0700
This is for Jon's & Willy's replies.
1.) No. I'm not loading the whole file into memory. I'm reading it line by
line. Originally I tried reading it in 50M chunks. I slowly whittled it
down to 1M chunks, and still found no relief. Now I'm reading it line by
line.
2.) I'm not sure if I can post a "short but complete" version that
demonstrates what I'm seeing. I can post a trimmed down version, but it
encompasses some other data structures that I think are affecting the memory
management (Hashtables). I guess I could post the basic code with some stats
on the hash tables.
The entire algorithm is only about 300 lines (with comments), so I'm going to
go ahead and post it here (minus some of the superfluous stuff).
Please let me know if it would be more beneficial to post an actual working
program. I would think the idea here is not to identify bugs, but identify
where memory is not getting released.
-----------BEGIN CODE SNIPPET------------------
private void ProcessInputFile(FileInfo f) {
    /* NOTE: the UPC object has been previously instantiated & populated
       prior to this call. */
    string rejectFilePath = null;
    string outputFilePath = null;
    int RecordsConverted = 0;
    int RejectedRecords = 0;
    int foundBySKU = 0;
    int foundByToy = 0;
    int foundByLocalSKU = 0;
    int foundByLocalToy = 0;
    string inLine = null;
    Hashtable lSKU_UPC_HASH = new Hashtable();
    Hashtable lTOY_UPC_HASH = new Hashtable();

    StreamReader IF = new StreamReader(f.FullName);
    //this is how we get the DATE_END value so we can name our output file
    //accordingly. First line is junk. 2nd line is what we want.
    IF.ReadLine();
    inLine = IF.ReadLine();
    inLine = Regex.Replace(inLine, @"\s+", "");
    string[] fieldvalues = Regex.Split(inLine, ",");
    string date = fieldvalues[4];
    outputFilePath = this.process_path + "\\" + date + "_output_" + f.Name;
    IF.Close();

    //now start in on the actual processing.
    rejectFilePath = this.reject_file_path + "\\" + date + "_reject_" + f.Name;
    StreamWriter OFR = new StreamWriter(rejectFilePath, false);
    StreamWriter OF = new StreamWriter(outputFilePath, false);
    IF = new StreamReader(f.FullName);

    //header information. Need to append "UPC".
    inLine = IF.ReadLine();
    OFR.WriteLine(inLine);
    OFR.Flush();
    inLine = Regex.Replace(inLine, @" +, +", "\t");
    inLine += "\tUPC";
    OF.WriteLine(inLine);
    OF.Flush();

    string prevSKU = null;
    string prevToy = null;
    string curSKU = null;
    string curToy = null;
    string sUPC = null;

    while((inLine = IF.ReadLine()) != null) {
        RecordsConverted++;
        StringBuilder buf = new StringBuilder(1024);
        //split
        string[] fields = Regex.Split(Regex.Replace(inLine, @" +", ""), @",");
        curSKU = fields[2];
        curToy = fields[6];
        /*
         * The following bit of code is a somewhat vain attempt at some
         * performance improvements for speed. What we have here is a lookup
         * against two hashes. The first hash, the SKU hash, is only about
         * 2200 items long. The Toy hash, though, is over 100,000. The files
         * are organized mostly sorted by SKU. So first, I'll store our SKU
         * and TOY # value in temp variables, look up the values for them,
         * then continue on to the next loop. On the next loop, if my SKU or
         * TOY are the same, we know we'll get the same UPC from it, so we'll
         * just use the stored UPC from the last loop. If we find a new Toy
         * or SKU, then we'll look up that new value and store it in one of
         * the two dynamically built hashes, lSKU_UPC_HASH or lTOY_UPC_HASH.
         * These guys will be much smaller than the full hashes for the same
         * type. Since they are smaller they'll be faster. Finally, if we
         * can't find our UPC based on the previous value or values which
         * we've looked up in our smaller local hashes, we'll go to the
         * global hashes to find our UPC. Once we get it, we'll store the TOY
         * and SKU in our local hashes for use next time.
         */
        if(! (curSKU == prevSKU || curToy == prevToy)) {
            if(lSKU_UPC_HASH.ContainsKey(curSKU)) {
                foundByLocalSKU++;
                sUPC = lSKU_UPC_HASH[curSKU].ToString();
            } else if(lTOY_UPC_HASH.ContainsKey(curToy)) {
                sUPC = lTOY_UPC_HASH[curToy].ToString();
                foundByLocalToy++;
            } else {
                //the SKU's have a . behind them in the text file, so we need
                //to strip it out
                string sSKU = Regex.Replace(curSKU, @"\.", "");
                //UPC.GetUPCBySKUShared(sSKU) is just a Hash Lookup by sSKU
                sUPC = UPC.GetUPCBySKUShared(sSKU);
                if(sUPC != null) {
                    lSKU_UPC_HASH.Add(curSKU, sUPC);
                    foundBySKU++;
                } else {
                    string sToy = curToy.Length == 4 ? " " + curToy : curToy;
                    //UPC.GetUPCShared(sToy) is just a Hash Lookup by sToy
                    sUPC = UPC.GetUPCShared(sToy);
                    if(sUPC != null) {
                        lTOY_UPC_HASH.Add(curToy, sUPC);
                        foundByToy++;
                    }
                }
            }
            prevSKU = curSKU;
            prevToy = curToy;
            //if we can't find a UPC, we need to reject the record. Do this by
            //writing the record to the reject file, bump up our reject record
            //counter and continue.
            if(sUPC == null || sUPC.Length < 1) {
                RejectedRecords++;
                OFR.WriteLine(inLine);
                continue;
            }
        } // if(! (curSKU == prevSKU || curToy == prevToy)) {
        OF.WriteLine(string.Join("\t", fields) + "\t" + sUPC);
    } //While IF.ReadLine
    OF.Close();
    OFR.Close();
    IF.Close();
}
-------------END CODE SNIPPET---------------
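In case it helps to look at the lookup in isolation, here is a rough, self-contained sketch of the tiered UPC lookup described in the long comment above (previous record first, then the small local hashes, then the big global ones). It is only an illustration under assumptions: the class name UpcResolver and the two global dictionaries are hypothetical stand-ins for the UPC object, and it uses Dictionary<string, string> where the real code uses Hashtable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the tiered lookup; UpcResolver, globalBySku and
// globalByToy are illustrative names, not part of the original program.
public class UpcResolver
{
    private readonly Dictionary<string, string> globalBySku;
    private readonly Dictionary<string, string> globalByToy;

    // Small per-file caches, filled only with keys actually seen in the file.
    private readonly Dictionary<string, string> localBySku = new Dictionary<string, string>();
    private readonly Dictionary<string, string> localByToy = new Dictionary<string, string>();

    private bool hasPrevious;
    private string prevSku;
    private string prevToy;
    private string prevUpc;

    public UpcResolver(Dictionary<string, string> globalBySku,
                       Dictionary<string, string> globalByToy)
    {
        this.globalBySku = globalBySku;
        this.globalByToy = globalByToy;
    }

    // Returns the UPC for a record, or null when no table knows it.
    public string Resolve(string sku, string toy)
    {
        // Same SKU or toy as the previous record: reuse the previous answer.
        if (hasPrevious && (sku == prevSku || toy == prevToy))
        {
            return prevUpc;
        }

        string upc;
        if (!localBySku.TryGetValue(sku, out upc) &&
            !localByToy.TryGetValue(toy, out upc))
        {
            // SKUs carry a '.' in the input file; strip it for the global lookup.
            string cleanSku = sku.Replace(".", "");
            if (globalBySku.TryGetValue(cleanSku, out upc))
            {
                localBySku[sku] = upc;
            }
            else
            {
                // Four-character toy numbers are padded with a leading space.
                string paddedToy = toy.Length == 4 ? " " + toy : toy;
                if (globalByToy.TryGetValue(paddedToy, out upc))
                {
                    localByToy[toy] = upc;
                }
            }
        }

        hasPrevious = true;
        prevSku = sku;
        prevToy = toy;
        prevUpc = upc; // stays null when every lookup missed
        return upc;
    }
}
```

A caller would treat a null (or empty) result as a rejected record, as the loop above does. Note that nothing in this part allocates per record beyond the stripped SKU string, so it is probably not where the memory is going.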
"Jon Skeet [C# MVP]" wrote:
> Segfahlt <Segfahlt@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > I have a fairly simple C# program that just needs to open up a fixed width
> > file, convert each record to tab delimited and append a field to the end of
> > it.
> >
> > The input files are between 300M and 600M. I've tried every memory
> > conservation trick I know in my conversion program, and a bunch I picked up
> > from reading some of the MSDN C# blogs, but still my program ends up using
> > hundreds and hundreds of megs of ram. It is also taking excessively long to
> > process the files. (between 10 and 25 minutes). Also, with each successive
> > file I process in the same program, performance goes way down, so that by the
> > 3rd file, the program comes to a complete halt and never completes.
> >
> > I ended up rewriting the process in perl which takes only a couple minutes
> > and never really gets above a 40 M footprint.
> >
> > What gives?
>
> It's very hard to say without seeing any of your code. It sounds like
> you don't actually need to load the whole file into memory at any time,
> so the memory usage should be relatively small (aside from the overhead
> for the framework itself).
>
> > I'm noticing this very poor memory handling in all my programs that need to
> > do any kind of intensive string processing.
> >
> > I have a 2nd program that just implements the LZW decompression
> > algorithm(pretty much copied straight out of the manuals.) It works great on
> > files less than 100K, but if I try to run it on a file that's just 4.5M
> > compressed, it runs up to 200+ Megs footprint and then starts throwing Out of
> > Memory exceptions.
> >
> > I was wondering if somebody could look at what I've got down and see if I'm
> > missing something important? I'm an old school C programmer, so I may be
> > doing something that is bad.
> >
> > Would appreciate any help anybody can give.
>
> Could you post a short but complete program which demonstrates the
> problem?
>
> See http://www.pobox.com/~skeet/csharp/complete.html for details of
> what I mean by that.
>
> --
> Jon Skeet - <skeet@xxxxxxxxx>
> http://www.pobox.com/~skeet
> If replying to the group, please do not mail me too
>