.NET 2.0 performance bug in ArrayList.Sort



I have come across with what appears to be a significant performance bug in
..NET 2.0 ArrayList.Sort method when compared with Array.Sort on the same
data. Same data on the same CPU gets sorted a lot faster with both methods
using .NET 1.1, that's why I am pretty sure its a (rather serious) bug. Below
you can find C# test case that should allow you to reproduce this error, to
run it you will need to put 2 data files into current directory where
executable is or just change filename pathes accordingly, the data files can
be obtained from here:

fast_data.txt: http://majestic12.co.uk/files/other/dotnet2bug/fast_data.txt
slow_data.txt: http://majestic12.co.uk/files/other/dotnet2bug/slow_data.txt

The data are strings (URLs) of about similar size in bytes and number.

The following are the console outputs from code on the same machine (AMD x2
3800, 2 GB RAM, XP Pro SP2):

VS2003 .NET 1.1 (with SP1) run:

-----------------------------------------------------------
Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 234 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 250 msec
-----------------------------------------------------------

Note that sorting times are almost exactly the same here, so all good in
..NET 1.1 .

-----------------------------------------------------------
VS2005 .NET 2.0 run:

Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 1531 msec
Time to sort strings in string[] array is: 187 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 703 msec
Time to sort strings in string[] array is: 171 msec
Press ENTER to exit
-----------------------------------------------------------

Notice that on the same machine with the same data sorting times are MUCH
slower in ArrayList.Sort, and particularly for the "slow_data.txt" file,
Array.Sort times are actually better, so I am not complaining there, but
clearly ArrayList sorts are seriously flawed - this appears to be data
dependent and by that I don't mean size of the data or number of strings: I
have got lots of such data files and about every 10th of them is 10-20 times
slower than the other even though it has got about the same number of lines
and bytesize.

Note: I am aware of boxing/unboxing overheads when dealing with ArrayLists,
however in this case the slowdown is really bad comparing to .NET 1.1 and it
appears to be data dependent - I am getting it on about 10% of my dataset
from which I have selected 2 examples (slow and fast) to demonstrate that its
a very serious performance issue.

Here is the code that should allow you to replicate the problem:

//////////////////////////////////////////////////////////////////////////
using System;
using System.Collections;
using System.IO;

/*
This is a test case of what appears to be a major performance issue in
ArrayList.Sort method for strings
in .NET 2.0 - it appears to be data dependant as some similarly sized
data files have got a lot less
performance penalty when sorting them using Array.Sort method on
string[] array.

The kicker: both versions run fast in .NET 1.1

Author: Alex Chudnovsky <alexc@xxxxxxxxxxxxxxxx>
Date: 28 Apr 2006
*/


namespace Majestic12
{
/// <summary>
/// Tests sorting performance of Array.Sort of string[] versus
ArrayList.Sort of the same strings
/// It appears that in .NET 2.0 in some cases ArrayList will take a LOT
more time to do the sorting
/// </summary>
class SlowArrayListSortTest
{
static void Main(string[] args)
{
// load strings from file: assumed to be in the same place as
the executable
TestFile("slow_data.txt"); // <--- this data file has got 10
times slower
TestFile("fast_data.txt"); // <--- more reasonable 2 times slower

Console.WriteLine("Press ENTER to exit");
Console.ReadLine();
}

static void TestFile(string sFile)
{
FileStream oFS=File.OpenRead(sFile);

ArrayList oLines=new ArrayList();

StreamReader oSR=new StreamReader(oFS);

while(oSR.BaseStream.Position<oSR.BaseStream.Length)
{
oLines.Add(oSR.ReadLine());
}

oFS.Close();

Console.WriteLine("Loaded {0} lines from file
{1}",oLines.Count,sFile);

// now copy same strings into string array for speed comparisons
string[] sLines=new string[oLines.Count];

for(int i=0;i<sLines.Length;i++)
sLines[i]=(string)oLines[i];

DateTime oTime=DateTime.Now;
oLines.Sort();
Console.WriteLine("Time to sort strings in ArrayList is: {0}
msec",(DateTime.Now.Ticks-oTime.Ticks)/TimeSpan.TicksPerMillisecond);

oTime=DateTime.Now;
Array.Sort(sLines);
Console.WriteLine("Time to sort strings in string[] array is:
{0} msec",(DateTime.Now.Ticks-oTime.Ticks)/TimeSpan.TicksPerMillisecond);
}
}
}

//////////////////////////////////////////////////////////////////////////


.



Relevant Pages

  • Re: .NET 2.0 performance bug in ArrayList.Sort
    ... The data are strings of about similar size in bytes and number. ... Time to sort strings in stringarray is: ... ArrayLists, however in this case the slowdown is really bad comparing ...
    (microsoft.public.dotnet.general)
  • Re: .NET 2.0 performance bug in ArrayList.Sort
    ... Dim b As New ArrayList ... The data are strings of about similar size in bytes and number. ... Time to sort strings in stringarray is: ...
    (microsoft.public.dotnet.general)
  • Re: Hours
    ... > but that doesn't say if it holds strings, doubles or integers etc. ... support Generics so it will be easier to get an array of list of datetime... ... >> I would use an array of arrays or an array of ArrayLists. ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: .NET 2.0 performance bug in ArrayList.Sort
    ... it with my data that I linked to, specifically the following that seems slower than usual: http://majestic12.co.uk/files/other/dotnet2bug/slow_data.txt ... Time to sort strings in Generic List is: ... they use another comparer or have rewritten the comparer to work ...
    (microsoft.public.dotnet.general)
  • Re: K&R2 Secition 5.9 - major blunders
    ... Each element of b doesn't point to a 20 element array of int. ... This mistake is crucial because ... my explanation is really the qualities of something else: ... > The use of the array of pointers is to store the strings. ...
    (comp.lang.c)