Re: garbage collection problem in large linked lists
From: Frank Hileman (frankhil_at_no.spamming.prodigesoftware.com)
Date: 11/28/04
- Next message: Mr. Mountain: "Re: garbage collection problem in large linked lists"
- Previous message: schneider: "Max CPU usage bugs/problems"
- In reply to: Mr. Mountain: "garbage collection problem in large linked lists"
- Next in thread: Mr. Mountain: "Re: garbage collection problem in large linked lists"
- Reply: Mr. Mountain: "Re: garbage collection problem in large linked lists"
- Messages sorted by: [ date ] [ thread ]
Date: Sun, 28 Nov 2004 06:59:43 -0800
Hello Mountain,
The hashtable, as implemented in the BCL, does not allocate an object per
bucket. Instead all buckets are allocated at once in an array. So the major
difference I can see is the number of objects allocated.
This means your linked list will occupy more memory, since each separate
node in the list has all the overhead of an object in the heap. It also
means the garbage collector has a lot more work to do.
In a system such as the CLR a linked list, implemented as a separate object
per node, is generally a poor choice as a collection. I have used them in
VG.net, which must be very scalable, but only for lists which would normally
contain between 0 and 5 items. In that circumstance, other types of
collection might have more overhead. But as soon as you get up in the number
of items, linked lists aren't useful any longer.
Linked lists require pointer dereferencing in order to traverse. This means
touching each node in memory, even if you don't need the contents of the
node. That makes linked lists slow compared to array-based lists, like
ArrayList.
Usually the first choice after a linked list is an array-based list, like
ArrayList. This is an object wrapping an array, keeping a count of items,
and doubling the array size when more capacity is needed. This will use a
minimum amount of memory, but will start to drag down once you start adding
enormous numbers of items, and the list must reallocate the array, and copy
the old items to the new. Insertion and deletion with enormous lists is also
slow.
If you need to add an enormous number of items to your list, and you cannot
predict beforehand exactly how many items that will be, in order to preset
the capacity of an array-based list, then you might consider creating your
own deque. This is similar to an ArrayList, but instead of maintaining a
single array, you maintain an array of arrays, and a count for each array.
Each smaller array represents one contiguous indexable chunk of item
entries. When you run out of space, you allocate a new smaller array. This
means you never have to copy over old items to new space as with an
ArrayList, and you can continue to grow to the limits of your RAM. To index
into the deque, you first index into the top level array by dividing the
index by the size of your smaller arrays (they are all the same size). You
then index into the smaller array using the modulo of that same size.
The deque then reduces the amount of memory overhead, while maintaining the
ability to add the maximum number of items to your collection.
There is another option, if you want to use a linked list. Make all your
"nodes" in the list be structs in an array. Essentially, you are creating a
memory pool for the list items. Instead of using an object reference
(pointer) to find the next node, use an integer that points to the index of
the next node in the array. You can either use a negative number to indicate
the end, or a special number, like 0, and subtract one to get the true
index. If you use 0 you can use an unsigned integer. In the code that
follows, I use -1 to signify the end.
private struct Node
{
public object Value;
public int Next;
}
In your list class, you would keep the array, the current count, the index
of the head of the list, and the index of the next free pool Node:
Node[] pool;
int count;
int head;
int nextFree;
Initialize all your nodes in the array so that each node points to the next
node in the array. This will be your "free pool". The last node points to
nothing (-1).
pool = new Node[capacity];
for (int i = 0; i < pool.Length - 1; ++i)
pool[i].Next = i+1;
pool[pool.Length -1].Next = -1;
count = 0;
head = -1;
nextFree = 0;
Every time you need a new Node, you allocate one from the pool:
if (nextFree == -1)
... get more Nodes...
newIndex = nextFree;
nextFree = pool[newIndex].Next;
pool[newIndex].Next = -1;
return newIndex;
To free a Node, you make it the nextFree, and point it to the previous
nextFree:
pool[freedIndex].Next = nextFree;
nextFree = freedIndex;
When you run out of Nodes, you allocate a new pool, and copy all the Nodes
to the new pool.
What is the advantage over an array-based list, like ArrayList? Insertions
and deletions are faster, since you have a true list structure. You do not
have the overhead of a object-per-Node linked list. Finally, you can
actually share the pool between separate lists, between all lists if you
wish, as long as you take care not to use from multiple threads. And if you
really want to squeeze out space for small lists, you can use byte or short
as the "Next" index, allocating a separate array to store all the "Next"
data (if you put in same struct alignment will prevent from saving space is
smaller than 32 bit).
But the scalability is not as great as the deque, because the entire pool is
allocated as a chunk. There is a variant which uses multiple pools, and two
indices to indicate "next": one a pool index (into an array of pools), and
one an index within the pool. That is sort of a combo of deque and linked
list. With this variant you can keep fixed size pools and copying is never
necessary when you run out of space.
Regards,
Frank Hileman
check out VG.net: http://www.vgdotnet.com
Animated vector graphics system
Integrated Visual Studio .NET graphics editor
"Mr. Mountain" <mtnbikn@mediaone.net> wrote in message
news:BsQpd.102141$5K2.67059@attbi_s03...
>I have a need to implement a linked list that will hold a large number of
> items. I have done a little performance testing to compare the linked list
> against the framework's standard hashtable.
>
> So far the linked list meets my objectives in all ways -- except for one
> problem that it exhibits. It has a garbage collection problem.
>
> If I insert 4 million test objects into a hashtable and run tests
> repeatedly, the garbage collector does its job as one would expect. I can
> run as many repeated tests as I wish.
>
> When I do the exact same test with the same 4 million test objects in the
> linked list, the first test is fine. The next test results in an
> out-of-memory exception because the garbage collector does not run. If I
> call gc.collect between tests, there are no out-of-memory problems for the
> linked list.
>
> Why is it necessary to call gc.collect manually when the linked list is
> used? I have never had to request a manual garbage collection in any other
> situation.
>
> What steps can I take to optimize the memory management when a large
> linked
> list needs to be used? This is NOT a matter of roots pointing to objects.
> The objects are all free to be collected after each test completes.
>
> I'm really interested in knowing if the framework can perform collection
> as
> efficiently (and automatically) for several million individually allocated
> list nodes as it can for the contiguous memory block that underlies a
> hashtable. If it can, what technique(s) do I need to use to make it
> happen?
>
> My test code looks like this (I show the hashtable example, but the linked
> list test driver is exactly the same):
>
> private void btnTest_Click(object sender, System.EventArgs e)
> {
> DateTime start = DateTime.Now;
> Hashtable ht = new Hashtable();
>
> for (int i = startIndex; i < testSize; i++)
> {
> Hashtable innerHt = new Hashtable();
> ht.Add(i, innerHt);
>
> for (int j = startIndex; j < testSize; j++)
> {
> OrganizationMember om = new
> OrganizationMember(j.ToString(), j.ToString(), "Test");
> innerHt.Add(j, om);
> }
> }
>
> for (int i = startIndex; i < testSize; i++)
> {
> Hashtable innerHt = ht[i] as Hashtable;
>
> for (int j = startIndex; j < testSize; j++)
> {
> OrganizationMember om = innerHt[j] as OrganizationMember;
> }
> }
>
> lblSystemResults.Text = (DateTime.Now -
> start).TotalMilliseconds.ToString();
> }
>
> private void btnClear_Click(object sender, System.EventArgs e)
> {
> System.GC.Collect();
> }//btnClear_Click
>
>
- Next message: Mr. Mountain: "Re: garbage collection problem in large linked lists"
- Previous message: schneider: "Max CPU usage bugs/problems"
- In reply to: Mr. Mountain: "garbage collection problem in large linked lists"
- Next in thread: Mr. Mountain: "Re: garbage collection problem in large linked lists"
- Reply: Mr. Mountain: "Re: garbage collection problem in large linked lists"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|