Re: string dictionary and memory issue.



Alexandre Brisebois (www.pointnetsolutions.com) wrote:
I am currently building a lexical analysis component to pull keywords
out of content.
I have a functional first build, but I am having memory problems,
since I am easily loading over 300,000 strings into memory.

When I am doing the actual analysis, I can reach up to 400 MB of RAM
usage.

I have currently built my dictionary as a tree of nodes
containing hashtables. Each node represents a letter of the string and
holds a flag marking the end of a string.

How many entries are in each node? I wouldn't expect there to be that
many, particularly if you're only dealing with ASCII characters. I'd
suggest using a plain array, either as a list of entries (for nodes
without many sub-entries) or as an array indexed by character with nulls
in the unused slots (for those where traversing a list would be
expensive). You could use ArrayList/List<T> instead of the straight
arrays, but call TrimToSize (ArrayList) or TrimExcess (List<T>) on all of
them when you've finished loading the list of words to avoid wastage.
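
As a rough illustration of the "list of entries" variant, here is a minimal
sketch of a node that keeps its children in two small parallel arrays rather
than a hashtable. The names TrieNode, WordTrie, GetOrAdd and Find are
hypothetical and not taken from the original code. The sketch assumes each
node has only a handful of children, so a linear scan is cheap, and it grows
the arrays by exactly one slot per new child, so there is no spare capacity
to trim afterwards.

```csharp
using System;

// Hypothetical sketch: one node per letter, children held in exact-size arrays.
public sealed class TrieNode
{
    // Shared empty arrays so leaf nodes allocate nothing for their children.
    private static readonly char[] NoKeys = new char[0];
    private static readonly TrieNode[] NoChildren = new TrieNode[0];

    // Parallel arrays: keys[i] is the letter that leads to children[i].
    private char[] keys = NoKeys;
    private TrieNode[] children = NoChildren;

    // True when the path from the root to this node spells a complete word.
    public bool IsEndOfWord;

    // Linear scan; returns null when there is no child for this letter.
    public TrieNode Find(char c)
    {
        for (int i = 0; i < keys.Length; i++)
        {
            if (keys[i] == c) return children[i];
        }
        return null;
    }

    // Returns the existing child for this letter, or appends a new one.
    public TrieNode GetOrAdd(char c)
    {
        TrieNode child = Find(c);
        if (child != null) return child;

        // Grow by exactly one slot so no unused capacity is left behind.
        int n = keys.Length;
        char[] newKeys = new char[n + 1];
        TrieNode[] newChildren = new TrieNode[n + 1];
        Array.Copy(keys, newKeys, n);
        Array.Copy(children, newChildren, n);
        child = new TrieNode();
        newKeys[n] = c;
        newChildren[n] = child;
        keys = newKeys;
        children = newChildren;
        return child;
    }
}

public sealed class WordTrie
{
    private readonly TrieNode root = new TrieNode();

    // Walks (and extends) the tree one letter at a time, then marks the end.
    public void Add(string word)
    {
        TrieNode node = root;
        foreach (char c in word) node = node.GetOrAdd(c);
        node.IsEndOfWord = true;
    }

    // A word is present only if its last letter's node carries the end flag.
    public bool Contains(string word)
    {
        TrieNode node = root;
        foreach (char c in word)
        {
            node = node.Find(c);
            if (node == null) return false;
        }
        return node.IsEndOfWord;
    }
}
```

Growing one slot at a time copies the arrays on every insert, which costs
load time but keeps each node at its minimum size. If the children were held
in a List<T> instead, loading would be faster, but each list would need
TrimExcess called on it once the word list has been read.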

Jon
