Re: How many bytes per Italian character?



See my last post. Essentially, it is a hyrbrid solution between array and list
performance.

Another cool algorithm I did some years ago was to create sparse arrays with constant time
access (a trick called "comb vectors" developed at the University of Karlsruhe). This
was to represent parser state tables, where the dimensionality is n*m, for n the number of
parser states and m the number of terminal symbols. Tables like this can be massive; even
on modern machines, the paging burden can be a killer. But for most states, there are no
valid transitions for most input symbols. Comb vectors start to approximate O(V(n*m))
where V is the number of valid states in the array, which is typically a few hundred in an
array of nominally tens of thousands of elements.

For another performance issue, if you don't need sequentiality, there's a technique called
"perfect hashing" which asymptotically approaches O(1); the best we got was O(1.4) on
large input data sets. But for the reserved-word lookup, we could precompute the hashing
parameters and get O(1.05).
joe
On Mon, 9 Apr 2007 11:17:25 +0900, "Norman Diamond" <ndiamond@xxxxxxxxxxxxxxxx> wrote:

"MrAsm" <mrasm@xxxxxxx> wrote in message
news:03uf13p5v1grlv20e11119bbj38c6blbcg@xxxxxxxxxx
On Sat, 07 Apr 2007 13:32:45 -0400, Joseph M. Newcomer
<newcomer@xxxxxxxxxxxx> wrote:
But someone at MS told me that her standard interview question is "here is
a list in sorted order. Write the code to insert a new element in sorted
order" and of 27 interviewees, 26 failed to do it, and the only one who
got it right was a PhD candidate, and she struggled to get it. I can do
it with both LRU optimization for lookup and sublinear insert performance
for large lists without thinking too hard about it.

With "list", do you mean "linked list"?
If with list we mean array, I could do a kind of binary search to find the
place of insertion, and then insert. This would be a O(log(n)) algorithm,
am I correct?

With a sorted array, it would be O(log(n)) to find the place where you're
going to do an O(n) operation. Even if you do some caching, you'll still
have to do an average of n/2 moves (i.e. O(n)) to keep it an array.

But if I have a linked list, accessing a list node with an integer key
with O(1) performance is not possible, so I think that, to visit K-th
element, I should be forced to visit all the previous (K-1) elements, am I
correct?

If Dr. Newcomer's caching succeeds then he doesn't have to visit all
previous K-1 elements. But I still don't quite see how it works. If the
input is coming in randomly then on average you'll have to still have to
examine K/4 elements (i.e. O(n)) past the place that was cached.

Of course in real life situations and with enough funding, you can figure
out what non-randomness occurs in the input and then you can cache stuff to
get the cost down.
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
.



Relevant Pages

  • Re: garbage collection problem in large linked lists
    ... Instead all buckets are allocated at once in an array. ... VG.net, which must be very scalable, but only for lists which would normally ... Linked lists require pointer dereferencing in order to traverse. ... When you run out of space, you allocate a new smaller array. ...
    (microsoft.public.dotnet.framework.performance)
  • Re: Observer pattern limitations
    ... The components are placed in an array of lists. ... So to run the algorithm, you iterate through the ...
    (comp.object)
  • Re: Dict sharing vs. duplication
    ... array to enforce unique keys and I use lists to enforce order. ... API in a page or two of Tcl code, it isn't so much a missing feature ... OpenACS database API which creates a new specialized data structure, ...
    (comp.lang.tcl)
  • Re: Observer pattern limitations
    ... In the data-flow language I designed, ... The components are placed in an array of lists. ... So to run the algorithm, ...
    (comp.object)
  • Re: Translating python to scheme
    ... Scheme has mutation, and there are reasons to use it. ... confusing when considering a 2d array. ... in other languages: Lisp Lists and Vectors. ...
    (comp.lang.scheme)