Re: How many bytes per Italian character?
- From: Joseph M. Newcomer <newcomer@xxxxxxxxxxxx>
- Date: Mon, 09 Apr 2007 01:54:20 -0400
See my last post. Essentially, it is a hyrbrid solution between array and list
performance.
Another cool algorithm I did some years ago was to create sparse arrays with constant time
access (a trick called "comb vectors" developed at the University of Karlsruhe). This
was to represent parser state tables, where the dimensionality is n*m, for n the number of
parser states and m the number of terminal symbols. Tables like this can be massive; even
on modern machines, the paging burden can be a killer. But for most states, there are no
valid transitions for most input symbols. Comb vectors start to approximate O(V(n*m))
where V is the number of valid states in the array, which is typically a few hundred in an
array of nominally tens of thousands of elements.
For another performance issue, if you don't need sequentiality, there's a technique called
"perfect hashing" which asymptotically approaches O(1); the best we got was O(1.4) on
large input data sets. But for the reserved-word lookup, we could precompute the hashing
parameters and get O(1.05).
joe
On Mon, 9 Apr 2007 11:17:25 +0900, "Norman Diamond" <ndiamond@xxxxxxxxxxxxxxxx> wrote:
"MrAsm" <mrasm@xxxxxxx> wrote in messageJoseph M. Newcomer [MVP]
news:03uf13p5v1grlv20e11119bbj38c6blbcg@xxxxxxxxxx
On Sat, 07 Apr 2007 13:32:45 -0400, Joseph M. Newcomer
<newcomer@xxxxxxxxxxxx> wrote:
But someone at MS told me that her standard interview question is "here is
a list in sorted order. Write the code to insert a new element in sorted
order" and of 27 interviewees, 26 failed to do it, and the only one who
got it right was a PhD candidate, and she struggled to get it. I can do
it with both LRU optimization for lookup and sublinear insert performance
for large lists without thinking too hard about it.
With "list", do you mean "linked list"?
If with list we mean array, I could do a kind of binary search to find the
place of insertion, and then insert. This would be a O(log(n)) algorithm,
am I correct?
With a sorted array, it would be O(log(n)) to find the place where you're
going to do an O(n) operation. Even if you do some caching, you'll still
have to do an average of n/2 moves (i.e. O(n)) to keep it an array.
But if I have a linked list, accessing a list node with an integer key
with O(1) performance is not possible, so I think that, to visit K-th
element, I should be forced to visit all the previous (K-1) elements, am I
correct?
If Dr. Newcomer's caching succeeds then he doesn't have to visit all
previous K-1 elements. But I still don't quite see how it works. If the
input is coming in randomly then on average you'll have to still have to
examine K/4 elements (i.e. O(n)) past the place that was cached.
Of course in real life situations and with enough funding, you can figure
out what non-randomness occurs in the input and then you can cache stuff to
get the cost down.
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
.
- References:
- Re: How many bytes per Italian character?
- From: Joseph M . Newcomer
- Re: How many bytes per Italian character?
- From: MrAsm
- Re: How many bytes per Italian character?
- From: Joseph M . Newcomer
- Re: How many bytes per Italian character?
- From: Alexander Grigoriev
- Re: How many bytes per Italian character?
- From: Joseph M . Newcomer
- Re: How many bytes per Italian character?
- From: Nobody
- Re: How many bytes per Italian character?
- From: Joseph M . Newcomer
- Re: How many bytes per Italian character?
- From: MrAsm
- Re: How many bytes per Italian character?
- From: Norman Diamond
- Re: How many bytes per Italian character?
- Prev by Date: Re: How many bytes per Italian character?
- Next by Date: Re: How many bytes per Italian character?
- Previous by thread: Re: How many bytes per Italian character?
- Next by thread: Re: How many bytes per Italian character?
- Index(es):
Relevant Pages
|