Re: Reading & Searching a Huge file
- From: "Peter Duniho" <NpOeStPeAdM@xxxxxxxxxxxxxxxx>
- Date: Fri, 14 Mar 2008 11:37:05 -0700
On Fri, 14 Mar 2008 10:35:49 -0700, Ahmad Jalil Qarshi <ahmaddearNO@xxxxxxxxxxxxxxx> wrote:
[...]
Now what I want to is to find a specific numeric value at the beginning of
the line. For example in the above few lines the numeric values are
52,96,100,150,153,159,168,210,569,1000,1010 (these values are always
sorted). Suppose I have to search 168. Since the values are sorted I can use
the binary search technique to search a specific numeric value. But for that
I will have to read all the numeric values sequentially into some array in
memory. The other way is sequential search i.e to read line by line and
fetch the numeric value before [space] and compare it with required one.
What is the best possible way to perform searching in the above mentioned
file format.
How many times do you need to perform this search?
If it's just once, then I think you could take advantage of the fact that the lines are all _almost_ the same length and do a binary search on the file itself. You'll have to make a guess, then search for the previous/next end-of-line, then check the numeric value at the beginning of that line. That's not as efficient as it could be if you had identical line lengths, but compared to scanning through a 2GB file it should still be plenty fast.
If you're going to perform the search on a regular basis, I'd recommend creating an index for the file. Scan it once, storing just the initial numeric value and a byte offset within the file representing where that record's line starts. Then later you can do your binary search on the index instead of the file. This is, I think, what you're suggesting when you wrote "read all the numeric values sequentially into some array"; you didn't mention also storing the file offsets for each record, but hopefully you realize that's implied since otherwise keeping the values in an array wouldn't be useful.
Of course, keep in mind that this is the sort of thing that databases are really good at. If you have a way to redirect the data into a database rather than keeping it in this huge text file, I think that would be the best way to handle it.
Pete
.
- Follow-Ups:
- Re: Reading & Searching a Huge file
- From: Ahmad Jalil Qarshi
- Re: Reading & Searching a Huge file
- References:
- Reading & Searching a Huge file
- From: Ahmad Jalil Qarshi
- Reading & Searching a Huge file
- Prev by Date: About to give up :( please help
- Next by Date: Re: What is the least impactful way (to the end user) to deploy a c#/database application?
- Previous by thread: Reading & Searching a Huge file
- Next by thread: Re: Reading & Searching a Huge file
- Index(es):
Relevant Pages
|