Re: Some questions about sparse files
- From: nickdu <nicknospamdu@xxxxxxxxxxxxxxxx>
- Date: Mon, 29 Dec 2008 20:11:00 -0800
Another part of my application is not doing the work either. The client of
my application will have to do the work of storing the cookie I give back to
them. It's similar to a DB, I guess. The DB just stores the data. If you
want to gain access to it you need to construct the correct SQL to get to the
data that you stored. In my case they ask me to store a blob. I do so and
return them a cookie which they need to use when they want to retrieve the
blob they stored.
In addition I have the ability to enumerate all the blobs in the store.
I won't need a blob-to-block map. When someone asks for the blob at
index 5, I seek to location 5 * 1GB, read the length of the blob, read the
blob, and return it to them. That's it.
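For illustration, the fixed-slot scheme described above could be sketched in
Python roughly as follows. The SlotStore class, its method names, the 8-byte
length prefix, and the 1 MiB demo slot size are all assumptions made for the
example (the post uses 1GB slots). The sketch also leaves out marking the file
as sparse, which on NTFS takes an explicit FSCTL_SET_SPARSE call.

```python
import os
import struct
import tempfile

SLOT_SIZE = 1 << 20           # 1 MiB per slot for the demo; the post uses 1GB
LENGTH = struct.Struct("<Q")  # assumed layout: 8-byte little-endian length prefix


class SlotStore:
    """Hypothetical fixed-slot blob store: the cookie is simply the slot index."""

    def __init__(self, path):
        # Create the file if it does not exist, then keep one handle open.
        open(path, "ab").close()
        self._f = open(path, "r+b")
        self._next = 0  # next free slot; a real store would have to persist this

    def store(self, blob):
        if LENGTH.size + len(blob) > SLOT_SIZE:
            raise ValueError("blob does not fit in one slot")
        cookie = self._next
        self._f.seek(cookie * SLOT_SIZE)
        self._f.write(LENGTH.pack(len(blob)) + blob)
        self._f.flush()
        self._next += 1
        return cookie

    def get(self, cookie):
        # Seek to the slot, read the length prefix, then read the blob itself.
        self._f.seek(cookie * SLOT_SIZE)
        (length,) = LENGTH.unpack(self._f.read(LENGTH.size))
        return self._f.read(length)

    def close(self):
        self._f.close()


path = os.path.join(tempfile.mkdtemp(), "blobs.dat")
store = SlotStore(path)
first = store.store(b"hello")
second = store.store(b"a longer blob")
retrieved = [store.get(first), store.get(second)]
store.close()
```

The caller keeps only the returned cookie. Everything else about the layout
stays private to the store.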
The goal was to reduce the complexity of my code. The OS already gives me
this feature for free, so why not use it? The Change Journal's designers seem
to have thought it was a good feature. I have the same sort of requirement (I
think): my blobs vary in size and I would like not to waste a lot of disk
space. So I set an arbitrarily large block size and let the OS figure out
how much space to actually consume.
Of course, I want to verify that the OS implementation is good and that my
requirements are such that using sparse files makes sense, which is why
I'm asking the question here.
--
Thanks,
Nick
nicknospamdu@xxxxxxxxxxxxxxxx
remove "nospam" change community. to msn.com
"m" wrote:
So your solution is to have another part of the app do the work?
Do these references persist across restarts? If so, then how do you expect
them to be stored? As part of a larger, already existing entity?
It was not the specifics of your pseudocode that I was responding to but the
nature of it and the obvious indications that you are not considering the
full scope of the problems you will need to solve for this solution to be
fully functional and reliable.
However, as it is obvious that I can't dissuade you from your purpose, I
have endeavoured to point out some of the problems that you will face.
The main advantage of this design is eliminating the need to call CreateFile
when a new blob is to be written or must be retrieved by having a single
data file always open. Depending on your data rate, this overhead is likely
insignificant; but if your data rate is really high, then you may need to
follow this approach. Of course the primary reason to use a sparse file is
to simplify the code, at the expense of IO performance, and this would seem
counterproductive in this case. Also, the code complexity added by
keeping a blob-to-block map instead of a static offset is not significant
compared with the complexity required to prevent file corruption during
HW / OS failures.
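To give a sense of how small the map-based alternative can be, here is a
rough Python sketch of a packed, append-only store that keeps an explicit
cookie -> (offset, length) table. The PackedStore name and the in-memory list
used as the index are assumptions for illustration only. Persisting the index
safely is exactly the hard part referred to above and is not shown.

```python
import io


class PackedStore:
    """Hypothetical append-only store with an explicit blob-to-offset map."""

    def __init__(self, f):
        self._f = f
        self._index = []  # cookie N -> (offset, length); in memory only here

    def store(self, blob):
        # Append the blob at the current end of the file and record where it went.
        offset = self._f.seek(0, io.SEEK_END)
        self._f.write(blob)
        self._index.append((offset, len(blob)))
        return len(self._index) - 1

    def get(self, cookie):
        offset, length = self._index[cookie]
        self._f.seek(offset)
        return self._f.read(length)


packed = PackedStore(io.BytesIO())
a = packed.store(b"abc")
b = packed.store(b"defgh")
```

Compared with a static offset, the only extra state is the index list. Blobs
sit back to back, so no sparse regions are needed.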
And hence my original question: what is it about your program that makes it
possible to do this better than the filesystem?
"nickdu" <nicknospamdu@xxxxxxxxxxxxxxxx> wrote in message
news:3E3FF996-50D7-4179-96EA-92140E6D0789@xxxxxxxxxxxxxxxx
The index is the cookie I returned from my Store() method. The consumer was
given the cookie and they give it back to me when they want to retrieve the
blob.
As it was pseudocode I provided, I was not too concerned with performance.
--
Thanks,
Nick
nicknospamdu@xxxxxxxxxxxxxxxx
remove "nospam" change community. to msn.com
"m" wrote:
Okay - how would you get the index? This is what you need to be able to do
better than the file system does for this plan to make sense. And you need
to be able to say that the benefit of using your sparse file is sufficient
to outweigh the extra problems with backup & maintenance.
BTW: this pseudocode is inefficient and can be improved by using page-size
aligned reads. The seek is unnecessary and will preclude multiple readers -
use overlapped IO. Both reads can be consolidated for small (less than 1
buffer sized) blobs, and for large blobs multiple reads will be required.
All of this is true regardless of how the file(s) that store the data are
arranged.
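As a rough illustration of these points, the Python sketch below reads a slot
with positioned, page-sized reads and no seek. It uses POSIX os.pread as a
stand-in for an overlapped read at an explicit offset. That substitution, the
4096-byte page size, the 8-byte length prefix, and the 1 MiB demo slot size
are all assumptions for the example, not part of the original design.

```python
import os
import struct
import tempfile

PAGE = 4096                   # assumed page size; real code should ask the OS
SLOT_SIZE = 1 << 20           # demo slot size, a multiple of PAGE (the post uses 1GB)
LENGTH = struct.Struct("<Q")  # assumed 8-byte little-endian length prefix


def read_blob(fd, index):
    """Read the blob in slot `index` using positioned, page-sized reads."""
    base = index * SLOT_SIZE
    # One aligned read returns the length prefix and, for a small blob, its data.
    buf = os.pread(fd, PAGE, base)
    (length,) = LENGTH.unpack_from(buf)
    total = LENGTH.size + length
    while len(buf) < total:
        # Large blob: fetch the remainder in whole pages.
        pages = (total - len(buf) + PAGE - 1) // PAGE
        chunk = os.pread(fd, pages * PAGE, base + len(buf))
        if not chunk:
            raise EOFError("slot ends before the recorded blob length")
        buf += chunk
    return buf[LENGTH.size:total]


# Demo: one small blob and one blob larger than a page.
fd = os.open(os.path.join(tempfile.mkdtemp(), "blobs.dat"), os.O_RDWR | os.O_CREAT)
small = b"tiny"
large = bytes(range(256)) * 40
os.pwrite(fd, LENGTH.pack(len(small)) + small, 0 * SLOT_SIZE)
os.pwrite(fd, LENGTH.pack(len(large)) + large, 1 * SLOT_SIZE)
```

Because each read carries its own offset, several readers can share one file
descriptor without stepping on a common file position.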
Also, storing metadata creates the possibility of file corruption - consider
the case of process termination during a blob insert. If the metadata is
updated and the blob data isn't, garbage will be returned when the app asks
for that blob; if the metadata isn't updated, the insert will be lost
completely; and, even worse, if a partial metadata update is made, the whole
file might be unreadable. These risks can be mitigated by using
FILE_FLAG_NO_BUFFERING + FILE_FLAG_WRITE_THROUGH, but you must still write
code that can handle the consequences of corruption and / or check for
these conditions and try to correct them. This is something else that the
file system does for you directly.
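One way to check for these conditions, sketched in Python under stated
assumptions: write the blob data first, flush it, and only then write a
header carrying the length and a CRC32, so that a reader can reject a slot
whose data and header disagree. The header layout, the MAX_BLOB bound, and
the function names are hypothetical. os.fsync and os.pread/os.pwrite (POSIX)
stand in for write-through and positioned IO, and nothing here repairs a
damaged slot.

```python
import os
import struct
import tempfile
import zlib

HEADER = struct.Struct("<QI")  # assumed layout: 8-byte length, 4-byte CRC32 of the data
MAX_BLOB = 1 << 20             # assumed upper bound, used to reject garbage lengths


def write_record(fd, offset, blob):
    # Write the data first and force it to disk; a crash here leaves no header.
    os.pwrite(fd, blob, offset + HEADER.size)
    os.fsync(fd)
    # Only then publish the header that makes the record visible to readers.
    os.pwrite(fd, HEADER.pack(len(blob), zlib.crc32(blob)), offset)
    os.fsync(fd)


def read_record(fd, offset):
    """Return the blob, or None if the slot is empty or fails its checks."""
    raw = os.pread(fd, HEADER.size, offset)
    if len(raw) < HEADER.size:
        return None
    length, crc = HEADER.unpack(raw)
    if not 0 < length <= MAX_BLOB:
        return None  # an all-zero header (never written) also lands here
    blob = os.pread(fd, length, offset + HEADER.size)
    if len(blob) != length or zlib.crc32(blob) != crc:
        return None
    return blob


fd = os.open(os.path.join(tempfile.mkdtemp(), "blobs.dat"), os.O_RDWR | os.O_CREAT)
write_record(fd, 0, b"intact blob")
write_record(fd, 4096, b"blob whose data gets damaged")
os.pwrite(fd, b"XXXX", 4096 + HEADER.size)  # simulate a torn data write
```

In this sketch an empty blob is indistinguishable from an unused slot, which
is a simplification. A real format would need an explicit "in use" marker.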
"nickdu" <nicknospamdu@xxxxxxxxxxxxxxxx> wrote in message
news:A3F5E9E5-3FC4-49A7-8F34-666735135A77@xxxxxxxxxxxxxxxx
Hmmm, I replied to this yet it didn't seem to make it to this thread.
The indexing would look something similar to:
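GetBlob(int index, byte *bytes, int size)
{
    // Block 0 is (most likely) reserved for store information, so blob N
    // lives at offset (N + 1) * 1GB.
    Seek(_sparse, (index + 1) * 1GB, SEEKOFFSET_ORIGIN);
    Read(_sparse, &length, sizeof(length));   // length prefix
    Read(_sparse, bytes, min(length, size));  // blob data, capped at caller's buffer
}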
Most likely I would use the first block to store information, like the next
free block. Other than that the indexing would be as shown above, just
like indexing an array. That's the beauty of sparse files, right? You can
be wasteful and pick a huge block size and the OS will only consume the
actual amount of space you use. Of course the OS must do some
bookkeeping, but better the OS than me.
The NTFS change journal uses a sparse file, so I assume it saw a benefit in
using it.
--
Thanks,
Nick
nicknospamdu@xxxxxxxxxxxxxxxx
remove "nospam" change community. to msn.com
"m" wrote:
What technique do you plan to use for indexing your blobs that will be
better than a file system? This is, after all, what they are designed to do.
"nickdu" <nicknospamdu@xxxxxxxxxxxxxxxx> wrote in message
news:ABA4112C-311F-48C1-A39E-9A62F4648620@xxxxxxxxxxxxxxxx
I have a couple questions regarding sparse files.
1. We're looking for an easy and efficient way to store blobs (arrays of
bytes) of data in some sort of circular queue and return some sort of key
someone can use for later access to that blob. The blobs can have varying
lengths. I was wondering whether sparse files would be a reasonable approach
for this, as it appears to be an efficient mechanism for storing messages of
varying length (at least that's what I gather).
I guess you could also store each blob in its own file, but then I think the
overhead of creating a file per blob (there may be millions) might be costly.
I could also store all the blobs in a single "normal" (non-sparse) file, but
then I think the housekeeping of walking the chain of blobs (most likely
I'll need to chain them if I don't use a sparse file) might be costly in
terms of performance and also adds to the code I'll have to write. Though I
guess NTFS is keeping its own list of sections of the file that contain
data.
2. If I want to copy a sparse file to other Windows machines running NTFS, do
I need to write my own code to do that, or does CopyFile() handle sparse
files such that it only copies the parts of the file which contain data, so
that the copied file is exactly the same as the source file?
--
Thanks,
Nick
nicknospamdu@xxxxxxxxxxxxxxxx
remove "nospam" change community. to msn.com