Re: Some questions about sparse files

Since you clearly don't want to think about the issues involved, and have
already irrevocably decided on your solution, I leave you to your problems.

"nickdu" <nicknospamdu@xxxxxxxxxxxxxxxx> wrote in message
news:97AD404E-8038-404A-9C3A-1975310A6A81@xxxxxxxxxxxxxxxx
Another part of my application is not doing the work either. The client of
my application will have to do the work of storing the cookie I give back
to them. It's similar to a DB, I guess. The DB just stores the data; if you
want to get at it, you need to construct the correct SQL to reach the data
you stored. In my case they ask me to store a blob. I do so and return a
cookie, which they need to use when they want to retrieve the blob they
stored.

In addition I have the ability to enumerate all the blobs in the store.

I won't need a blob-to-block map. When someone asks for the blob at
index 5, I seek to offset 5 * 1GB, read the length of the blob, read the
blob, and return it to them. That's it.
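A minimal sketch of the fixed-slot scheme just described, written in portable C with stdio only so it can be tried anywhere. The names put_blob/get_blob and the 64 KB SLOT_SIZE are hypothetical (the post uses 1 GB slots). Each slot starts with a 4-byte length prefix; a slot that was never written but lies inside the file reads back as zeros, i.e. length 0. On Windows, where long is 32 bits, real 1 GB slots would need 64-bit offsets (_fseeki64 or SetFilePointerEx).

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical slot size: the post uses 1 GB; 64 KB keeps the demo small. */
#define SLOT_SIZE (64L * 1024)

/* Stores blob `index` at offset index * SLOT_SIZE as a 4-byte length prefix
   followed by the data. Returns 0 on success, -1 on error or if it won't fit. */
int put_blob(FILE *f, long index, const void *data, uint32_t len)
{
    if ((long)len > SLOT_SIZE - (long)sizeof len)
        return -1;
    if (fseek(f, index * SLOT_SIZE, SEEK_SET) != 0 ||
        fwrite(&len, sizeof len, 1, f) != 1 ||
        (len > 0 && fwrite(data, 1, len, f) != len))
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}

/* Copies at most `cap` bytes of blob `index` into buf. Returns the number of
   bytes copied (0 for a never-written slot inside the file), or -1 on error,
   e.g. a slot that lies beyond the end of the file. */
long get_blob(FILE *f, long index, void *buf, uint32_t cap)
{
    uint32_t len;
    if (fseek(f, index * SLOT_SIZE, SEEK_SET) != 0 ||
        fread(&len, sizeof len, 1, f) != 1)
        return -1;
    uint32_t n = len < cap ? len : cap;
    if (n > 0 && fread(buf, 1, n, f) != n)
        return -1;
    return (long)n;
}
```

Usage: put_blob(f, 5, "hello", 5) followed by get_blob(f, 5, buf, sizeof buf) returns 5, and the cookie handed to the client is simply the index.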

The goal was to reduce the complexity of my code. The OS already gives me
this feature for free, so why not use it? The designers of the Change
Journal evidently thought it was a good feature, and I think I have the
same sort of requirement: my blobs vary in size and I would like not to
waste a lot of disk space. So I pick an arbitrarily large slot size and let
the OS figure out how much space to actually consume.

Of course I want to verify that the OS implementation is good, and that my
requirements are such that using sparse files makes sense, which is why
I'm asking the question here.
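One way to check what the OS actually allocates is to compare a file's logical size with the space reserved on disk. This sketch swaps in POSIX (fstat and st_blocks, which Linux and the BSDs report in 512-byte units) because it is easy to run; it assumes a Unix-like filesystem with hole support, and the helper names are hypothetical. On NTFS the comparable check would use GetCompressedFileSize, and the file must first be marked sparse with the FSCTL_SET_SPARSE control code, otherwise the skipped range is allocated.

```c
#define _XOPEN_SOURCE 700   /* expose fileno() and st_blocks under strict C modes */
#include <stdio.h>
#include <sys/stat.h>

/* Logical size versus bytes actually allocated on disk. */
struct space_usage {
    long long logical;
    long long allocated;
};

/* Writes a single byte at `offset`, skipping everything before it.
   Returns 0 on success, -1 on error. */
int write_byte_at(FILE *f, long offset, unsigned char b)
{
    if (fseek(f, offset, SEEK_SET) != 0)
        return -1;
    return fputc(b, f) == EOF ? -1 : 0;
}

/* Fills *u from fstat(); st_blocks is assumed to count 512-byte units.
   Returns 0 on success, -1 on error. */
int get_space_usage(FILE *f, struct space_usage *u)
{
    struct stat st;
    if (fflush(f) != 0 || fstat(fileno(f), &st) != 0)
        return -1;
    u->logical = (long long)st.st_size;
    u->allocated = (long long)st.st_blocks * 512;
    return 0;
}
```

On a filesystem with hole support, writing one byte at 1 MiB typically reports a logical size of 1,048,577 bytes with only a few KB allocated; without hole support the allocated figure is at least the logical size.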
--
Thanks,
Nick

nicknospamdu@xxxxxxxxxxxxxxxx
remove "nospam" change community. to msn.com


"m" wrote:

So your solution is to have another part of the app do the work?



Do these references persist across restarts? If so, how do you expect
them to be stored? As part of a larger, already existing entity?



It was not the specifics of your pseudocode that I was responding to, but
the nature of it and the obvious indications that you are not considering
the full scope of the problems you will need to solve for this solution to
be fully functional and reliable.



However, as it is obvious that I can't dissuade you from your purpose, I
have endeavoured to point out some of the problems that you will face.



The main advantage of this design is that, by keeping a single data file
always open, it eliminates the need to call CreateFile whenever a blob is
written or retrieved. Depending on your data rate, this overhead is likely
insignificant; but if your data rate is really high, then you may need to
follow this approach. Of course, the primary reason to use a sparse file is
to simplify the code at the expense of IO performance, which would seem
counterproductive if performance is the reason for choosing this design.
Also, the code complexity added by keeping a blob-to-block map instead of a
static offset is not significant compared with the complexity required to
prevent file corruption during HW / OS failures.
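For comparison, a sketch of the blob-to-block map mentioned above: blobs are appended back to back in an ordinary (non-sparse) file, and a table records each blob's offset and length, with the table index serving as the cookie. All names are hypothetical, and the table is kept only in memory; persisting it is exactly where the crash-consistency issues raised elsewhere in this thread come in.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Where one blob lives in the append-only data file. */
struct blob_ref {
    long offset;
    uint32_t length;
};

/* In-memory blob-to-block map: refs[cookie] describes blob number `cookie`. */
struct blob_map {
    struct blob_ref *refs;
    size_t count, cap;
};

/* Appends a blob to the end of the file and records where it went.
   Returns the new blob's cookie, or -1 on error. */
long map_append(struct blob_map *m, FILE *f, const void *data, uint32_t len)
{
    if (m->count == m->cap) {                 /* grow the table geometrically */
        size_t ncap = m->cap ? m->cap * 2 : 16;
        struct blob_ref *p = realloc(m->refs, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        m->refs = p;
        m->cap = ncap;
    }
    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    long off = ftell(f);
    if (off < 0 || fwrite(data, 1, len, f) != len || fflush(f) != 0)
        return -1;
    m->refs[m->count].offset = off;
    m->refs[m->count].length = len;
    return (long)m->count++;
}

/* Copies blob `cookie` into buf. Returns its length, or -1 if the cookie is
   unknown, buf is too small, or the read fails. */
long map_read(const struct blob_map *m, FILE *f, long cookie, void *buf, uint32_t cap)
{
    if (cookie < 0 || (size_t)cookie >= m->count)
        return -1;
    const struct blob_ref *r = &m->refs[cookie];
    if (r->length > cap || fseek(f, r->offset, SEEK_SET) != 0 ||
        fread(buf, 1, r->length, f) != r->length)
        return -1;
    return (long)r->length;
}
```

The lookup is still a single seek per blob; the extra cost relative to a static offset is one table entry per blob.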



And hence my original question: what is it about your program that makes
it possible to do this better than the filesystem?




"nickdu" <nicknospamdu@xxxxxxxxxxxxxxxx> wrote in message
news:3E3FF996-50D7-4179-96EA-92140E6D0789@xxxxxxxxxxxxxxxx
The index is the cookie I returned from my Store() method. The consumer
was given the cookie, and they give it back to me when they want to
retrieve the blob.

As it was pseudocode I provided, I was not too concerned with
performance.
--
Thanks,
Nick



"m" wrote:

Okay - how would you get the index? This is what you need to be able to
do better than the file system does for this plan to make sense. And you
need to be able to say that the benefit of using your sparse file is
sufficient to outweigh the extra problems with backup & maintenance.



BTW: this pseudocode is inefficient and can be improved by using
page-size-aligned reads. The seek is unnecessary and will preclude multiple
concurrent readers, since they would all share the handle's file pointer;
use overlapped IO instead. The two reads can be consolidated into one for
small blobs (smaller than one buffer), and for large blobs multiple reads
will be required. All of this is true regardless of how the file(s) that
store the data are arranged.
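The alignment arithmetic behind this suggestion can be sketched as follows; the helper names are hypothetical and the 4096-byte page in the examples is an assumption (query the real value at run time). One aligned window covering the 4-byte length prefix and a small blob replaces the two separate reads, and a large blob needs several buffer-sized reads. On Windows, FILE_FLAG_NO_BUFFERING requires file offsets, read sizes and buffer addresses to be multiples of the volume sector size, and a positional read (an OVERLAPPED offset passed to ReadFile, or pread on POSIX) avoids the shared file pointer that the Seek call relies on.

```c
#include <stdint.h>

/* A byte range whose start and end both fall on page boundaries. */
struct io_window {
    uint64_t start;
    uint64_t length;
};

/* Smallest page-aligned window covering [offset, offset + n).
   `page` must be a power of two. */
struct io_window aligned_window(uint64_t offset, uint64_t n, uint64_t page)
{
    struct io_window w;
    uint64_t end = offset + n;
    w.start = offset & ~(page - 1);                        /* round start down */
    w.length = ((end + page - 1) & ~(page - 1)) - w.start; /* round end up */
    return w;
}

/* Number of reads needed when each read moves at most `buf_size` bytes
   (buf_size itself assumed to be a multiple of the page size). */
uint64_t reads_needed(struct io_window w, uint64_t buf_size)
{
    return (w.length + buf_size - 1) / buf_size;
}
```

For example, a 100-byte blob at the start of a slot fits, together with its prefix, in a single 4096-byte read, while a 1,000,000-byte blob read through a 64 KB buffer needs 16 reads.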



Also, storing metadata creates the possibility of file corruption -
consider the case of process termination during a blob insert. If the
metadata is updated and the blob data isn't, garbage will be returned when
the app asks for that blob; if the metadata isn't updated, the insert will
be lost completely; and, even worse, if a partial metadata update is made,
the whole file might be unreadable. These risks can be mitigated by using
FILE_FLAG_NO_BUFFERING + FILE_FLAG_WRITE_THROUGH, but you must still write
code that can handle the consequences of corruption and / or check for
these conditions and try to correct them. This is something else that the
file system does for you directly.
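One common way to narrow the failure window described above is to order the writes so that the metadata acts as a commit record: write the payload first, make it durable, and only then write the length prefix. This sketch uses hypothetical names and a small SLOT_SIZE, and assumes a slot's prefix is still zero until its blob is committed, so a crash between the two steps leaves the slot looking empty instead of pointing at garbage. It does not detect a torn payload (that would need something like a checksum), and reusing a slot requires clearing its old prefix first.

```c
#include <stdint.h>
#include <stdio.h>

#define SLOT_SIZE 4096L  /* hypothetical bytes per slot */

/* Inserts a blob into an empty slot (its 4-byte prefix must still be 0).
   Order matters: payload first, then the length prefix as the commit record.
   fflush() only hands data to the OS; real crash safety also needs fsync()
   on POSIX or FlushFileBuffers() on Windows after each step, so the payload
   reaches the disk before the prefix can. Returns 0 on success, -1 on error. */
int commit_blob(FILE *f, long index, const void *data, uint32_t len)
{
    long base = index * SLOT_SIZE;
    if (len == 0 || (long)len > SLOT_SIZE - (long)sizeof len)
        return -1;  /* length 0 is reserved to mean "empty slot" */
    /* Step 1: the payload, just after the prefix. */
    if (fseek(f, base + (long)sizeof len, SEEK_SET) != 0 ||
        fwrite(data, 1, len, f) != len || fflush(f) != 0)
        return -1;
    /* Step 2: the prefix, which makes the blob visible to readers. */
    if (fseek(f, base, SEEK_SET) != 0 || fwrite(&len, sizeof len, 1, f) != 1)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}

/* Returns the committed length of slot `index`: 0 means empty (or an insert
   that never reached step 2), -1 means an I/O error such as reading past EOF. */
long committed_length(FILE *f, long index)
{
    uint32_t len;
    if (fseek(f, index * SLOT_SIZE, SEEK_SET) != 0 ||
        fread(&len, sizeof len, 1, f) != 1)
        return -1;
    return (long)len;
}
```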




"nickdu" <nicknospamdu@xxxxxxxxxxxxxxxx> wrote in message
news:A3F5E9E5-3FC4-49A7-8F34-666735135A77@xxxxxxxxxxxxxxxx
Hmmm, I replied to this, yet it didn't seem to make it to this thread.

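The indexing would look something like this:

BOOL GetBlob(int index, BYTE *bytes, DWORD size, DWORD *bytesRead)
{
    LARGE_INTEGER pos;  /* _sparse is the already-open HANDLE to the data file */
    DWORD length, got;
    pos.QuadPart = ((LONGLONG)index + 1) << 30;  /* (index + 1) * 1GB in 64-bit math */
    if (!SetFilePointerEx(_sparse, pos, NULL, FILE_BEGIN) ||
        !ReadFile(_sparse, &length, sizeof(length), &got, NULL) || got != sizeof(length))
        return FALSE;
    return ReadFile(_sparse, bytes, min(length, size), bytesRead, NULL);
}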

Most likely I would use the first block to store information, like the
next free block. Other than that, the indexing would be as shown above,
just like indexing an array. That's the beauty of sparse files, right? You
can be wasteful and pick a huge block size, and the OS will only consume
the amount of space you actually use. Of course the OS has to do some
bookkeeping, but better the OS than me.
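A sketch of keeping the "next free block" in the first block, so that blob i lives in slot i + 1 as in GetBlob. The struct layout, magic value and function names are hypothetical. Note that rewriting this header in place is itself the kind of metadata update that can be torn by a crash, as pointed out elsewhere in the thread.

```c
#include <stdint.h>
#include <stdio.h>

#define STORE_MAGIC 0x424C4F42u  /* hypothetical format tag */

/* Hypothetical header stored at offset 0 (the first block). */
struct store_header {
    uint32_t magic;
    uint32_t next_free;
};

/* Reads the header, or initializes a fresh one if the file is new.
   Returns 0 on success, -1 on error or an unrecognized header. */
int load_header(FILE *f, struct store_header *h)
{
    if (fseek(f, 0, SEEK_SET) != 0)
        return -1;
    if (fread(h, sizeof *h, 1, f) == 1)
        return h->magic == STORE_MAGIC ? 0 : -1;
    clearerr(f);                 /* short read: treat as a brand-new store */
    h->magic = STORE_MAGIC;
    h->next_free = 0;
    return 0;
}

/* Reserves the next free blob index and writes the updated header back.
   Returns the index (usable as the client's cookie) or -1 on error. */
long allocate_index(FILE *f)
{
    struct store_header h;
    if (load_header(f, &h) != 0)
        return -1;
    long index = (long)h.next_free++;
    if (fseek(f, 0, SEEK_SET) != 0 || fwrite(&h, sizeof h, 1, f) != 1 ||
        fflush(f) != 0)
        return -1;
    return index;
}
```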

The NTFS change journal uses a sparse file, so I assume its designers saw
a benefit in using one.
--
Thanks,
Nick



"m" wrote:

What technique do you plan to use for indexing your blobs that will be
better than a file system? This is, after all, what they are designed to
do.

"nickdu" <nicknospamdu@xxxxxxxxxxxxxxxx> wrote in message
news:ABA4112C-311F-48C1-A39E-9A62F4648620@xxxxxxxxxxxxxxxx
I have a couple questions regarding sparse files.

1. We're looking for an easy and efficient way to store blobs (arrays of
bytes) in some sort of circular queue and return some sort of key that
someone can use for later access to that blob. The blobs can have varying
lengths. I was wondering whether sparse files would be a reasonable
approach for this, as they appear to be an efficient mechanism for storing
messages of varying length (at least that's what I gather).

I guess you could also store each blob in its own file, but then I think
the overhead of creating a file per blob (there may be millions) might be
costly.

I could also store all the blobs in a single "normal" (non-sparse) file,
but then I think the housekeeping of walking the chain of blobs (most
likely I'll need to chain them if I don't use a sparse file) might be
costly in terms of performance, and it also adds to the code I'll have to
write. Though I guess NTFS keeps its own list of the sections of the file
that contain data.

2. If I want to copy a sparse file to other Windows machines running NTFS,
do I need to write my own code to do that, or does CopyFile() handle sparse
files, copying only the parts of the file that contain data, so that the
copied file is exactly the same as the source file?
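The circular-queue requirement in question 1 could be met with fixed slots by making the key a monotonically increasing sequence number: the slot is the sequence number modulo the ring capacity, and each slot header records which sequence number currently owns it, so a key whose blob has since been overwritten is detected instead of returning someone else's data. This is a hypothetical sketch (names, slot size and capacity are assumptions), not a design proposed in the thread.

```c
#include <stdint.h>
#include <stdio.h>

#define SLOT_SIZE 4096L   /* hypothetical bytes per slot */
#define CAPACITY  8L      /* hypothetical number of slots in the ring */

/* Header at the start of every slot. `tag` is the owner's sequence + 1, so an
   all-zero (never written) slot never matches any key. */
struct slot_header {
    uint64_t tag;
    uint32_t len;
    uint32_t reserved;
};

/* Stores a blob under key `seq`, overwriting whatever blob previously
   occupied slot seq % CAPACITY. Returns 0 on success, -1 on error. */
int ring_put(FILE *f, uint64_t seq, const void *data, uint32_t len)
{
    struct slot_header h = { seq + 1, len, 0 };
    long base = (long)(seq % CAPACITY) * SLOT_SIZE;
    if ((long)len > SLOT_SIZE - (long)sizeof h)
        return -1;
    if (fseek(f, base, SEEK_SET) != 0 || fwrite(&h, sizeof h, 1, f) != 1 ||
        fwrite(data, 1, len, f) != len)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}

/* Copies the blob for key `seq` into buf. Returns its length, or -1 if the
   slot now belongs to another key (the blob was overwritten), the blob is
   larger than cap, or an I/O error occurs. */
long ring_get(FILE *f, uint64_t seq, void *buf, uint32_t cap)
{
    struct slot_header h;
    long base = (long)(seq % CAPACITY) * SLOT_SIZE;
    if (fseek(f, base, SEEK_SET) != 0 || fread(&h, sizeof h, 1, f) != 1)
        return -1;
    if (h.tag != seq + 1 || h.len > cap)
        return -1;
    if (fread(buf, 1, h.len, f) != h.len)
        return -1;
    return (long)h.len;
}
```

Once more than CAPACITY blobs have been stored, the oldest keys start returning -1, which is the usual circular-queue behaviour made explicit to the caller.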

--
Thanks,
Nick
