Re: DataSet Performance Issue



Sahil,

An interesting and well-argued post. However, I fear you are setting up a
straw man here. I never suggested that a DataSet is an alternative to a
DBMS. All I'm suggesting is that the OP's use of a 70 MB DataSet is not
necessarily 'awful'. Indeed, there may be cases where that's a very good
thing to do.

Clearly an in-memory approach has inherent scaling limits - the virtual and
physical memory on the machine. But 70 MB is a perfectly reasonable amount
of data to hold in memory in some applications. A suitably structured
in-memory solution tailored to specific queries can outperform a DBMS.
Network and disk costs see to that.

More comments in-line.

Nigel

"Sahil Malik [MVP]" <contactmethrumyblog@xxxxxxxxxx> wrote in message
news:u$hUoCFmFHA.3300@xxxxxxxxxxxxxxxxxxxxxxx

> 1. First of all, Dataset is guess what - Managed Code, and to send all the
> .NET lovers in tailspin, Managed Code can never be as fast and as
> optimized as native code.
OK, managed code has a cost. But a DBMS requires network and disk operations,
which cost far more. You have to look at all the costs before you can
conclude that one approach is faster than the other.

> 2. Secondly, the Garbage collector is that animal that makes things very
> very good for 90% of the situations i.e. normal memory usage, but when you
> start storing many megabytes or close to a gigabyte of information
> completely in RAM - it will actually hurt your application performance. In
> those scenarios, you don't want an external policeman who doesn't
> understand the specific needs of your app. In that situation, you want
> fine control on the memory where you specify when it gets cleaned, or
> serialized to the disk etc. You need paging mechanisms etc. which are
> possible to write for the dataset but are a real royal pain to write and
> even then they don't work quite as well as - guess what - native code
> (i.e. most of what SQL Server).
>
My data is going to fit in virtual memory - I would never suggest writing any
form of paging mechanism. That really would be re-inventing a DBMS. The only
paging mechanism I'm relying on is the OS paging, which is - guess what -
native code that has been optimized to the max. In any case I'd expect to
have enough physical memory not to need paging.

> 3. SQL Server and any database comes with a "Query Engine". The number of
> optimizations built into that is the work of many Phds (or dudes with
> similar smarts and specialization), they have written up SQL Server's
> query engine to take advantage of automatic paging, locking algorithms,
> spilling over to the disk when needed, "query plans", caching those query
> plans - when you compare the object model of a Dataset (or any biz object
> for that matter), the comparison is like comparing a candle with the sun.
>
And it needs all that complexity because it's trying to minimize disk IO,
and because it's got to service every kind of request thrown at it. An
in-memory DataSet used for a specific purpose doesn't need any of that to
achieve the same performance.

> 4. The algorithms in a DataSet are rudimentary, they rely on simple
> techniques such as string matching, string manipulation - that level of
> simplicity. They work on an "Object structure", every value they access
> goes over a dereferenced segment calculation.

DataSets can maintain indexes, as far as I understand. I presume an index
will use a standard Hashtable - which is a reasonably optimized lookup
algorithm. So for a lookup on a single indexed column, performance should
be very good.
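
To make the point concrete, here is a rough sketch in Python rather than
.NET. It does not show how a DataSet implements its indexes (I only presume
a hash table above); it just contrasts a linear scan with a hash-based
lookup on one column. The row layout, the column names and all the function
names are made up for illustration.

```python
# Hypothetical rows standing in for a table; the column names are invented.
rows = [{"id": i, "name": f"customer-{i}"} for i in range(100_000)]


def scan_lookup(rows, key):
    """Unindexed lookup: test rows one by one until the id matches."""
    for row in rows:
        if row["id"] == key:
            return row
    return None


def build_index(rows, column):
    """Build a hash index from column value to row (assumes unique values)."""
    return {row[column]: row for row in rows}


def indexed_lookup(index, key):
    """Indexed lookup: a single hash probe, whatever the table size."""
    return index.get(key)


# The index is built once, then reused for every lookup.
id_index = build_index(rows, "id")
```

Both lookups return the same row; the difference is that the scan's cost
grows with the number of rows while the hash probe's, on average, does not.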

>
> 5. Let's not forget transactional locks and many other such points, I
> blogged about it earlier over here -
> http://codebetter.com/blogs/sahil.malik/archive/2005/01/23/47547.aspx
>

Don't need any of that stuff. In-memory is fast - if you really need writes
then just serialize everything with a reader/writer lock.
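
A minimal sketch of what I mean, again in Python for illustration only. In
.NET you would more likely reach for the framework's own
System.Threading.ReaderWriterLock; the two classes below are hypothetical
and hand-rolled, and this simple version lets a steady stream of readers
starve a writer.

```python
import threading


class ReadWriteLock:
    """Minimal readers-writer lock: many readers or one writer at a time."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of readers currently inside
        self._writing = False   # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake any waiting writer

    def acquire_write(self):
        with self._cond:
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()  # wake waiting readers and writers


class InMemoryTable:
    """Hypothetical in-memory store; every access goes through the lock."""

    def __init__(self):
        self._lock = ReadWriteLock()
        self._rows = {}

    def get(self, key):
        self._lock.acquire_read()
        try:
            return self._rows.get(key)
        finally:
            self._lock.release_read()

    def put(self, key, value):
        self._lock.acquire_write()
        try:
            self._rows[key] = value
        finally:
            self._lock.release_write()
```

Reads run concurrently with each other; a write waits until it has the
table to itself, which is all the "transaction" support a read-mostly cache
needs.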

> 6. Datasets, or any such object, are AN IN MEMORY disconnected cache of
> data. Being completely in memory lends them to the disadvantage of a 32
> bit OS's 2 GB memory allocation limit,

Yep - if you have, or may need, 2 GB of data, use a DBMS. Or use a 64-bit
machine.

> Again, I strongly and vehemently disagree with an architecture that puts 1
> GB data into a DataSet. That is complete stupidity in both .NET 1.1 and
> 2.0.
>
As part of your straw man argument, you've escalated the OP's 70 MB a bit!

