Re: Any good refs on dealing with tightly packed data in C#?



On Sat, 12 Apr 2008 13:50:37 +0200, Jeroen Mostert
<jmostert@xxxxxxxxx> wrote:

_dee wrote:
I'm working on a port of a legacy app originally written in C. Data
was compacted into bit fields. Are there any sites or books that cover
optimized handling of this type of data? I'd need to develop optimized
functions for reading and writing, and given the volume of source, I
don't want to end up with pages of obscure code. (Surprisingly, the
original C was pretty clear).

Yes, but the resulting assembly code probably wasn't. :-) To access those
bit fields, the compiler needs to generate tricky code for masking and
shifting the values every time they're accessed. The resulting slowdown can
be prohibitive, which is why storage space really needs to be at a premium
for bit fields to be worth it.

Thanks for the followup comments, Jeroen! Some good ideas in your
reply (logged to file for later).

After checking into BitVector32, etc., I've decided that the best
approach for the first version is probably direct mask/shift ops on
the raw byte stream. I can get the first code version done by
paralleling the earlier C code. After I've got the first version
running, a more elegant approach may become obvious.
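
A rough sketch of what direct mask/shift helpers on a raw byte stream
might look like, written in C to parallel the legacy code. The names
read_bits/write_bits, the MSB-first bit numbering, and the 32-bit limit
on field width are assumptions for illustration only, not the layout of
the original app. It works one bit at a time for clarity rather than
speed; the same expressions should carry over to C# with a few extra
casts.

```c
#include <stddef.h>
#include <stdint.h>

/* Read `width` bits (1..32) starting at absolute bit offset `bit_pos`.
   Assumption: bit 0 is the most significant bit of buf[0]. */
static uint32_t read_bits(const uint8_t *buf, size_t bit_pos, unsigned width)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < width; i++) {
        size_t   byte_index = (bit_pos + i) / 8;
        unsigned shift      = 7u - (unsigned)((bit_pos + i) % 8);
        /* Shift the target bit down to position 0, mask it, append it. */
        value = (value << 1) | ((buf[byte_index] >> shift) & 1u);
    }
    return value;
}

/* Write the low `width` bits (1..32) of `value` at bit offset `bit_pos`,
   leaving all neighbouring bits in the stream untouched. */
static void write_bits(uint8_t *buf, size_t bit_pos, unsigned width,
                       uint32_t value)
{
    for (unsigned i = 0; i < width; i++) {
        size_t   byte_index = (bit_pos + i) / 8;
        unsigned shift      = 7u - (unsigned)((bit_pos + i) % 8);
        uint8_t  mask       = (uint8_t)(1u << shift);
        unsigned bit        = (value >> (width - 1u - i)) & 1u;
        if (bit)
            buf[byte_index] |= mask;            /* set the bit   */
        else
            buf[byte_index] &= (uint8_t)~mask;  /* clear the bit */
    }
}
```

A field is then addressed by (bit offset, width) instead of by a struct
member, so a token can straddle a byte boundary without special cases.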

Having to write the masking and shifting code in C# itself is a real pain,
and you'll have to maintain separate code for 64-bit (if that's relevant).

This was part of the motivation for dealing with individual bytes.
That should not incur any sync problems and it should transport more
easily.

It might be an option to just write those functions in pure C and p/invoke
to them, or to write them in C++/CLI for easier integration with managed
code. Doing it all in pure C# is possible, but probably not the most
effective approach.

Yes, C++ may make sense here (especially after I get into trouble
later <g>). But I was trying to stay with C# for the first prototype.

The biggest remaining hitch for C# is a simple one: Passing 'pointers'
to individual tokens within the byte stream. I've done something like
this in the past with 'unsafe' code, but I was trying to avoid that.

The goal is to pass refs to tokens (short groups of bytes) so I can
avoid copying bytes. IOW, I want to pass a pointer to the middle of a
byte array. This is actually the main place where I miss C++. I don't
suppose you know any tricks for doing that?