Re: hair-brained code versioning and compression idea


From: Jon Skeet [C# MVP] (skeet_at_pobox.com)
Date: 11/29/04


Date: Mon, 29 Nov 2004 17:19:37 -0000

Bonj <Bonj@discussions.microsoft.com> wrote:

<snip>

> I'm not going to use Huffman compression or RLE compression - because
> they're for bytes. I'm only going to be storing my code files - .cs files,
> .c/.cpp files, .h files, even .xml files - so I'm going to use my own type of
> compression - word compression.

<snip>

> Now, you can imagine that words like "System" would be quite popular, hence
> wouldn't take up much space. Also, the text between words can be just stored
> as another word, so "}\n\t" would probably be quite popular as well.

That sounds very much like what Huffman compression is going to do
anyway. Admittedly, it'll be better if all your files use the same
encoding, but that's likely to be the case anyway - and then a common
word becomes a common byte sequence, so Huffman compression will spot
it and compress it to a short bit sequence.
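Here is a minimal Python sketch that computes Huffman code lengths for the bytes of a small source snippet. It shows frequent bytes receiving shorter codes. The function name `huffman_code_lengths` is hypothetical. Strictly, byte-level Huffman coding assigns a code to each individual byte. In zip's DEFLATE format, the LZ77 stage is what replaces a repeated byte sequence (such as a common word) with a short back-reference, and Huffman coding is then applied to that output.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict:
    """Return {byte value: Huffman code length in bits} for the bytes in data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct byte still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (combined weight, unique tie-breaker, {byte: depth so far}).
    # The tie-breaker keeps heapq from ever comparing the dicts.
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Joining the two lightest subtrees pushes each of their symbols one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


if __name__ == "__main__":
    source = b"using System;\nusing System.IO;\nusing System.Text;\n"
    lengths = huffman_code_lengths(source)
    encoded_bits = sum(lengths[b] for b in source)
    print(f"{len(source) * 8} bits raw, {encoded_bits} bits Huffman-coded (excluding the code table)")
```

The code table itself also has to be stored or transmitted. For very small inputs, that overhead can outweigh the savings.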

Have you tried just using things like zip and compared the space saved
by that with the space saved by your algorithm?
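The sketch below shows one way to run that comparison in Python. It uses the standard-library `zlib` module (DEFLATE, the method zip archives normally use) and compares it against a size estimate for a naive word-dictionary scheme. The tokenizer splits text into runs of word characters and runs of everything else, so separators like `}\n\t` become tokens, as in the quoted post. The fixed-width index format and the function names are assumptions made for illustration, not the original poster's design.

```python
import re
import zlib

# Runs of word characters, or runs of anything else (whitespace, braces, punctuation).
# Every character matches one of the two alternatives, so the tokens rebuild the text exactly.
TOKEN_RE = re.compile(r"\w+|\W+")


def tokenize(text: str) -> list:
    return TOKEN_RE.findall(text)


def word_dictionary_size(text: str) -> int:
    """Estimated bytes for a hypothetical word-dictionary encoding:
    each distinct token stored once (one length byte + UTF-8 bytes,
    so tokens are assumed to be at most 255 bytes long),
    then one fixed-width index per token occurrence."""
    tokens = tokenize(text)
    vocab = set(tokens)
    dictionary_bytes = sum(1 + len(t.encode("utf-8")) for t in vocab)
    # Smallest whole number of bytes able to address every dictionary entry.
    index_width = max(1, ((len(vocab) - 1).bit_length() + 7) // 8)
    return dictionary_bytes + index_width * len(tokens)


def zlib_size(text: str) -> int:
    """Bytes after DEFLATE compression at the highest level (9)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


if __name__ == "__main__":
    sample = "using System;\nusing System.IO;\n\nnamespace Demo\n{\n\tclass A { }\n}\n" * 20
    raw = len(sample.encode("utf-8"))
    print(f"raw: {raw} bytes")
    print(f"word dictionary (estimate): {word_dictionary_size(sample)} bytes")
    print(f"zlib: {zlib_size(sample)} bytes")
```

For a meaningful comparison, measure against the actual .cs, .cpp, .h and .xml files rather than a repetitive toy sample.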

-- 
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

