Re: Minimizing the memory requirements

From: Joseph M. Newcomer (newcomer_at_flounder.com)
Date: 04/03/04


Date: Fri, 02 Apr 2004 23:03:28 -0500

Spending a lot of time to save 5000 bytes is not a good investment. I wouldn't worry
about it unless the problem scales up to a couple of megabytes. Then using 4-bit encodings
might be worth the effort (I wouldn't touch the problem for less than 10MB of
data, certainly not for a paltry 5K). In fact, even on a palmtop, saving 5K is questionable;
mine has 128MB of RAM and another 512MB of flash card memory.
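
For reference, the 4-bit encoding under discussion can be sketched roughly as follows. This
is a minimal Python sketch, not code from the thread: the names ALPHABET, pack_nibbles, and
unpack_nibbles are hypothetical, the mapping ('.' as 10, '@' as 11) follows the scheme
proposed in the original question, and the 0xF pad value for odd-length input is an
assumption of this sketch.

```python
# Hypothetical helpers illustrating the 4-bit packing idea from the question.
# Symbol codes: '0'..'9' -> 0..9, '.' -> 10, '@' -> 11 (as the question proposes).
ALPHABET = "0123456789.@"
ENCODE = {ch: i for i, ch in enumerate(ALPHABET)}
PAD = 0xF  # assumed pad nibble; not a valid symbol code


def pack_nibbles(text: str) -> bytes:
    """Pack each character into 4 bits, two characters per byte."""
    codes = [ENCODE[ch] for ch in text]
    if len(codes) % 2:
        codes.append(PAD)  # fill the unused low nibble of the last byte
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


def unpack_nibbles(data: bytes) -> str:
    """Reverse pack_nibbles, dropping the pad nibble if present."""
    out = []
    for byte in data:
        for code in (byte >> 4, byte & 0xF):
            if code != PAD:
                out.append(ALPHABET[code])
    return "".join(out)
```

With this layout a 10,000-character string packs into 5,000 bytes, which is the 5000-byte
saving being weighed above.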

When you start into multiple tens of megabytes, if the data is purely random, nothing much
will help, but if it has patterns, applying zip-like compression algorithms would work
fine. I once did a project that generated 4MB data files back when 4MB was a significant
percentage of the disk. I replaced the output routines with calls to the pkzip subroutine
library and got the file sizes down to about 150K per run, but there we had a LOT of
redundancy and the compression really bought a lot. The report generator read the files
back through the unzip subroutine, but the nice thing was that pkzip itself could
decompress the data independently of the program. Today I'd probably look into a gzip-based DLL.
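
To illustrate the difference between random and patterned data, here is a small sketch
using Python's standard zlib module (the same deflate algorithm that gzip uses). It is not
the pkzip library from the project described above; the sample strings and the
compressed_size helper are made up for illustration.

```python
import random
import zlib

ALPHABET = "0123456789.@"


def compressed_size(text: str) -> int:
    """Return the number of bytes zlib needs for the ASCII text at level 9."""
    return len(zlib.compress(text.encode("ascii"), 9))


# Pseudo-random 10,000-character string over the 12-symbol alphabet
# (fixed seed so the run is repeatable).
rng = random.Random(0)
random_text = "".join(rng.choice(ALPHABET) for _ in range(10_000))

# A highly redundant string of roughly the same length (assumed sample data).
patterned_text = "192.168.0.1@" * (10_000 // 12)

if __name__ == "__main__":
    print("random:   ", len(random_text), "->", compressed_size(random_text))
    print("patterned:", len(patterned_text), "->", compressed_size(patterned_text))
```

Uniformly random text over 12 symbols carries about log2(12), or roughly 3.58, bits per
character, so no lossless compressor can get much below 45% of the original size on it,
which is only slightly better than plain 4-bit packing. The patterned string should shrink
to a small fraction of its size.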
                                        joe

On Thu, 1 Apr 2004 17:25:30 +1000, "Jase" <jshelley@spamblock.enersol.com.au> wrote:

>As you have already noticed, you have a 12 character alphabet, so using only
>4 bits is a good start. In fact, as you have described the data as "random",
>you have already achieved the best compression available. You may be able to
>shave off a few bytes here and there, but the effort will almost certainly
>not be worth it. As far as compression goes, reducing the alphabet is about
>as simple as it gets.
>
>Jase
>
>
>"Rahul" <anonymous@discussions.microsoft.com> wrote in message
>news:300E6817-78E8-4958-9383-5F95F55EFEB3@microsoft.com...
>> Hello All,
>>
>> This message is an off-technology query.
>>
>> I have a strange problem wherein I receive large strings of 10KB each.
>> Right now there can be 8 such strings, and in future the number may rise.
>>
>> Each string consists of the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ., @.
>>
>> This way each string contains 10,000 bytes representing the above characters
>> in "RANDOM" sequence.
>>
>> Is there any way that I can reduce the number of bytes required?
>>
>> One way is to replace . with 10 and @ with 11 and then consider only the last 4
>> bits of every byte.
>> This way I can reduce the memory requirement by half.
>>
>> But is there any more straightforward solution to store the string in the
>> minimum bytes?
>>
>> Thanks in advance,
>>
>> Rahul
>

Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm


