[CDG5] Random questions

Max Poliakovski maximumspatium at googlemail.com
Wed Dec 12 16:47:31 AWST 2018


The bitmap method is capable of expressing any input. Its output isn't
guaranteed to be optimal, though. In the theoretical worst case, the
compressed string will be longer than the uncompressed one. This may happen
for highly randomized data, although I don't know that for sure.
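
To put a rough number on that worst case, here is a back-of-the-envelope
size estimate. It is only a sketch under assumptions taken from the
description quoted below: words are 2 bytes (implied by the "2:1" figure),
a table index is 1 byte, and every group of eight words costs one extra
flag byte. The name bitmap_size is made up for illustration and is not
part of any existing code.

```python
def bitmap_size(n_words: int, n_hits: int, word_bytes: int = 2) -> int:
    """Estimated byte length of a bitmap-coded stream.

    Assumes one flag byte per group of eight words, one index byte per
    table hit, and word_bytes literal bytes per miss (all assumptions).
    """
    if not 0 <= n_hits <= n_words:
        raise ValueError("n_hits must lie between 0 and n_words")
    flag_bytes = (n_words + 7) // 8  # ceil(n_words / 8)
    return flag_bytes + n_hits + (n_words - n_hits) * word_bytes


n = 1024                     # 1024 words = 2048 bytes uncompressed
print(bitmap_size(n, 0))     # no word found in the table -> 2176
print(bitmap_size(n, n))     # every word found in the table -> 1152
```

Under these assumptions the output grows by one byte per sixteen input
bytes when nothing matches the table, and it breaks even once about one
word in eight is a table hit.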

Auto-selection of the compression method is indeed easy in this particular
case. If the derived word dictionary contains more than 256 entries, select
the bitmap method (i.e. extended alphabet); otherwise, select the non-bitmap
method. The latter is guaranteed to achieve better compression.
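
The rule above might be sketched like this. How the dictionary is really
derived is not spelled out in this thread, so distinct_words (the set of
distinct 16-bit words in the input) is only a stand-in for it, and both
function names are hypothetical.

```python
def distinct_words(data: bytes) -> set:
    """Collect the distinct 2-byte words in data.

    Stand-in for the real dictionary derivation (an assumption).
    A trailing odd byte is ignored.
    """
    return {data[i:i + 2] for i in range(0, len(data) - 1, 2)}


def select_method(dictionary_size: int) -> str:
    """Pick the bitmap method only when a 256-entry table cannot hold
    the whole dictionary."""
    return "bitmap" if dictionary_size > 256 else "non-bitmap"


sample = b"".join(i.to_bytes(2, "big") for i in range(300))
print(select_method(len(distinct_words(sample))))  # 300 distinct words
```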

In contrast, I don't know how to quickly auto-select the coding tables
without multi-pass encoding and comparing lengths. That's the point
where some additional tests on real-world data may shed some light...
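
For reference, the brute-force approach (encode once per candidate table,
keep the shortest result) could look roughly like the sketch below.
pick_table and toy_encode are hypothetical names. toy_encode is not the
real coder: it is a stand-in that follows the bitmap layout as described
in this thread, and its flag polarity and bit order are my assumptions.

```python
from typing import Callable, Dict, List, Tuple


def toy_encode(data: bytes, table: List[bytes]) -> bytes:
    """Stand-in encoder: one flag byte per eight 2-byte words, then one
    index byte per table hit or the 2-byte literal per miss."""
    index = {word: i for i, word in enumerate(table[:256])}
    words = [data[i:i + 2] for i in range(0, len(data) - 1, 2)]
    out = bytearray()
    for start in range(0, len(words), 8):
        flags = 0
        body = bytearray()
        for bit, word in enumerate(words[start:start + 8]):
            if word in index:
                flags |= 0x80 >> bit   # assumed: set bit = table index
                body.append(index[word])
            else:
                body += word           # literal word, copied as is
        out.append(flags)
        out += body
    return bytes(out)


def pick_table(
    data: bytes,
    tables: Dict[str, List[bytes]],
    encode: Callable[[bytes, List[bytes]], bytes],
) -> Tuple[str, int]:
    """Encode data with every candidate table and return the name and
    output length of the shortest result (one full pass per table)."""
    if not tables:
        raise ValueError("need at least one candidate table")
    best = None
    for name, table in tables.items():
        length = len(encode(data, table))
        if best is None or length < best[1]:
            best = (name, length)
    return best


data = b"\x00\x01" * 8
tables = {"a": [b"\x00\x01"], "b": [b"\xff\xff"]}
print(pick_table(data, tables, toy_encode))
```

The cost is one complete encoding pass per candidate table, which is
exactly the overhead a quicker heuristic would need to avoid.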


On Wed, 12 Dec 2018 at 05:01, Elliot Nunn <
elliotnunn at fastmail.com> wrote:

> On reflection, perhaps auto detection would be better, and quicker to
> implement. And it covers the case where a resource gets changed to become
> inexpressible by the bitmap method.
>
> On 12 Dec 2018, at 11:20 am, Elliot Nunn <elliotnunn at fastmail.com> wrote:
>
> Cool! The code I’ve written allows compression algorithms to be described
> by an arbitrary string. ‘GreggyBitsBitmap’ would work?
>
> On 12 Dec 2018, at 10:17 am, Max Poliakovski <
> maximumspatium at googlemail.com> wrote:
>
> From my understanding, the bit-mapped compression allows for a bigger
> alphabet than a 256-word table can hold. This means greater flexibility
> but a worse compression ratio. Instead of replacing each word with a table
> index (which ideally yields a 2:1 compression), we add an extra byte for
> each group of eight words indicating whether a particular word is coded
> "as is" (plain text) or as a table index. The compression ratio in this
> case is left as an exercise to the reader.
>
> Non-bitmapped compression assumes that all words will be in the lookup
> table that, in turn, seems to be pretty small for large resources. This
> therefore makes me think that the bitmapped compression is used more
> frequently than the non-bitmapped one.
>
> My code doesn't currently support automatic mode/table switching but
> relies on these parameters being supplied by the user. Since we're talking
> about small changes, recompressing a particular resource using the
> original parameters looks like the easiest solution...
>
>
> On Wed, 12 Dec 2018 at 00:25, Elliot Nunn <
> elliotnunn at fastmail.com> wrote:
>
>> Neat!
>>
>> What is the significance of isBitMapped?
>> ___________________________________
>> cdg5 mailing list
>> cdg5 at ucc.asn.au
>> https://lists.ucc.gu.uwa.edu.au/mailman/listinfo/cdg5
>>