In honor of the Hutter Prize, what are the top algorithms (and a quick description of each) for text compression?
Note: The intent of this question is to get a description of compression algorithms, not of compression programs.
The boundary-pushing compressors combine algorithms for insane results. Common algorithms include:
Maximum Compression is a pretty cool text and general compression benchmark site. Matt Mahoney publishes another benchmark. Mahoney's may be of particular interest because it lists the primary algorithm used per entry.
If you want to use PAQ as a program, you can install the zpaq package on Debian-based systems. For usage, see man zpaq. In my case, the compressed output was about 1/10th the size of the zip file (1.9M vs 15M).
There's always lzip.
All kidding aside:
(DEFLATE algorithm) still wins.
(LZMA algorithm) compresses very well and is available under the LGPL. Few operating systems ship with built-in support, however.
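To get a rough feel for how DEFLATE and LZMA compare on text, here is a minimal sketch using Python's standard-library zlib (DEFLATE) and lzma modules. The sample text and the compare_codecs helper are made up for illustration, and the ratios on a small, highly repetitive sample say little about real corpora such as the Hutter Prize data.

```python
import lzma
import zlib

# Hypothetical sample: a short English sentence repeated many times.
SAMPLE = ("The quick brown fox jumps over the lazy dog. " * 500).encode("utf-8")


def compare_codecs(data):
    """Return the compressed size in bytes for each codec, after a round-trip check."""
    codecs = {
        # zlib implements DEFLATE (plus a small header and checksum).
        "deflate": (lambda d: zlib.compress(d, 9), zlib.decompress),
        # lzma writes an .xz container around LZMA2 by default.
        "lzma": (lambda d: lzma.compress(d, preset=9), lzma.decompress),
    }
    sizes = {}
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        # Lossless compression must reproduce the input exactly.
        if decompress(packed) != data:
            raise ValueError(name + " round trip failed")
        sizes[name] = len(packed)
    return sizes


if __name__ == "__main__":
    print("original:", len(SAMPLE), "bytes")
    for name, size in compare_codecs(SAMPLE).items():
        print(name + ":", size, "bytes")
```

Swap in your own file contents for SAMPLE to compare the two on realistic text; results vary a lot with input size and redundancy.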