How many times can a file be compressed?

2019-01-08 10:05发布

I was thinking about compression, and it seems like there would have to be some sort of limit to the compression that could be applied to it, otherwise it'd be a single byte.

So my question is, how many times can I compress a file before:

  • It does not get any smaller?
  • The file becomes corrupt?

Are these two points the same or different?

Where does the point of diminishing returns appear?

How can these points be found?

I'm not talking about any specific algorithm or particular file, just in general.

14条回答
Viruses.
2楼-- · 2019-01-08 10:18

How many times can I compress a file before it does not get any smaller?

In general, not even one. Whatever compression algorithm you use, there must always exists a file that does not get compressed at all, otherwise you could always compress repeatedly until you reach 1 byte, by your same argument.

How many times can I compress a file before it becomes corrupt?

If the program you use to compress the file does its job, the file will never corrupt (of course I am thinking to lossless compression).

查看更多
手持菜刀,她持情操
3楼-- · 2019-01-08 10:19

Generally the limit is one compression. Some algorithms results in a higher compression ratio, and using a poor algorithm followed by a good algorithm will often result in improvements. But using the good algorithm in the first place is the proper thing to do.

There is a theoretical limit to how much a given set of data can be compressed. To learn more about this you will have to study information theory.

查看更多
看我几分像从前
4楼-- · 2019-01-08 10:20

Here is the ultimate compression algorithm (in Python) which by repeated use will compress any string of digits down to size 0 (it's left as an exercise to the reader how to apply this to a string of bytes).


def compress(digitString):
    if digitString=="":
        raise "already as small as possible"
    currentLen=len(digitString)
    if digitString=="0"*currentLen:
        return "9"*(currentLen-1)
    n=str(long(digitString)-1); #convert to number and decrement
    newLen=len(n);
    return ("0"*(currentLen-newLen))+n; # add zeros to keep same length

#test it
x="12";
while not x=="":
    print x;
    x=compress(x)

The program outputs 12 11 10 09 08 07 06 05 04 03 02 01 00 9 8 7 6 5 4 3 2 1 0 then empty string. It doesn't compress the string at each pass but it will with enough passes compress any digit string down to a zero length string. Make sure you write down how many times you send it through the compressor otherwise you won't be able to get it back.

查看更多
淡お忘
5楼-- · 2019-01-08 10:20

It all depends on the algorithm. In other words the question can be how many times a file can be compressed using this algorithm first, then this one next...

查看更多
Viruses.
6楼-- · 2019-01-08 10:25

In theory, we will never know, it is a never-ending thing:

In computer science and mathematics, the term full employment theorem has been used to refer to a theorem showing that no algorithm can optimally perform a particular task done by some class of professionals. The name arises because such a theorem ensures that there is endless scope to keep discovering new techniques to improve the way at least some specific task is done. For example, the full employment theorem for compiler writers states that there is no such thing as a provably perfect size-optimizing compiler, as such a proof for the compiler would have to detect non-terminating computations and reduce them to a one-instruction infinite loop. Thus, the existence of a provably perfect size-optimizing compiler would imply a solution to the halting problem, which cannot exist, making the proof itself an undecidable problem.

(source)

查看更多
Root(大扎)
7楼-- · 2019-01-08 10:33

Example of a more advanced compression technique using "a double table, or cross matrix" Also elimiates extrenous unnessacry symbols in algorithm

[PREVIOUS EXAMPLE] Take run-length encoding (probably the simplest useful compression) as an example.

04 04 04 04 43 43 43 43 51 52 11 bytes

That series of bytes could be compressed as:

[4] 04 [4] 43 [-2] 51 52 7 bytes (I'm putting meta data in brackets)

[TURNS INTO] 04.43.51.52 VALUES 4.4.**-2 COMPRESSION

Further Compression Using Additonal Symbols as substitute values

04.A.B.C VALUES 4.4.**-2 COMPRESSION

查看更多
登录 后发表回答