How does one make a Zip bomb?

Posted 2019-01-20 23:30

This question about zip bombs naturally led me to the Wikipedia page on the topic. The article mentions an example of a 45.1 KB zip file that decompresses to 1.3 exabytes.

What are the principles/techniques that would be used to create such a file in the first place? I don't want to actually do this; I'm more interested in a simplified "how-stuff-works" explanation of the concepts involved.

p.s.

The article mentions 9 layers of zip files, so it's not a simple case of zipping a bunch of zeros. Why 9, why 10 files in each?

14 answers
爷、活的狠高调
#2 · 2019-01-20 23:45

The article mentions 9 layers of zip files, so it's not a simple case of zipping a bunch of zeros. Why 9, why 10 files in each?

First off, the Wikipedia article currently says 5 layers with 16 files each. Not sure where the discrepancy comes from, but it's not all that relevant. The real question is why use nesting in the first place.

DEFLATE, the only commonly supported compression method for zip files*, has a maximum compression ratio of 1032. This can be achieved asymptotically for any repeating sequence of 1-3 bytes. No matter what you do to a zip file, as long as it is only using DEFLATE, the unpacked size will be at most 1032 times the size of the original zip file.
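To see that ceiling in practice, here is a small sketch using Python's zlib module, which implements DEFLATE. The 10 MB buffer size and the variable names are my own choices for illustration, not something from the article. The measured ratio should come close to 1032 but stay under it.

```python
import zlib

# 10 MB of zero bytes: about the most compressible input DEFLATE can be given.
raw = b"\x00" * 10_000_000

# Level 9 asks zlib for its maximum compression effort.
packed = zlib.compress(raw, 9)

ratio = len(raw) / len(packed)
print(f"{len(raw)} bytes -> {len(packed)} bytes, ratio {ratio:.1f}")
```

The exact compressed size depends on the zlib build, so no specific output is claimed here beyond the ratio staying below 1032.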

Therefore, it is necessary to use nested zip files to achieve really outrageous compression ratios. If you have 2 layers of compression, the maximum ratio becomes 1032^2 = 1,065,024. For 3, it's 1,099,104,768, and so on. For the 5 layers used in 42.zip, the theoretical maximum compression ratio is 1,170,572,956,434,432. As you can see, the actual 42.zip is far from that level. Part of that is the overhead of the zip format, and part of it is that they just didn't care.
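The figures above are just powers of the per-layer ceiling. A quick sketch to reproduce them (the function name `max_ratio` is made up for this example, and it ignores zip format overhead entirely):

```python
MAX_DEFLATE_RATIO = 1032  # per-layer ceiling quoted above


def max_ratio(layers: int) -> int:
    """Theoretical best-case ratio for `layers` nested DEFLATE layers."""
    return MAX_DEFLATE_RATIO ** layers


for layers in (1, 2, 3, 5):
    print(layers, max_ratio(layers))
```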

If I had to guess, I'd say that 42.zip was formed by just creating a large zero-filled file, and repeatedly zipping and copying it. There is no attempt to push the limits of the format or maximize compression or anything; they just arbitrarily picked 16 copies per layer. The point was to create a large payload without much effort.

* Note: Other compression methods, such as bzip2, offer much, much, much larger maximum compression ratios. However, most zip parsers don't accept them.

P.S. It is possible to create a zip file which will unzip to a copy of itself (a quine). You can also make one that unzips to multiple copies of itself. Therefore, if you recursively unzip a file forever, the maximum possible size is infinite. The only limitation is that the size can grow by at most a factor of 1032 on each iteration.

P.P.S. The 1032 figure assumes that file data in the zip are disjoint. One quirk of the zip file format is that it has a central directory which lists the files in the archive and offsets to the file data. If you create multiple file entries pointing to the same data, you can achieve much higher compression ratios even with no nesting, but such a zip file is likely to be rejected by parsers.

在下西门庆
#3 · 2019-01-20 23:46

Citing from the Wikipedia page:

One example of a Zip bomb is the file 45.1.zip which was 45.1 kilobytes of compressed data, containing nine layers of nested zip files in sets of 10, each bottom layer archive containing a 1.30 gigabyte file for a total of 1.30 exabytes of uncompressed data.

So all you need is one single 1.3GB file full of zeroes, compress that into a ZIP file, make 10 copies, pack those into a ZIP file, and repeat this process 9 times.

This way, you get a file which, when uncompressed completely, produces an absurd amount of data without requiring you to start out with that amount.
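The arithmetic behind the quoted numbers is simple multiplication: each layer multiplies the number of bottom-level files by the number of copies. A sketch, assuming decimal units (1 GB = 10^9 bytes) and a made-up function name:

```python
def total_uncompressed(bottom_file_bytes: int, copies_per_layer: int, layers: int) -> int:
    """Total bytes once every nested archive has been fully unpacked."""
    return bottom_file_bytes * copies_per_layer ** layers


# Figures from the Wikipedia quote: a 1.3 GB file, sets of 10, nine layers.
total = total_uncompressed(1_300_000_000, 10, 9)
print(total / 10**18, "exabytes")
```

No such file is created here; the code only multiplies the numbers out.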

Additionally, the nested archives make it much harder for programs like virus scanners (the main target of these "bombs") to be smart and refuse to unpack archives that are "too large". Until the last level, the total amount of data is not that large, and you don't "see" how large the files at the lowest level are until you have reached it. Each individual file is not "too large" either; only the huge number of them is problematic.
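Here is a small, harmless sketch of why a size check on the outer archive alone is not enough. It uses Python's standard zipfile module; the helper names `declared_size` and `make_zip` and the 100 KB stand-in payload are assumptions for illustration, not how any particular scanner works.

```python
import io
import zipfile


def declared_size(zip_bytes: bytes) -> int:
    """Sum the uncompressed sizes declared for a zip's direct members only."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return sum(info.file_size for info in zf.infolist())


def make_zip(members: dict) -> bytes:
    """Build a small in-memory zip from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Tiny stand-in payload: 100 KB of zeros, wrapped in two layers.
inner = make_zip({"zeros.bin": b"\x00" * 100_000})
outer = make_zip({"inner.zip": inner})

print("inner declares:", declared_size(inner))  # the real payload size
print("outer declares:", declared_size(outer))  # only the size of inner.zip
```

The outer archive only declares the (small) size of the nested zip, so a scanner has to open each layer, and keep a running total, before the real size becomes visible.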

Evening l夕情丶
#4 · 2019-01-20 23:48

All file compression algorithms exploit low entropy (redundancy) in the information to be compressed. A stream consisting only of 0s (or only of 1s) carries almost no information, so if it's long enough, it will compress very well.
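As a rough illustration of "entropy" here, the sketch below computes the order-0 Shannon entropy of a byte string, in bits per byte. This is a simplification I'm assuming for the example: it only looks at byte frequencies, whereas real compressors also exploit repeated sequences. The function name is made up.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Order-0 Shannon entropy in bits per byte (0 = fully predictable, 8 = uniform)."""
    n = len(data)
    counts = Counter(data)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(byte_entropy(b"\x00" * 1000))     # a run of identical bytes
print(byte_entropy(bytes(range(256))))  # every byte value exactly once
```

The all-zero input scores 0 bits per byte, which is why it can shrink so dramatically; the uniformly spread input scores the maximum of 8.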

That's the theory part. The practical part has already been pointed out by others.

够拽才男人
#5 · 2019-01-20 23:50

A nice way to create a zipbomb (or gzbomb) is to know the binary format you are targeting. Otherwise, even if you use a streaming file (for example using /dev/zero), you'll still be limited by the computing power needed to compress the stream.

A nice example of a gzip bomb: http://selenic.com/googolplex.gz (there's a message embedded in the file after several levels of compression, each resulting in huge files)

Have fun finding that message :)

成全新的幸福
#6 · 2019-01-20 23:51

I don't know if ZIP uses Run Length Encoding, but if it did, such a compressed file would contain a small piece of data and a very large run-length value. The run-length value would specify how many times the small piece of data is repeated. When you have a very large value, the resultant data is proportionally large.
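For what it's worth, the usual ZIP method (DEFLATE) is not plain run-length encoding; it combines LZ77 back-references with Huffman coding, but long runs end up being handled in a similar spirit. The idea described above can be sketched as a toy run-length codec (function names are made up for the example):

```python
from itertools import groupby


def rle_encode(data: bytes) -> list:
    """Collapse each run of identical bytes into a (byte_value, run_length) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(data)]


def rle_decode(runs: list) -> bytes:
    """Expand (byte_value, run_length) pairs back into the original bytes."""
    return b"".join(bytes([value]) * length for value, length in runs)


runs = rle_encode(b"aaaabbb")
print(runs)              # [(97, 4), (98, 3)]
print(rle_decode(runs))  # b'aaaabbb'
```

A single pair with a large run length describes a correspondingly large output, which is the effect the answer is pointing at.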

Lonely孤独者°
#7 · 2019-01-20 23:51

Silicon Valley Season 3 Episode 7 brought me here. The steps to generate a zip bomb would be:

  1. Create a dummy file with zeros (or ones if you think they're skinny) of size (say 1 GB).
  2. Compress this file to a zip-file say 1.zip.
  3. Make n (say 10) copies of this file and add these 10 files to a compressed archive (say 2.zip).
  4. Repeat step 3 k times.
  5. You'll get a zip bomb.

For a Python implementation, check this.
