I'd like to detect duplicate files in a directory tree. When two identical files are found, only one of them should be preserved and the remaining duplicates deleted to save disk space.
By duplicates I mean files with identical content, which may differ in file name and path.
I was thinking of using a hash algorithm for this, but there is a chance that two different files end up with the same hash, so I need an additional mechanism that tells me the files aren't really the same even though their hashes match, because I don't want to delete two different files.
Which additional fast and reliable mechanism would you use?
This is the typical output of md5sum:
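For example (the file names here are hypothetical, and the digest shown happens to be the MD5 of an empty file), two files with the same content produce the same 32-hex-digit digest:

```
$ md5sum report.txt copy-of-report.txt
d41d8cd98f00b204e9800998ecf8427e  report.txt
d41d8cd98f00b204e9800998ecf8427e  copy-of-report.txt
```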
If you don't have to fear intentionally faked files, the chance for a second, random file to produce the same digest is about 1 in 2^128 (MD5 is a 128-bit hash), i.e. roughly 1 in 3.4 × 10^38.
If you take the file size into account as an additional test, your certainty that both files match increases. You can add more and more checks, but a bitwise comparison will be the last word in such a debate. For practical purposes, md5sum should be enough.
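A minimal sketch of that approach, assuming GNU coreutils (md5sum, sort, uniq, cmp); the directory path and candidate file names are placeholders, and nothing here actually deletes anything:

```sh
# Hash every file in the tree, sort by digest, and print groups of lines
# that share the same first 32 characters (the MD5 digest) -- these are
# the duplicate candidates.
find /path/to/tree -type f -exec md5sum {} + \
  | sort \
  | uniq -w32 --all-repeated=separate

# Before deleting anything, settle each candidate pair with a bitwise
# comparison; cmp exits 0 only if the files are byte-for-byte identical.
cmp --silent candidate1 candidate2 && echo "identical" || echo "different"
```

Keeping the first file of each confirmed group and removing the rest is then a separate step; reviewing an explicit list before deletion is safer than wiring rm directly into the pipeline.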