Python thinks a file is empty when opened in binary mode

Posted 2019-07-11 06:04

I'm running Python 3.5.1 on Windows. I'm trying to find duplicate source code files in a directory by computing their hashes. The problem is that Python seems to think some files are empty. Here is the relevant code snippet:

import hashlib
with open(path, 'rb') as afile:
    hasher = hashlib.md5()
    data = afile.read()
    hasher.update(data)
    print("len(data): {}, Path: {}, Hash:{}".format(len(data), path, hasher.hexdigest()))

Here is some example output:

len(data): 0, Path: h:\t\TCPServerSocket.h, Hash:d41d8cd98f00b204e9800998ecf8427e
len(data): 0, Path: h:\t\TCPSocket.cpp, Hash:d41d8cd98f00b204e9800998ecf8427e
len(data): 0, Path: h:\t\TCPSocket.h, Hash:d41d8cd98f00b204e9800998ecf8427e
len(data): 5073, Path: h:\t\ConfigFile.cpp, Hash:6188d6a0e0bc02edf27ce232689beff6
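The hash printed for every zero-length read, d41d8cd98f00b204e9800998ecf8427e, is the well-known MD5 digest of empty input. So the hashing step itself is behaving correctly: read() really did return zero bytes for those paths. A quick check:

```python
import hashlib

# MD5 of zero bytes; this matches the hash shown on every len(data): 0 line
EMPTY_MD5 = hashlib.md5(b"").hexdigest()
print(EMPTY_MD5)  # d41d8cd98f00b204e9800998ecf8427e
```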

I assure you that these files are not empty, and Python is not throwing any errors during execution. Any ideas?
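For the underlying goal of finding duplicate files by hash, a minimal sketch might walk the directory, hash each file in fixed-size chunks so large files are never read into memory all at once, and group paths by digest. The names file_md5 and find_duplicates and the 64 KiB chunk size are illustrative assumptions, not part of the original code. Zero-byte reads are collected separately, since they would all share the same empty-input digest and otherwise show up as one large group of "duplicates".

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=65536):
    """Return (hex digest, bytes read) for the file at path, reading in chunks."""
    hasher = hashlib.md5()
    total = 0
    with open(path, "rb") as afile:
        # iter() with a b"" sentinel keeps calling read() until end of file
        for chunk in iter(lambda: afile.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return hasher.hexdigest(), total


def find_duplicates(root):
    """Hypothetical helper: map each shared digest to the paths that have it.

    Files that read as zero bytes are returned in a separate list instead of
    being grouped together under the empty-input MD5.
    """
    by_hash = defaultdict(list)
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest, size = file_md5(path)
            if size == 0:
                empty.append(path)
            else:
                by_hash[digest].append(path)
    # Keep only digests shared by more than one file
    dupes = {d: paths for d, paths in by_hash.items() if len(paths) > 1}
    return dupes, empty
```

Reporting the empty files separately makes the symptom in the question visible at a glance instead of hiding it inside the duplicate groups.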

Tags: python
2 answers
疯言疯语
Post #2 · 2019-07-11 06:27

I think you should compute the hash by calling hashlib.md5 on the files themselves:

import hashlib
with open("filename", "rb") as f:  # hash the file's bytes, not its name
    print(hashlib.md5(f.read()).hexdigest())

Let me know if that still suggests the files are empty.
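Note that hashlib.md5() takes the data to hash as bytes, not a file name. In Python 3 (as in the question), passing a str such as a file name raises TypeError, and encoding the name first would only hash the name itself, so the file still has to be opened and read in binary mode, as in the question's code. A short demonstration:

```python
import hashlib

# Passing a str (such as a file name) is rejected in Python 3
try:
    hashlib.md5("filename")
except TypeError as exc:
    print("TypeError:", exc)

# Encoding the name makes it hashable, but this hashes the name, not the file contents
name_digest = hashlib.md5("filename".encode()).hexdigest()
print(name_digest)
```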

走好不送
Post #3 · 2019-07-11 06:34

I'll delete this answer if this turns out not to be the cause, but it's something you need to check. Put this directly before the with open block:

import os
print("the path is {!r}".format(path))
print("path exists: ", os.path.exists(path))
print("it is a file: ", os.path.isfile(path))
print("file size is: ", os.path.getsize(path))

I suggest this because everything in your output is consistent with those files actually being empty. So maybe they are? My first thought was that you might be emptying the files somewhere else in your code, although you would probably have noticed that pretty quickly.
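Building on those checks, a small diagnostic might compare the size the file system reports with the number of bytes a full binary read actually returns. The helper name check_read is hypothetical. If both numbers are 0, the file really is empty at the moment it is opened; a mismatch would point at something other than an empty file.

```python
import os


def check_read(path):
    """Hypothetical helper: compare the on-disk size with the bytes actually read."""
    reported = os.path.getsize(path)  # size according to the file system
    with open(path, "rb") as afile:
        actual = len(afile.read())  # bytes returned by a full binary read
    print("{!r}: getsize={}, read={}".format(path, reported, actual))
    return reported, actual
```

Calling check_read(path) in the loop right before hashing would show, per file, whether the zero-length reads line up with a zero size on disk.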
