Method to inspect first 4 bytes and rename file extension

Posted 2019-07-22 18:46

I have a large batch of assorted files, all of which are missing their file extensions.

I'm currently using Windows 7 Pro. I can use "Open with" to experiment until I find the application that opens each file, and then rename it manually to suit.

However, I would like a way to identify the correct file type (typically PDF; others include JPG, HTML, DOC, XLS and PPT) and batch-rename the files to add the appropriate extension.

I can open some files in Notepad and look at the first four bytes, which in some cases show "%PDF".
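Notepad decodes the bytes as text, so binary headers (JPEG's, for instance) show up as garbage. A minimal Python 3 sketch of doing the same check in a script, assuming nothing beyond the standard library (the function names `read_header`, `hex_dump` and `looks_like_pdf` are hypothetical), reads the first bytes in binary mode and tests for the "%PDF" marker:

```python
import sys


def read_header(path, n=4):
    """Return the first n bytes of a file, read in binary mode (no text decoding)."""
    with open(path, "rb") as f:
        return f.read(n)


def hex_dump(header):
    """Format bytes as space-separated hex, e.g. b"%PDF" -> "25 50 44 46"."""
    return " ".join(f"{b:02X}" for b in header)


def looks_like_pdf(header):
    """PDF files begin with the ASCII bytes "%PDF"."""
    return header.startswith(b"%PDF")


if __name__ == "__main__":
    # Usage: python peek.py FILE [FILE ...]
    for path in sys.argv[1:]:
        header = read_header(path)
        print(path, hex_dump(header), "PDF" if looks_like_pdf(header) else "unknown")
```

Printing the hex makes non-printable header bytes visible, which is what you need for the formats that do not start with readable text.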

I figure a small script could inspect these bytes and rename each file accordingly. However, not all files offer such an easy identifier: HTML, JPG, DOC, etc. do not appear to have one.
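Several of these formats do have a fixed signature ("magic number"), just not a printable one. The sketch below assumes the well-known signatures for the types in the question and is not exhaustive: JPEG files start with the bytes FF D8 FF; legacy DOC, XLS and PPT files all share the OLE2 compound-file header D0 CF 11 E0 A1 B1 1A E1, so the first bytes alone cannot tell those three apart; and HTML has no signature at all, so the sketch falls back to a text heuristic. The names `SIGNATURES` and `guess_type` and the label "ole2" are illustrative.

```python
# Leading byte signatures ("magic numbers") for the types in the question.
SIGNATURES = [
    (b"%PDF", "pdf"),
    (b"\xff\xd8\xff", "jpg"),
    # OLE2 compound file header, shared by legacy .doc, .xls and .ppt
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),
]

UTF8_BOM = b"\xef\xbb\xbf"


def guess_type(header):
    """Guess a file type from its first bytes; return None if unrecognised.

    "ole2" means DOC, XLS or PPT: telling those apart requires parsing the
    compound file, which is what tools such as TrID or `file` attempt.
    """
    for magic, kind in SIGNATURES:
        if header.startswith(magic):
            return kind
    # HTML has no magic number, so fall back to a text heuristic:
    # drop a UTF-8 byte-order mark and leading whitespace, then look for a tag.
    text = header[len(UTF8_BOM):] if header.startswith(UTF8_BOM) else header
    text = text.lstrip().lower()
    if text.startswith((b"<!doctype html", b"<html")):
        return "html"
    return None


# The HTML check needs more than 4 bytes, for example:
# with open(path, "rb") as f:
#     kind = guess_type(f.read(512))
```

Files that come back as None are exactly the ones that need a heavier tool or a manual look.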

This PowerShell method appears to be close: https://superuser.com/questions/186942/renaming-multiple-file-extensions-based-on-a-condition

The difficulty is restricting that method to files with no extension, and then deciding what to do with the files that have no identifier in their first four bytes.
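One way to handle both points, sketched in Python 3 under stated assumptions: only regular files whose name has no extension (per `os.path.splitext`) are considered, the run is a dry run by default, and anything the (deliberately tiny, illustrative) signature table does not recognise is collected in a list for manual review rather than renamed. The names `plan_renames`, `main` and the `--apply` switch are hypothetical.

```python
import os
import sys

# Minimal, illustrative signature table (extend as needed).
SIGNATURES = {b"%PDF": ".pdf", b"\xff\xd8\xff": ".jpg"}


def plan_renames(folder):
    """Return (renames, unknown) for the extensionless files in `folder`.

    renames is a list of (old_path, new_path); unknown lists the files whose
    header matched nothing, so they can be reviewed by hand.
    """
    renames, unknown = [], []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Only regular files whose name has no extension at all
        if not os.path.isfile(path) or os.path.splitext(name)[1]:
            continue
        with open(path, "rb") as f:
            header = f.read(8)
        ext = next((e for magic, e in SIGNATURES.items() if header.startswith(magic)), None)
        if ext is None:
            unknown.append(path)
        else:
            renames.append((path, path + ext))
    return renames, unknown


def main(folder, apply=False):
    renames, unknown = plan_renames(folder)
    for old, new in renames:
        print(f"{old} -> {new}")
        # Rename only when asked, and never overwrite an existing file
        if apply and not os.path.exists(new):
            os.rename(old, new)
    print(f"{len(unknown)} file(s) not identified:")
    for path in unknown:
        print("  " + path)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python plan.py FOLDER [--apply]   (dry run unless --apply is given)
    main(sys.argv[1], apply="--apply" in sys.argv[2:])
```

Separating the plan from the rename means you can check the printed list before touching anything, and the "not identified" list is the natural input for a tool like TrID.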

I'd appreciate any help!

EDIT: Solution using TrID, found here: http://mark0.net/soft-trid-e.html. A recursive method using PowerShell to run TrID is here: http://mark0.net/forum/index.php?topic=550.0

Tags: filenames
2 answers
ら.Afraid
Post #2 · 2019-07-22 19:17

Use Python 3:

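import os
import re

fldrPth = "path/to/folder"  # absolute, or relative to the current working directory (not My Documents)
os.chdir(fldrPth)
for i in os.listdir():
    if not os.path.isfile(i) or os.path.splitext(i)[1]:
        continue  # skip folders and files that already have an extension
    with open(i, 'rb') as doc:  # binary mode: JPG/DOC bytes are not valid text
        st = doc.read(4).decode('latin-1')
    m = re.search(r'[A-Za-z]{3,}', st)  # e.g. "%PDF" -> "PDF"
    if m:  # files without 3+ ASCII letters in their first 4 bytes are left alone
        os.rename(i, i + '.' + m.group())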

Hopefully this works; I don't have test files to check the code against. Take a backup first, then run it and let me know whether it works.

smile是对你的礼貌
Post #3 · 2019-07-22 19:34

You could probably save some time by getting a Windows equivalent of the Linux file command (see the question "What is the equivalent to the Linux File command for windows?") and then writing a simple script that maps each reported file type to an extension.
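A minimal sketch of such a mapping script in Python 3, assuming a libmagic-based port of `file` is on the PATH and supports the `--brief` and `--mime-type` options. The MIME strings in the table are the ones libmagic commonly reports for these formats, but treat the table as an assumption to verify against your own files; `MIME_TO_EXT`, `file_mime_type` and `extension_for` are hypothetical names.

```python
import subprocess

# Assumed mapping from MIME types reported by `file --mime-type` to extensions.
MIME_TO_EXT = {
    "application/pdf": ".pdf",
    "image/jpeg": ".jpg",
    "text/html": ".html",
    "application/msword": ".doc",
    "application/vnd.ms-excel": ".xls",
    "application/vnd.ms-powerpoint": ".ppt",
}


def file_mime_type(path):
    """Ask the `file` utility for a file's MIME type (requires `file` on PATH)."""
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def extension_for(mime_type):
    """Map a MIME type to an extension, or None if it is not in the table."""
    return MIME_TO_EXT.get(mime_type)
```

For example, `extension_for(file_mime_type(path))` should give ".pdf" for a PDF; a None result (e.g. for "application/octet-stream") marks a file to set aside for manual inspection.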

EDIT: It looks like the TrID utility mentioned on that page can do what you want out of the box; see its -ae and -ce options.
