I have a large batch of assorted files, all missing their file extension.
I'm currently using Windows 7 Pro. I am able to "open with" and experiment to determine what application opens these files, and rename manually to suit.
However I would like some method to identify the correct file type (typically PDF, others include JPG, HTML, DOC, XLS and PPT), and batch rename to add the appropriate file extension.
I am able to open some files with notepad and review the first four bytes, which in some cases shows "%PDF".
I figure a small script would be able to inspect these bytes, and rename as appropriate. However not all files give such an easy method. HTML, JPG, DOC etc do not appear to give such an easy identifier.
This Powershell method appears to be close: https://superuser.com/questions/186942/renaming-multiple-file-extensions-based-on-a-condition
Difficulty here is focusing the method to work on file types with no extension; and then what to do with the files that don't have the first four bytes identifier?
Appreciate any help!!
EDIT: Solution using TriD seen here: http://mark0.net/soft-trid-e.html And recursive method using Powershell to execute TriD here: http://mark0.net/forum/index.php?topic=550.0
Use python3.
Hopefully this would work. I don't have test files to check the code. Take a backup and then run it and let me know if it works.
You could probably save some time by getting a
file
utility for Windows (see What is the equivalent to the Linux File command for windows?) and then writing a simple script that maps from file type to extension.EDIT: Looks like the TriD utility that's mentioned on that page can do what you want out of the box; see the -ae and -ce options)