How to add file extensions based on file type on L

2020-05-18 03:28发布

This is a question regarding Unix shell scripting (any shell), but any other "standard" scripting language solution would also be appreciated:

I have a directory full of files where the filenames are hash values like this:

fd73d0cf8ee68073dce270cf7e770b97
fec8047a9186fdcc98fdbfc0ea6075ee

These files have different original file types such as png, zip, doc, pdf etc.

Can anybody provide a script that would rename the files so they get their appropriate file extension, probably based on the output of the file command?

Answer:

J.F. Sebastian's script will work for both ouput of the filenames as well as the actual renaming.

5条回答
时光不老,我们不散
2楼-- · 2020-05-18 04:08

Here's mimetypes' version:

#!/usr/bin/env python
"""It is a `filename -> filename.ext` filter. 

   `ext` is mime-based.

"""
import fileinput
import mimetypes
import os
import sys
from subprocess import Popen, PIPE

if len(sys.argv) > 1 and sys.argv[1] == '--rename':
    do_rename = True
    del sys.argv[1]
else:
    do_rename = False    

for filename in (line.rstrip() for line in fileinput.input()):
    output, _ = Popen(['file', '-bi', filename], stdout=PIPE).communicate()
    mime = output.split(';', 1)[0].lower().strip()
    ext = mimetypes.guess_extension(mime, strict=False)
    if ext is None:
        ext = os.path.extsep + 'undefined'
    filename_ext = filename + ext
    print filename_ext
    if do_rename:
       os.rename(filename, filename_ext)

Example:

$ ls *.file? | python add-ext.py --rename
avi.file.avi
djvu.file.undefined
doc.file.dot
gif.file.gif
html.file.html
ico.file.obj
jpg.file.jpe
m3u.file.ksh
mp3.file.mp3
mpg.file.m1v
pdf.file.pdf
pdf.file2.pdf
pdf.file3.pdf
png.file.png
tar.bz2.file.undefined

Following @Phil H's response that follows @csl' response:

#!/usr/bin/env python
"""It is a `filename -> filename.ext` filter. 

   `ext` is mime-based.
"""
# Mapping of mime-types to extensions is taken form here:
# http://as3corelib.googlecode.com/svn/trunk/src/com/adobe/net/MimeTypeMap.as
mime2exts_list = [
    ["application/andrew-inset","ez"],
    ["application/atom+xml","atom"],
    ["application/mac-binhex40","hqx"],
    ["application/mac-compactpro","cpt"],
    ["application/mathml+xml","mathml"],
    ["application/msword","doc"],
    ["application/octet-stream","bin","dms","lha","lzh","exe","class","so","dll","dmg"],
    ["application/oda","oda"],
    ["application/ogg","ogg"],
    ["application/pdf","pdf"],
    ["application/postscript","ai","eps","ps"],
    ["application/rdf+xml","rdf"],
    ["application/smil","smi","smil"],
    ["application/srgs","gram"],
    ["application/srgs+xml","grxml"],
    ["application/vnd.adobe.apollo-application-installer-package+zip","air"],
    ["application/vnd.mif","mif"],
    ["application/vnd.mozilla.xul+xml","xul"],
    ["application/vnd.ms-excel","xls"],
    ["application/vnd.ms-powerpoint","ppt"],
    ["application/vnd.rn-realmedia","rm"],
    ["application/vnd.wap.wbxml","wbxml"],
    ["application/vnd.wap.wmlc","wmlc"],
    ["application/vnd.wap.wmlscriptc","wmlsc"],
    ["application/voicexml+xml","vxml"],
    ["application/x-bcpio","bcpio"],
    ["application/x-cdlink","vcd"],
    ["application/x-chess-pgn","pgn"],
    ["application/x-cpio","cpio"],
    ["application/x-csh","csh"],
    ["application/x-director","dcr","dir","dxr"],
    ["application/x-dvi","dvi"],
    ["application/x-futuresplash","spl"],
    ["application/x-gtar","gtar"],
    ["application/x-hdf","hdf"],
    ["application/x-javascript","js"],
    ["application/x-koan","skp","skd","skt","skm"],
    ["application/x-latex","latex"],
    ["application/x-netcdf","nc","cdf"],
    ["application/x-sh","sh"],
    ["application/x-shar","shar"],
    ["application/x-shockwave-flash","swf"],
    ["application/x-stuffit","sit"],
    ["application/x-sv4cpio","sv4cpio"],
    ["application/x-sv4crc","sv4crc"],
    ["application/x-tar","tar"],
    ["application/x-tcl","tcl"],
    ["application/x-tex","tex"],
    ["application/x-texinfo","texinfo","texi"],
    ["application/x-troff","t","tr","roff"],
    ["application/x-troff-man","man"],
    ["application/x-troff-me","me"],
    ["application/x-troff-ms","ms"],
    ["application/x-ustar","ustar"],
    ["application/x-wais-source","src"],
    ["application/xhtml+xml","xhtml","xht"],
    ["application/xml","xml","xsl"],
    ["application/xml-dtd","dtd"],
    ["application/xslt+xml","xslt"],
    ["application/zip","zip"],
    ["audio/basic","au","snd"],
    ["audio/midi","mid","midi","kar"],
    ["audio/mpeg","mp3","mpga","mp2"],
    ["audio/x-aiff","aif","aiff","aifc"],
    ["audio/x-mpegurl","m3u"],
    ["audio/x-pn-realaudio","ram","ra"],
    ["audio/x-wav","wav"],
    ["chemical/x-pdb","pdb"],
    ["chemical/x-xyz","xyz"],
    ["image/bmp","bmp"],
    ["image/cgm","cgm"],
    ["image/gif","gif"],
    ["image/ief","ief"],
    ["image/jpeg","jpg","jpeg","jpe"],
    ["image/png","png"],
    ["image/svg+xml","svg"],
    ["image/tiff","tiff","tif"],
    ["image/vnd.djvu","djvu","djv"],
    ["image/vnd.wap.wbmp","wbmp"],
    ["image/x-cmu-raster","ras"],
    ["image/x-icon","ico"],
    ["image/x-portable-anymap","pnm"],
    ["image/x-portable-bitmap","pbm"],
    ["image/x-portable-graymap","pgm"],
    ["image/x-portable-pixmap","ppm"],
    ["image/x-rgb","rgb"],
    ["image/x-xbitmap","xbm"],
    ["image/x-xpixmap","xpm"],
    ["image/x-xwindowdump","xwd"],
    ["model/iges","igs","iges"],
    ["model/mesh","msh","mesh","silo"],
    ["model/vrml","wrl","vrml"],
    ["text/calendar","ics","ifb"],
    ["text/css","css"],
    ["text/html","html","htm"],
    ["text/plain","txt","asc"],
    ["text/richtext","rtx"],
    ["text/rtf","rtf"],
    ["text/sgml","sgml","sgm"],
    ["text/tab-separated-values","tsv"],
    ["text/vnd.wap.wml","wml"],
    ["text/vnd.wap.wmlscript","wmls"],
    ["text/x-setext","etx"],
    ["video/mpeg","mpg","mpeg","mpe"],
    ["video/quicktime","mov","qt"],
    ["video/vnd.mpegurl","m4u","mxu"],
    ["video/x-flv","flv"],
    ["video/x-msvideo","avi"],
    ["video/x-sgi-movie","movie"],
    ["x-conference/x-cooltalk","ice"]]

#NOTE: take only the first extension
mime2ext = dict(x[:2] for x in mime2exts_list)

if __name__ == '__main__':
    import fileinput, os.path
    from subprocess import Popen, PIPE

    for filename in (line.rstrip() for line in fileinput.input()):
        output, _ = Popen(['file', '-bi', filename], stdout=PIPE).communicate()
        mime = output.split(';', 1)[0].lower().strip()
        print filename + os.path.extsep + mime2ext.get(mime, 'undefined')

Here's a snippet for old python's versions (not tested):

#NOTE: take only the first extension
mime2ext = {}
for x in mime2exts_list:
    mime2ext[x[0]] = x[1]

if __name__ == '__main__':
    import os
    import sys

    # this version supports only stdin (part of fileinput.input() functionality)
    lines = sys.stdin.read().split('\n')
    for line in lines:
        filename = line.rstrip()
        output = os.popen('file -bi ' + filename).read()        
        mime = output.split(';')[0].lower().strip()
        try: ext = mime2ext[mime]
        except KeyError:
             ext = 'undefined'
        print filename + '.' + ext

It should work on Python 2.3.5 (I guess).

查看更多
戒情不戒烟
3楼-- · 2020-05-18 04:16

You can use

file -i filename

to get a MIME-type. You could potentially lookup the type in a list and then append an extension. You can find a list of MIME-types and example file extensions on the net.

查看更多
Fickle 薄情
4楼-- · 2020-05-18 04:19

Of course, it should be added that deciding on a MIME type just based on file(1) output can be very inaccurate/vague (what's "data" ?) or even completely incorrect...

查看更多
做个烂人
5楼-- · 2020-05-18 04:24

Following csl's response:

You can use

file -i filename

to get a MIME-type. You could potentially lookup the type in a list and then append an extension. You can find list of MIME-types and suggested file extensions on the net.

I'd suggest you write a script that takes the output of file -i filename, and returns an extension (split on spaces, find the '/', look up that term in a table file) in your language of choice - a few lines at most. Then you can do something like:

ls | while read f; do mv "$f" "$f".`file -i "$f" | get_extension.py`; done

in bash, or throw that in a bash script. Or make the get_extension script bigger, but that makes it less useful next time you want the relevant extension.

Edit: change from for f in * to ls | while read f because the latter handles filenames with spaces in (a particular nightmare on Windows).

查看更多
唯我独甜
6楼-- · 2020-05-18 04:25

Agreeing with Keltia, and elaborating some on his answer:

Take care -- some filetypes may be problematic. JPEG2000, for example.
And others might return too much info given the "file" command without any option tags. The way to avoid this is to use "file -b" for a brief return of information.

BZT

查看更多
登录 后发表回答