Extracting extension from filename in Python

Posted 2019-01-01 02:32

Question:

Is there a function to extract the extension from a filename?

Answer 1:

Yes. Use os.path.splitext (see the Python 2.X documentation or the Python 3.X documentation):

>>> import os
>>> filename, file_extension = os.path.splitext('/path/to/somefile.ext')
>>> filename
'/path/to/somefile'
>>> file_extension
'.ext'

Unlike most manual string-splitting attempts, os.path.splitext will correctly treat /a/b.c/d as having no extension instead of having extension .c/d, and it will treat .bashrc as having no extension instead of having extension .bashrc:

>>> os.path.splitext('/a/b.c/d')
('/a/b.c/d', '')
>>> os.path.splitext('.bashrc')
('.bashrc', '')


Answer 2:

import os.path
extension = os.path.splitext(filename)[1]


Answer 3:

pathlib is new in Python 3.4.

import pathlib

print(pathlib.Path('yourPathGoesHere').suffix)

I'm surprised no one has mentioned pathlib yet. pathlib IS awesome!

If you need all the suffixes (e.g. if you have a .tar.gz), .suffixes will return a list of them!
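
A short sketch of the attributes mentioned above. It uses PurePosixPath so the result does not depend on the operating system, and the paths are made up for illustration:

```python
from pathlib import PurePosixPath

# Hypothetical paths, used only for illustration.
archive = PurePosixPath("/path/to/backup.tar.gz")
hidden = PurePosixPath("/home/user/.bashrc")

last_suffix = archive.suffix      # final extension only
all_suffixes = archive.suffixes   # every extension, in order
stem = archive.stem               # file name minus the final suffix
hidden_suffix = hidden.suffix     # a leading-dot name has no suffix

print(last_suffix, all_suffixes, stem, repr(hidden_suffix))
```

Note that .stem only removes the last suffix, so a .tar.gz file keeps its .tar part in the stem.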



Answer 4:

import os.path
extension = os.path.splitext(filename)[1][1:]

To get only the text of the extension, without the dot.



Answer 5:

One option is to split on the dot:

>>> filename = "example.jpeg"
>>> filename.split(".")[-1]
'jpeg'

No error is raised when the file doesn't have an extension:

>>> "filename".split(".")[-1]
'filename'

But you must be careful:

>>> "png".split(".")[-1]
'png'    # But the file doesn't have an extension


Answer 6:

It's worth adding a lower() in there so you don't find yourself wondering why the JPGs aren't showing up in your list.

os.path.splitext(filename)[1][1:].strip().lower()
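
To illustrate why the lower() matters, here is a rough sketch of a case-insensitive filter. The helper name has_extension and the file names are made up for this example:

```python
import os.path

def has_extension(filename, wanted):
    """Return True if filename ends in one of the wanted extensions, ignoring case.

    `wanted` holds lowercase extensions without the leading dot.
    """
    ext = os.path.splitext(filename)[1][1:].lower()
    return ext in wanted

# Made-up names for illustration.
names = ["a.JPG", "b.jpg", "c.Jpeg", "notes.txt", "README"]
images = [n for n in names if has_extension(n, {"jpg", "jpeg"})]
print(images)
```

Without the lower() call, only b.jpg would be matched here.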


Answer 7:

Any of the solutions above works, but on Linux I have found a newline at the end of the extension string, which prevents matches from succeeding. Add a call to strip() at the end. For example:

import os.path
extension = os.path.splitext(filename)[1][1:].strip()
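
splitext does not add a newline itself, so one likely source is a file name that was read line by line from a file or from command output. A small sketch, using io.StringIO to stand in for such input (the names are made up):

```python
import io
import os.path

# Simulated listing, standing in for a file or command output read line by line.
listing = io.StringIO("photo.jpg\nnotes.txt\n")

# Each line still carries its trailing newline, and so does the extension.
raw = [os.path.splitext(line)[1][1:] for line in listing]

listing.seek(0)
# Stripping the line first removes the newline before splitting.
clean = [os.path.splitext(line.strip())[1][1:] for line in listing]

print(raw, clean)
```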


Answer 8:

With splitext there are problems with files that have a double extension (e.g. file.tar.gz, file.tar.bz2, etc.):

>>> fileName, fileExtension = os.path.splitext('/path/to/somefile.tar.gz')
>>> fileExtension
'.gz'

but it should be: .tar.gz

The possible solutions are here
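
One possible approach, sketched here as an assumption rather than as the solution the link pointed to, is to check a list of known compound extensions before falling back to splitext. The list and the name split_compound_ext are made up for this example:

```python
import os.path

# Assumed list of compound extensions; extend it to suit your data.
COMPOUND_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar.xz")

def split_compound_ext(path):
    """Like os.path.splitext, but keeps known compound extensions whole."""
    lowered = path.lower()
    for ext in COMPOUND_EXTENSIONS:
        if lowered.endswith(ext):
            # Slice the original string so the caller gets the original case.
            return path[:-len(ext)], path[-len(ext):]
    return os.path.splitext(path)

print(split_compound_ext("/path/to/somefile.tar.gz"))
```

An explicit list avoids treating names such as report.v2.txt as having the extension .v2.txt.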



Answer 9:

filename = 'ext.tar.gz'
extension = filename[filename.rfind('.'):]


Answer 10:

Surprised this wasn't mentioned yet:

import os
fn = '/some/path/a.tar.gz'

basename = os.path.basename(fn)  # OS independent
# basename == 'a.tar.gz'

base = basename.split('.')[0]
# base == 'a'

ext = '.'.join(basename.split('.')[1:])   # <-- main part

# if you want a leading '.', and None when there is no extension:
ext = '.' + ext if ext else None
# ext == '.tar.gz'

Benefits:

  • Works as expected for anything I can think of
  • No modules
  • No regex
  • Cross-platform
  • Easily extendible (e.g. no leading dots for extension, only last part of extension)

As a function:

def get_extension(filename):
    basename = os.path.basename(filename)  # OS independent
    ext = '.'.join(basename.split('.')[1:])
    return '.' + ext if ext else None


Answer 11:

Although this is an old topic, I wonder why no one has mentioned a very simple Python API for this case: rpartition.

To get the extension of a file given its absolute path, you can simply write:

filepath.rpartition('.')[-1]

Example:

path = '/home/jersey/remote/data/test.csv'
print(path.rpartition('.')[-1])

This prints: csv
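
Two caveats with the bare rpartition call: without a dot it returns the whole string, and a dot in a directory name is picked up too. A minimal sketch of a guarded version follows; the helper name ext_via_rpartition is made up:

```python
import os.path

def ext_via_rpartition(filepath):
    """Extension without the dot, or '' when the file name has none."""
    name = os.path.basename(filepath)  # ignore dots in directory names
    stem, dot, ext = name.rpartition(".")
    # No dot at all, or a leading-dot name such as ".bashrc": no extension.
    if not dot or not stem:
        return ""
    return ext

unguarded = "/a/b.c/d".rpartition(".")[-1]  # picks up the directory's dot
guarded = ext_via_rpartition("/a/b.c/d")
print(repr(unguarded), repr(guarded))
```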



Answer 12:

You can find some great stuff in the pathlib module.

import pathlib
x = pathlib.PurePosixPath("C:\\Path\\To\\File\\myfile.txt").suffix
print(x)

# Output
# .txt
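
The suffix comes out right here, but PurePosixPath treats backslashes as ordinary characters, so other attributes such as .name will differ from what a Windows user expects. A quick sketch comparing it with PureWindowsPath on the same string:

```python
from pathlib import PurePosixPath, PureWindowsPath

raw = "C:\\Path\\To\\File\\myfile.txt"

posix = PurePosixPath(raw)      # backslashes are ordinary characters here
windows = PureWindowsPath(raw)  # backslashes are separators here

print(posix.suffix, windows.suffix)  # the suffix is the same for both
print(posix.name)                    # the whole string is one component
print(windows.name)                  # just the file name
```

Both pure path classes work on any operating system, so PureWindowsPath is probably the safer choice when the input is known to be a Windows path.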


Answer 13:

You can use a split on the filename:

f_extns = filename.split(".")
print("The extension of the file is : " + repr(f_extns[-1]))

This does not require an additional library.



Answer 14:

Just join all the pathlib suffixes.

>>> import pathlib
>>> x = 'file/path/archive.tar.gz'
>>> y = 'file/path/text.txt'
>>> ''.join(pathlib.Path(x).suffixes)
'.tar.gz'
>>> ''.join(pathlib.Path(y).suffixes)
'.txt'


Answer 15:

Another solution, using a right split:

# to get the extension only

s = 'test.ext'

if '.' in s:
    ext = s.rsplit('.', 1)[1]

# or, to get the file name and the extension

def split_filepath(s):
    """
    Get the file name and extension from a file path.
    filepath -> (filename, extension)
    """
    if '.' not in s:
        return (s, '')
    r = s.rsplit('.', 1)
    return (r[0], r[1])


Answer 16:

This is a direct string-manipulation technique. I see a lot of solutions mentioned, but I think most rely on split. Split, however, splits at every occurrence of ".". What you would rather be looking for is partition.

string = "folder/to_path/filename.ext"
extension = string.rpartition(".")[-1]


Answer 17:

Even though this question is already answered, I'd add a solution using regex.

>>> import re
>>> file_suffix = r".*(\..*)"
>>> result = re.search(file_suffix, "somefile.ext")
>>> result.group(1)
'.ext'
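
The pattern above picks up '.c/d' for a path like /a/b.c/d, and search returns None when there is no dot at all. Below is a sketch of a stricter variant that behaves more like os.path.splitext in those common cases; EXT_RE and ext_via_regex are names made up here:

```python
import re

# A dot that is not at the start of the file name, followed by characters
# that are neither dots nor path separators, anchored at the end.
EXT_RE = re.compile(r"(?<=[^/\\])(\.[^./\\]+)$")

def ext_via_regex(path):
    """Return the last extension including the dot, or '' if there is none."""
    match = EXT_RE.search(path)
    return match.group(1) if match else ""

print(ext_via_regex("somefile.ext"))
```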


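Answer 18:

import os

def NewFileName(fichier):
    # Return fichier if it is free, else 'name-(1).ext', 'name-(2).ext', ...
    # Splits on the first dot, so it assumes no dots in the directory part.
    cpt = 0
    fic, *ext = fichier.split('.')
    # keep the dot only when there is an extension to re-attach
    ext = '.' + '.'.join(ext) if ext else ''
    while os.path.isfile(fichier):
        cpt += 1
        fichier = '{0}-({1}){2}'.format(fic, cpt, ext)
    return fichier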


Answer 19:

# try this, it works for any number of extensions
# e.g. www.google.com/downloads/file1.gz.rs -> .gz.rs

import os.path

class LinkChecker:

    @staticmethod
    def get_link_extension(link: str) -> str:
        if link is None or link == "":
            return ""
        else:
            paths = os.path.splitext(link)
            ext = paths[1]
            new_link = paths[0]
            if ext != "":
                return LinkChecker.get_link_extension(new_link) + ext
            else:
                return ""


Answer 20:

name_only = filename[:filename.index(".")]

That will give you the file name up to the first ".", which would be the most common case.
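
Note that str.index raises ValueError when the name contains no dot. Here is a sketch of a variant that avoids this with str.partition and also ignores dots in directory names; the function name name_before_first_dot is hypothetical:

```python
import os.path

def name_before_first_dot(path):
    """Return the file name up to (not including) its first dot."""
    base = os.path.basename(path)  # drop the directory part first
    # partition never raises; without a dot it returns the whole name.
    return base.partition(".")[0]

print(name_before_first_dot("/some/path/a.tar.gz"))
```

For a leading-dot name such as .bashrc this returns an empty string, so such names may need special handling.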