How to select files with numbered extensions from

Posted 2019-08-12 23:52

Question:

I am trying to build my own dataset for a project. Therefore I need to select files that have been exported from another program and come with numbered extensions:

exported_file_1_aaa.001
exported_file_2_aaa.002
exported_file_3_aaa.003
...
exported_file_5_zzz.925
...and so on.

I know how to select files with a specific extension, e.g. '.txt', from a folder and append them to a list or dict. Is there any way to do this for a numbered extension such as '.nnn'?

ext = '.nnn'
all_files = [i for i in os.listdir(dir) if os.path.splitext(i)[1] == ext]
for f in all_files:
    ...

Answer 1:

You can mix the capabilities of shell globbing (glob) and regex (re).

With glob you can get the files whose names end with a digit, which leaves a limited number of files for re to do the final check:

glob.iglob('exported_file_*.*[0-9]')

Then we can match the files precisely with the regex pattern:

\.\d+$

This matches file names that end in one or more digits after the last dot (.).

Putting it together:

import glob
import re
[file for file in glob.iglob('exported_file_*.*[0-9]') if re.search(r'\.\d+$', file)]

Shell globbing is less expressive than regular expressions: it has no way to say "one or more digits", otherwise glob alone would have been enough.

Also, if you are sure that every file ends in a fixed number of digits, glob alone works, e.g. for files ending in exactly 3 digits after the last dot:

glob.iglob('exported_file_*.[0-9][0-9][0-9]')
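
To see why the regex check is needed on top of the looser glob pattern, here is a small self-contained sketch. It assumes a throwaway temporary directory and made-up file names; select_numbered is a hypothetical helper name, not part of glob or re.

```python
import glob
import os
import re
import tempfile


def select_numbered(directory):
    """Return sorted base names in `directory` whose extension is all digits."""
    # glob.escape protects any glob metacharacters in the directory path itself.
    pattern = os.path.join(glob.escape(directory), 'exported_file_*.*[0-9]')
    return sorted(
        os.path.basename(path)
        for path in glob.iglob(pattern)
        if re.search(r'\.\d+$', path)  # final precise check on the extension
    )


with tempfile.TemporaryDirectory() as tmp:
    # Made-up sample files: only the first has a purely numeric extension.
    for name in ('exported_file_1_aaa.001',
                 'exported_file_2_aaa.v2',
                 'exported_file_3_aaa.txt'):
        open(os.path.join(tmp, name), 'w').close()

    # The glob pattern alone also lets '.v2' through, since it ends in a digit.
    loose = sorted(
        os.path.basename(p)
        for p in glob.glob(os.path.join(glob.escape(tmp), 'exported_file_*.*[0-9]'))
    )
    strict = select_numbered(tmp)

print(loose)
print(strict)
```

The glob step narrows the candidates and the regex step rejects names such as 'exported_file_2_aaa.v2' that merely end in a digit.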


Answer 2:

If you don't care about the length of the extension, you can use the isdigit method:

import os

# splitext keeps the leading dot ('.001'), so strip it before calling isdigit()
all_files = [i for i in os.listdir(dir) if os.path.splitext(i)[1][1:].isdigit()]
for f in all_files:
    ...
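
One detail to watch with this approach: os.path.splitext returns the extension with its leading dot, and a string such as '.001' is not all digits, so the dot has to be stripped before calling isdigit. A minimal sketch on a list of made-up names follows; has_numeric_ext is a hypothetical helper name.

```python
import os


def has_numeric_ext(filename):
    """True if the part after the last dot consists only of digits."""
    ext = os.path.splitext(filename)[1]  # e.g. '.001', or '' if there is no extension
    return ext[1:].isdigit()             # drop the dot; '' gives False


# Made-up names for illustration only.
names = [
    'exported_file_1_aaa.001',
    'exported_file_5_zzz.925',
    'notes.txt',
    'README',
    'archive.tar.7',
]
selected = [n for n in names if has_numeric_ext(n)]
print(selected)

# Without stripping the dot, nothing would be selected:
print('.001'.isdigit())
```

Because the length is not checked, a one-digit extension such as '.7' is accepted as well.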


Answer 3:

You can use the glob module.

import glob

my_dir = "mydir"

all_files = glob.glob(f"{my_dir}/*.[0-9][0-9][0-9]")
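
The three-digit pattern is strict about length. The glob module does its matching with fnmatch, so the behaviour can be checked without touching the file system. This is only a sketch with made-up names:

```python
import fnmatch

pattern = '*.[0-9][0-9][0-9]'

# Made-up names: two with exactly three digits, one shorter, one longer, one non-numeric.
names = [
    'exported_file_1_aaa.001',
    'exported_file_5_zzz.925',
    'short.01',
    'long.0001',
    'notes.txt',
]

# fnmatchcase applies the same wildcard rules, case-sensitively on every platform.
matched = [n for n in names if fnmatch.fnmatchcase(n, pattern)]
print(matched)
```

Extensions with two or four digits are skipped, so this pattern only fits if the exporting program always writes exactly three digits.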