I am trying to build my own dataset for a project. Therefore I need to select files that have been exported from another program and come with numbered extensions:
exported_file_1_aaa.001
exported_file_2_aaa.002
exported_file_3_aaa.003
...
exported_file_5_zzz.925
...and so on.
I know how to select files with a specific extension, e.g. '.txt', from a folder and append them to a list or dict. Is there any way to do this with a numbered extension like '.nnn'?
import os

ext = '.nnn'  # placeholder: should match any numeric extension such as '.001' or '.925'
all_files = [i for i in os.listdir(dir) if os.path.splitext(i)[1] == ext]
for f in all_files:
    ...
You can mix the capabilities of shell globbing (glob) and regular expressions (re).
With glob you can get the files whose names end in a digit, so that re only has a limited number of files to check:
glob.iglob('exported_file_*.*[0-9]')
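Because glob uses the same shell-style wildcards as the fnmatch module, a quick way to see what this pattern lets through, without creating any files, is fnmatch.fnmatchcase. The file names below are made up for illustration. Note that a name like 'exported_file_1_aaa.v2' also passes the glob, which is why a final regex check is still needed.

```python
from fnmatch import fnmatchcase

pattern = "exported_file_*.*[0-9]"

# Hypothetical file names, not taken from a real export
names = [
    "exported_file_1_aaa.001",
    "exported_file_5_zzz.925",
    "exported_file_1_aaa.v2",   # ends in a digit, but the extension is not numeric
    "exported_file_1_aaa.txt",  # does not end in a digit
]

# Keep only the names the glob pattern would accept (case-sensitive match)
glob_hits = [n for n in names if fnmatchcase(n, pattern)]
print(glob_hits)  # ['exported_file_1_aaa.001', 'exported_file_5_zzz.925', 'exported_file_1_aaa.v2']
```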
Then we can match the files precisely with this regex pattern:
\.\d+$
This matches file names that end in one or more digits after the last '.'.
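To see what the regex accepts, here is a small check on a few hypothetical names. Since \d cannot match a dot, \.\d+$ only succeeds when everything after the last dot consists of digits.

```python
import re

numeric_ext = re.compile(r"\.\d+$")

# Hypothetical names used only to exercise the pattern
names = [
    "exported_file_1_aaa.001",  # numeric extension
    "archive.tar.925",          # only the last extension counts
    "exported_file_1_aaa.v2",   # letter between the dot and the digit
    "data.001.txt",             # digits are not at the end
]

# search() with the $ anchor looks for '.<digits>' at the very end of the name
matches = [n for n in names if numeric_ext.search(n)]
print(matches)  # ['exported_file_1_aaa.001', 'archive.tar.925']
```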
Putting it together:
import glob
import re
[file for file in glob.iglob('exported_file_*.*[0-9]') if re.search(r'\.\d+$', file)]
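A minimal self-contained run of the combined filter, assuming a throwaway directory filled with a few hypothetical file names (the real export folder is not known here). The pattern is joined onto the directory path, with glob.escape guarding against wildcard characters in that path; because the regex is anchored with $, checking the full path gives the same result as checking only the file name.

```python
import glob
import os
import re
import tempfile

# Hypothetical file names standing in for the real exports
names = [
    "exported_file_1_aaa.001",
    "exported_file_2_aaa.002",
    "exported_file_5_zzz.925",
    "exported_file_1_aaa.v2",
    "exported_file_1_aaa.txt",
]

with tempfile.TemporaryDirectory() as tmp:
    for name in names:
        # create empty files so glob has something to find
        open(os.path.join(tmp, name), "w").close()

    pattern = os.path.join(glob.escape(tmp), "exported_file_*.*[0-9]")
    # glob narrows the candidates, re keeps only purely numeric extensions
    selected = sorted(
        os.path.basename(path)
        for path in glob.iglob(pattern)
        if re.search(r"\.\d+$", path)
    )

print(selected)  # ['exported_file_1_aaa.001', 'exported_file_2_aaa.002', 'exported_file_5_zzz.925']
```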
Shell globbing is not as flexible as re; otherwise we could have done this with glob alone.
Also, if you're sure that all files end in a fixed number of digits, glob alone works, e.g. for files ending in exactly three digits after the last '.':
glob.iglob('exported_file_*.[0-9][0-9][0-9]')
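If the number of digits is known but might differ between exports, the fixed-length pattern can be built rather than typed out. digit_glob below is a hypothetical helper name, not part of the glob module; the usage part creates a temporary directory with made-up names to show that a four-digit extension is excluded.

```python
import glob
import os
import tempfile


def digit_glob(directory: str, prefix: str, n_digits: int) -> list:
    """Return sorted paths in `directory` named `prefix*` whose extension is
    exactly `n_digits` digits (hypothetical helper, for illustration)."""
    if n_digits < 1:
        raise ValueError("n_digits must be at least 1")
    # e.g. n_digits=3 -> "prefix*.[0-9][0-9][0-9]"
    pattern = f"{glob.escape(prefix)}*.{'[0-9]' * n_digits}"
    return sorted(glob.glob(os.path.join(glob.escape(directory), pattern)))


with tempfile.TemporaryDirectory() as tmp:
    for name in ["exported_file_1_aaa.001", "exported_file_2_aaa.1234", "exported_file_3_aaa.txt"]:
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in digit_glob(tmp, "exported_file_", 3)]

print(found)  # ['exported_file_1_aaa.001']
```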
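If you don't care about the length of the extension, you can use the str.isdigit method. Note that os.path.splitext keeps the leading dot ('.001'), so the dot has to be stripped before calling isdigit:
import os
folder = "mydir"  # placeholder: directory to scan
all_files = [i for i in os.listdir(folder) if os.path.splitext(i)[1][1:].isdigit()]
for f in all_files:
    ...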
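The detail that trips this approach up is that os.path.splitext returns the extension with its leading dot, so '.001'.isdigit() is False. A small predicate makes the check explicit. has_numeric_extension is a hypothetical name, and the isascii() guard (Python 3.7+) is an added assumption that only the digits 0-9 should count, since str.isdigit also accepts characters such as '²'.

```python
import os


def has_numeric_extension(filename: str) -> bool:
    """True if everything after the last '.' is one or more ASCII digits."""
    ext = os.path.splitext(filename)[1]  # e.g. '.001', or '' if there is no dot
    digits = ext[1:]                     # drop the leading dot
    return digits.isascii() and digits.isdigit()


print(os.path.splitext("exported_file_1_aaa.001"))       # ('exported_file_1_aaa', '.001')
print(has_numeric_extension("exported_file_1_aaa.001"))  # True
print(has_numeric_extension("exported_file_1_aaa.txt"))  # False
print(has_numeric_extension("README"))                   # False
```

An empty extension gives an empty string, and ''.isdigit() is False, so files without a dot are skipped automatically.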
You can use the glob module.
import glob
my_dir = "mydir"
# glob.glob already returns a list, so no comprehension is needed
all_files = glob.glob(f"{my_dir}/*.[0-9][0-9][0-9]")
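If you prefer pathlib, Path.glob accepts the same wildcard pattern and Path.suffix gives the last extension directly. The sketch below is an alternative spelling of the glob call, not a different technique; it additionally sorts by the numeric value of the extension, on the assumption that export order matters for the dataset. A temporary directory with made-up names stands in for "mydir".

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    my_dir = Path(tmp)  # stand-in for the real export folder
    for name in ["exported_file_3_aaa.003", "exported_file_1_aaa.001", "exported_file_1_aaa.txt"]:
        (my_dir / name).touch()

    # Same three-digit pattern as glob.glob, then sort by the number in the suffix
    numbered = sorted(
        my_dir.glob("*.[0-9][0-9][0-9]"),
        key=lambda p: int(p.suffix[1:]),  # '.003' -> 3
    )
    names = [p.name for p in numbered]

print(names)  # ['exported_file_1_aaa.001', 'exported_file_3_aaa.003']
```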