I am trying to build my own dataset for a project. For that I need to select files that were exported from another program and have numbered extensions:
exported_file_1_aaa.001
exported_file_2_aaa.002
exported_file_3_aaa.003
...
exported_file_5_zzz.925
...and so on.
I know how to select files with a specific extension, e.g. '.txt', from a folder and append them to a list or dict. Is there any way to do the same for a numbered extension '.nnn', where nnn is a number such as 001 or 925?
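import os

ext = '.nnn'  # placeholder: I want "any number", not the literal '.nnn'
# folder: path to the exported files (renamed from dir, which shadows a built-in)
all_files = [i for i in os.listdir(folder) if os.path.splitext(i)[1] == ext]
for f in all_files:
    ...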
You can use the glob module. If you don't care about the length of the extension, you can use the str.isdigit method:
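A minimal sketch of the isdigit approach, assuming all files sit directly in one folder; the function name numbered_extension_files is hypothetical. It keeps every name whose extension (the part after the last dot, as returned by os.path.splitext) consists only of digits, whatever its length:

```python
import os

def numbered_extension_files(folder):
    """Return sorted names in `folder` whose extension is all digits, e.g. '.001' or '.925'."""
    result = []
    for name in os.listdir(folder):
        ext = os.path.splitext(name)[1]  # e.g. '.001'; '' if the name has no extension
        # ext[1:] drops the leading dot; ''.isdigit() is False, so names without an extension are skipped
        if ext[1:].isdigit():
            result.append(name)
    return sorted(result)
```

Note that isdigit also returns True for some non-ASCII digit characters such as '²', which is unlikely to matter for exported file names.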
You can mix the capabilities of shell globbing (glob) and regular expressions (re). With glob you can get the files whose names end with a digit, which leaves only a small number of candidates for re to check. Then we can match the files precisely with a regex pattern:
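A sketch of such a regex check on bare file names; the exact pattern and the helper name has_numbered_extension are illustrative assumptions, not taken from the original answer:

```python
import re

# A dot, then one or more ASCII digits, then the end of the name.
# Digits cannot include a '.', so the dot matched here is always the last one.
NUMBERED_EXT = re.compile(r'\.[0-9]+$')

def has_numbered_extension(name):
    return NUMBERED_EXT.search(name) is not None
```

For example, has_numbered_extension('exported_file_3_aaa.003') is True, while 'notes.txt', 'file_2' and 'archive.001.bak' are rejected.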
This matches file names that end in digits after the last dot. Putting it together:
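A sketch of the combined approach, assuming a hypothetical helper numbered_files(folder). The glob pattern '*[0-9]' is the cheap pre-filter, and the regex then does the exact check on the base name only. glob.escape guards against special characters such as [ in the folder path:

```python
import glob
import os
import re

NUMBERED_EXT = re.compile(r'\.[0-9]+$')  # dot + digits at the very end of the name

def numbered_files(folder):
    """Return sorted paths in `folder` whose extension (after the last '.') is all digits."""
    # Step 1: shell-style globbing keeps only names whose last character is a digit.
    candidates = glob.glob(os.path.join(glob.escape(folder), '*[0-9]'))
    # Step 2: the regex rejects names like 'file_2' that end in a digit but have no numeric extension.
    return sorted(p for p in candidates if NUMBERED_EXT.search(os.path.basename(p)))
```

By default glob skips names that start with a dot, so hidden files are not returned.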
Shell globbing is not as flexible as re, otherwise we could have done it with glob alone. Also, if you're sure that all files end in a fixed number of digits, then glob alone would work, e.g. for files ending in three digits after the last dot:
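Assuming every extension has exactly three digits, a single glob pattern is enough; the helper name three_digit_extension_files is hypothetical:

```python
import glob
import os

def three_digit_extension_files(folder):
    # Each [0-9] matches exactly one digit, so this matches names ending in '.' plus three digits,
    # e.g. 'exported_file_1_aaa.001', but not 'x.01' or 'x.1234'.
    pattern = os.path.join(glob.escape(folder), '*.[0-9][0-9][0-9]')
    return sorted(glob.glob(pattern))
```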