i have a folder with over 1 million files. the files come in couples that only differ by their extension (e.g. a1.ext1 a1.ext2, a2.ext1, a2.ext2 ...)
i need to scan this folder and make sure that it fulfills this requirement (of file coupling), and if i find a file without its match i should delete it.
i've already done it in python, but it was super slow when it came to working with the 7-figure number of files..
is there a way to do this using a shell command/script?
Building on another answer, you could use script like this (it is supposed to be in the same directory where files are located, and should be executed there):
You can configure where to move files that doesn't have their pair by modifying
THRASH
variable.On dual core Pentium with 3.0 GHz and 2 GB of RAM one run took 63.7 seconds (10000 pairs, with about 1500 of each member of the pair missing from the folder).
Python should be faster; however if you want to try in bash:
This should display filenames with more or less than two extensions.
Try this one: