I have a list of file paths that looks similar to this:
path/new/stuff/files/morefiles/A/file2.txt
path/new/stuff/files/morefiles/B/file7.txt
path/new/stuff/files/morefiles/A/file1.txt
path/new/stuff/files/morefiles/C/file5.txt
I am trying to find the leading part of the path that is the same for every entry in the list, and then remove it from each path.
The list can be any length. In the example, I would be trying to change the list into:
A/file2.txt
B/file7.txt
A/file1.txt
C/file5.txt
Methods like re.sub(r'.*I', 'I', filepath) and filepath.split('_', 1)[-1] can be used for the replacing, but I'm not sure how to find the common part in the list of filepaths.
Note: I am using Windows and Python 3.
The first part of the answer is here: Python: Determine prefix from a set of (similar) strings
Use os.path.commonprefix() to find the longest common prefix (first part) of the strings. Calling it on the list, as in that answer, selects the part that every item shares.
Now all you have to do is use slicing to remove the resulting string from each item in the list.
For the list in the question, this results in A/file2.txt, B/file7.txt, A/file1.txt and C/file5.txt.
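A minimal sketch of the two steps, using the list from the question. The variable names (paths, prefix, trimmed) are only illustrative and are not taken from the linked answer:

```python
import os.path

paths = [
    "path/new/stuff/files/morefiles/A/file2.txt",
    "path/new/stuff/files/morefiles/B/file7.txt",
    "path/new/stuff/files/morefiles/A/file1.txt",
    "path/new/stuff/files/morefiles/C/file5.txt",
]

# Longest common prefix, compared character by character.
prefix = os.path.commonprefix(paths)

# Slice the prefix off the front of every path.
trimmed = [p[len(prefix):] for p in paths]

print(prefix)   # path/new/stuff/files/morefiles/
print(trimmed)  # ['A/file2.txt', 'B/file7.txt', 'A/file1.txt', 'C/file5.txt']
```

Note that commonprefix() works on characters, not on path sections, so it is only safe when the paths diverge right after a separator, as they do here.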
You can split the paths around '/' and use zip_longest to transpose them without cutting off the longer paths. You can then remove the common elements, zip again to transpose the paths back, and join them with '/'.
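The steps above could be sketched roughly as follows. This is one possible reading, not the original code: strip_common_dirs is a hypothetical helper name, and using itertools.dropwhile so that only the leading common sections are removed is an assumption.

```python
from itertools import dropwhile, zip_longest


def strip_common_dirs(paths, sep="/"):
    """Hypothetical helper: drop the leading sections shared by all paths."""
    # Transpose: one tuple per depth; shorter paths are padded with None.
    columns = zip_longest(*(p.split(sep) for p in paths))
    # Skip leading columns in which every path has the same section.
    remaining = list(dropwhile(lambda col: len(set(col)) == 1, columns))
    if not remaining:
        # Every section was common (for example, all paths are identical).
        return ["" for _ in paths]
    # Transpose back and drop the None padding before joining.
    return [
        sep.join(part for part in row if part is not None)
        for row in zip(*remaining)
    ]


paths = [
    "path/new/stuff/files/morefiles/A/file2.txt",
    "path/new/stuff/files/morefiles/B/file7.txt",
    "path/new/stuff/files/morefiles/A/file1.txt",
    "path/new/stuff/files/morefiles/C/file5.txt",
]
result = strip_common_dirs(paths)
print(result)  # ['A/file2.txt', 'B/file7.txt', 'A/file1.txt', 'C/file5.txt']
```

Because the comparison is done per section rather than per character, paths of different depths are handled too: for ["a/b/c.txt", "a/d.txt"] the sketch gives ["b/c.txt", "d.txt"].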
Already answered here: Python: Determine prefix from a set of (similar) strings
"Never rewrite what is provided to you": use os.path.commonprefix() to find the longest common prefix, and then slice your strings accordingly.
As the input list contains not just arbitrary strings but file paths, it seems reasonable to me to treat the common prefix among all filepaths only as a sequence of whole sections.
Let's say one of the filepaths is path/new/stuff2/files/morefiles/C/file5.txt.
The common prefix is then determined as path/new/stuff, so the 3rd section, stuff2, is broken at its last character, 2.
As a result, the commonprefix() approach mentioned above will cut that filepath to 2/files/morefiles/C/file5.txt, making it broken and inaccessible (in terms of the filesystem). In such a case it would be reasonable to cut only the leading common whole sections (i.e. path/new/).
The solution uses the zip() function and a set object. The input list of filepaths was slightly modified for demonstration purposes: the last filepath differs in its 3rd section (.../stuffall/...). With that input, only the common sections path/new/ are stripped from each path.
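A sketch of what such a zip()/set solution might look like. The input list is reconstructed from the description above (the question's list with stuffall in the last path), and common_sections is a hypothetical helper name, so treat both as assumptions rather than the original code:

```python
import os.path

paths = [
    "path/new/stuff/files/morefiles/A/file2.txt",
    "path/new/stuff/files/morefiles/B/file7.txt",
    "path/new/stuff/files/morefiles/A/file1.txt",
    "path/new/stuffall/files/morefiles/C/file5.txt",  # differs in the 3rd section
]

# For comparison: the character-wise prefix stops in the middle of "stuffall".
char_prefix = os.path.commonprefix(paths)


def common_sections(paths, sep="/"):
    """Hypothetical helper: leading whole sections shared by every path."""
    common = []
    # zip() transposes the split paths: one tuple of sections per depth.
    for column in zip(*(p.split(sep) for p in paths)):
        if len(set(column)) != 1:  # sections differ at this depth
            break
        common.append(column[0])
    return common


# Number of leading sections to drop from each path.
n = len(common_sections(paths))
trimmed = ["/".join(p.split("/")[n:]) for p in paths]

print(char_prefix)  # path/new/stuff
for p in trimmed:
    print(p)
```

With this input the sketch removes only path/new/, leaving stuff/files/morefiles/A/file2.txt through stuffall/files/morefiles/C/file5.txt intact. The standard library's os.path.commonpath() (Python 3.5+) also compares whole sections rather than characters and may be worth a look as an alternative.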