The problem, and it may not be easily solved with a regex, is that I want to be able to extract a Windows file path from an arbitrary string. The closest that I have been able to come (I've tried a bunch of others) is using the following regex:
[a-zA-Z]:\\([a-zA-Z0-9() ]*\\)*\w*.*\w*
This picks up the start of the path and is designed to match (after the initial drive letter) a sequence of names, each followed by a backslash, ending with a file name, an optional dot, and an optional extension.
The difficulty is what happens next. Since the maximum path length is 260 characters, I only need to look at 260 characters beyond the start. But since spaces (and other characters) are allowed in file names, I would need to make sure that there are no additional backslashes, which would indicate that the preceding characters are the name of a folder and that what follows is the actual file name.
I am pretty certain that there isn't a perfect solution (the perfect being the enemy of the good), but I wondered if anyone could suggest a "best possible" solution?
Here's the expression I got, based on yours, that allows me to get the path on Windows:
[a-zA-Z]:\\((?:[a-zA-Z0-9() ]*\\)*).*
An example of it being used is available here: https://regex101.com/r/SXUlVX/1
First, I changed the capture group from
([a-zA-Z0-9() ]*\\)*
to
((?:[a-zA-Z0-9() ]*\\)*)
Your original expression captures each XXX\ segment one after another (e.g. Users\, then theUsers\), so the group only ever holds a single segment. Mine matches the segments with the non-capturing group
(?:[a-zA-Z0-9() ]*\\)*
and wraps the whole repetition in one capture group. This allows me to match the concatenation XXX\YYYY\ZZZ\ before capturing. As such, it allows me to get the full path.
The second change I made is related to the file name: I just match whatever follows with .* instead of describing the name in detail. Because the capture group is greedy, it has already consumed every XXX\ segment it can, so what is left should normally not contain a \. This allows me to take care of strange file names.
Another regex that would work would be:
[a-zA-Z]:\\((?:.*?\\)*).*
as shown in this example: https://regex101.com/r/SXUlVX/2
This time, I used .*?\\ to match the XXX\ parts of the path. .*? matches in a non-greedy way: thus, .*?\\ matches the bare minimum of text followed by a backslash.
Do not hesitate to ask if you have any questions regarding the expressions.
I'd also encourage you to check how well your expression works using https://regex101.com. It also has a list of the different tokens you can use in your regex.
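To compare the two expressions outside regex101, here is a minimal Python sketch. The sample strings and the helper name directory_part are made up for illustration; only the two patterns come from the answer above. It returns the first capture group, i.e. the folders after the drive letter.

```python
import re

# Hypothetical sample text; the path and the surrounding words are made up.
text = r"see C:\Users\me\Documents\report.txt for details"

# First pattern: folder names limited to letters, digits, parentheses, spaces.
strict = re.compile(r"[a-zA-Z]:\\((?:[a-zA-Z0-9() ]*\\)*).*")
# Second pattern: any characters, matched lazily, up to each backslash.
lazy = re.compile(r"[a-zA-Z]:\\((?:.*?\\)*).*")


def directory_part(pattern, s):
    """Return group 1 (the folders after the drive letter), or None."""
    m = pattern.search(s)
    return m.group(1) if m else None


print(directory_part(strict, text))  # Users\me\Documents\
print(directory_part(lazy, text))    # Users\me\Documents\

# A folder name containing '-' or '.' is outside the first pattern's
# character class, so only the lazy pattern captures the folders here.
odd = r"C:\my-app\v1.2\run.exe"
print(repr(directory_part(strict, odd)))  # ''
print(directory_part(lazy, odd))          # my-app\v1.2\
```

Note that in both cases the trailing .* still runs to the end of the line, so the overall match includes any text after the file name; only the captured group is limited to the folders.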
Edit: As my previous answer did not work (though I'll need to spend some time to find out exactly why), I looked for another way to do what you want, and I managed to do so using string splitting and joining.
The command is
"\\".join(TARGETSTRING.split("\\")[1:-1])
How does this work? I split the original string into a list of substrings, based on the backslash. I then remove the first and last parts ([1:-1] selects from the 2nd element to the one before the last) and transform the resulting list back into a string.
This works whether the value given is a directory path or the full address of a file.
Program Files (x86)\\Adobe\\Acrobat Distiller\\acrbd.exe fred
is a file path
Program Files (x86)\\Adobe\\Acrobat Distiller\\acrbd.exe fred\
is a directory path
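As a small sketch of the split-and-join command on those two values: the function name strip_first_and_last is hypothetical, and the C: drive prefix is an assumption added so that there is a first element to drop.

```python
def strip_first_and_last(target):
    """Drop the first and last backslash-separated parts of `target`."""
    return "\\".join(target.split("\\")[1:-1])


# Assumed inputs: the example values above with a "C:" drive prefix added.
file_path = "C:\\Program Files (x86)\\Adobe\\Acrobat Distiller\\acrbd.exe fred"
dir_path = file_path + "\\"

# File path: the drive and the file name are removed.
print(strip_first_and_last(file_path))
# Directory path: the drive and the empty part after the final backslash
# are removed, so the last folder name is kept.
print(strip_first_and_last(dir_path))
```

With a trailing backslash the last element of the split is an empty string, which is why the directory case keeps its final component while the file case loses the file name.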