I have a string that contains multiple file paths, some of which are broken across lines by arbitrary newlines, and I want to parse the string using Python so that only the filenames and extensions remain.
For example:
a/b/c/d/file1.c
a/b/c/d/e/f/g/h/1/2/3/4/5/foo.c
dir1/dir2/newlinedir
/nextlinedir/bar.c
should be parsed to give output:
file1.c
foo.c
bar.c
I am using the following regular expression (the groups for the filename and extension must be separate for later purposes):
import re
path_regex = re.compile(r'.*\/([^\/\.]*)(\.c){0,1}$', re.MULTILINE)
path_regex.sub(r'\g<1>\g<2>', input_string)
This works for paths that sit on a single line, but not for paths that contain newlines.
What should I do?
^([\s\S]*?\/)(\w+\.c)
Try this. See the demo. It works on multiline input too. Use the m (multiline) flag.
https://regex101.com/r/rX1tE6/7
Try this regex: (?:.*\/)(.+)\.(.+)
Use \1 to access the filename and \2 to access the extension.
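As a rough sketch of how the two groups could be read in Python, assuming re.findall is acceptable in place of sub (the variable names are only illustrative, not from the answer). Without the DOTALL flag, `.` does not cross newlines, so only lines that end in name.ext produce a match:

```python
import re

s = '''a/b/c/d/file1.c
a/b/c/d/e/f/g/h/1/2/3/4/5/foo.c
dir1/dir2/newlinedir
/nextlinedir/bar.c'''

# Each match yields a (filename, extension) tuple from groups 1 and 2.
pairs = re.findall(r'(?:.*\/)(.+)\.(.+)', s)

# Rejoin the two groups to get "name.ext" for each path.
names = [name + '.' + ext for name, ext in pairs]
print(names)  # ['file1.c', 'foo.c', 'bar.c']
```

The wrapped path still works here because its last line, /nextlinedir/bar.c, matches on its own.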
You may try this,
>>> import re
>>> s = '''a/b/c/d/file1.c
a/b/c/d/e/f/g/h/1/2/3/4/5/foo.c
dir1/dir2/newlinedir
/nextlinedir/bar.c'''
>>> print(re.sub(r'(?s).*?([^/]+\.c)', r'\1\n', s))
file1.c
foo.c
bar.c
or
>>> print(re.sub(r'(?s).*?([^/]+)(\.[^.\n]+)(?=$|\n)', r'\1\2\n', s))
file1.c
foo.c
bar.c
This simple regex also works, and you can access the filename with its extension using \1:
([^/]*\.\w+)
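A minimal sketch of using this pattern from Python, assuming you collect the matches with re.findall rather than substituting (the variable names are illustrative only):

```python
import re

s = '''a/b/c/d/file1.c
a/b/c/d/e/f/g/h/1/2/3/4/5/foo.c
dir1/dir2/newlinedir
/nextlinedir/bar.c'''

# With a single capturing group, findall returns the group's text per match.
# Directory components contain no dot, so they never match.
found = re.findall(r'([^/]*\.\w+)', s)
print(found)  # ['file1.c', 'foo.c', 'bar.c']
```

Note that this relies on directory names not containing a dot; a directory such as v1.2 would also be picked up.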
This is technically not what you are asking for, but maybe regex here is not the right tool, since now you have two problems.
I think this is what you are searching for:
pydoc os.path.basename
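So try this (it treats every line as a separate path, so a path that is wrapped across lines has to be rejoined first):
list(map(os.path.basename, text.split('\n')))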
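Here is a hedged sketch of the os.path approach that also copes with the wrapped path from the question. It assumes (this is my assumption, not something stated in the question) that all paths are relative, so a line starting with '/' is a continuation of the previous line. The helper name basenames is made up for illustration:

```python
import os.path

def basenames(text):
    """Return the basename of every path in text.

    Assumption: a line that starts with '/' continues the path on the
    previous line, which only holds if all paths are relative.
    """
    paths = []
    for line in text.split('\n'):
        if line.startswith('/') and paths:
            paths[-1] += line      # glue the continuation onto the previous path
        elif line:
            paths.append(line)     # start a new path, skipping empty lines
    return [os.path.basename(p) for p in paths]

text = '''a/b/c/d/file1.c
a/b/c/d/e/f/g/h/1/2/3/4/5/foo.c
dir1/dir2/newlinedir
/nextlinedir/bar.c'''

result = basenames(text)
print(result)  # ['file1.c', 'foo.c', 'bar.c']

# os.path.splitext keeps the name and extension separate for later use.
parts = [os.path.splitext(name) for name in result]
print(parts)   # [('file1', '.c'), ('foo', '.c'), ('bar', '.c')]
```

If absolute paths can appear in the input, the startswith('/') test is no longer reliable and some other rule for detecting wrapped lines would be needed.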