The program I am currently working on retrieves URLs from a website and puts them into a list. What I want to get is the last section of the URL.
So, if the first element in my list of URLs is "https://docs.python.org/3.4/tutorial/interpreter.html", I would want to remove everything before "interpreter.html".
Is there a function, library, or regex I could use to make this happen? I've looked at other Stack Overflow posts but the solutions don't seem to work.
These are two of my several attempts:
for link in link_list:
    file_names.append(link.replace('/[^/]*$',''))
print(file_names)
and
for link in link_list:
    file_names.append(link.rpartition('//')[-1])
print(file_names)
That doesn't need regex.
You can use rpartition():
And take the last part of the 3 element tuple that is returned:
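The snippets for this answer did not survive, so here is a minimal sketch of the rpartition() approach. The second URL in link_list is made up for illustration; only the first one comes from the question.

```python
# Sample data: the first URL is from the question, the second is hypothetical.
link_list = [
    "https://docs.python.org/3.4/tutorial/interpreter.html",
    "https://example.com/downloads/report.pdf",
]

# rpartition() splits at the LAST occurrence of the separator and returns
# a 3-element tuple: (text before, separator, text after).
head, sep, tail = link_list[0].rpartition("/")
print(head)  # https://docs.python.org/3.4/tutorial
print(tail)  # interpreter.html

# Taking index [-1] of that tuple keeps only the part after the last slash.
file_names = [link.rpartition("/")[-1] for link in link_list]
print(file_names)  # ['interpreter.html', 'report.pdf']
```

If a link contains no "/" at all, rpartition() returns two empty strings followed by the whole link, so [-1] still gives back the full string.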
Here's a more general, regex way of doing this:
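The pattern from this answer is not shown, so the following is only one possible regex, not necessarily the one the answerer had in mind. The helper name last_segment is hypothetical.

```python
import re

def last_segment(url):
    """Return the text after the last '/' in url, or '' if there is none."""
    # [^/]+ matches one or more characters that are not a slash;
    # $ anchors the match to the end of the string.
    match = re.search(r"[^/]+$", url)
    return match.group(0) if match else ""

link = "https://docs.python.org/3.4/tutorial/interpreter.html"
print(last_segment(link))  # interpreter.html
```

Note that a URL ending in "/" has nothing after the last slash, so this sketch returns an empty string for it.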
This should work if you plan to use regex
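The regex for this answer is missing as well. One way it could look, as an assumption, is a re.sub() call that deletes everything up to and including the last slash. It also shows why the first attempt in the question has no effect: str.replace() treats its argument as literal text, not as a pattern.

```python
import re

link = "https://docs.python.org/3.4/tutorial/interpreter.html"

# str.replace() looks for the literal text '/[^/]*$', which never occurs
# in the link, so the string comes back unchanged.
unchanged = link.replace('/[^/]*$', '')

# re.sub() interprets the pattern as a regex. '^.*/' is greedy, so it
# consumes everything from the start up to and including the LAST slash.
file_name = re.sub(r"^.*/", "", link)
print(file_name)  # interpreter.html
```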
Just use string.split:
split gives you a list of the strings that were separated by "/". The [-1] gives you the last element of that list, which is what you want.
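A short sketch of the split approach on the URL from the question, showing the intermediate list so the [-1] step is visible:

```python
link = "https://docs.python.org/3.4/tutorial/interpreter.html"

# Every "/" becomes a split point, including the two in "https://",
# which is why the list contains an empty string.
parts = link.split("/")
print(parts)
# ['https:', '', 'docs.python.org', '3.4', 'tutorial', 'interpreter.html']

# The last element is the part after the final slash.
file_name = parts[-1]
print(file_name)  # interpreter.html
```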
Have a look at str.rsplit.
And to use RegEx, match the 2nd group, which lies between the last / and the end of the string. This relies on greedy matching in RegEx.
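Neither snippet from this answer is shown, so the sketch below is a guess at both suggestions. The two-group pattern in particular is an assumption, chosen only because it matches the description of a 2nd group between the last / and the end of the string.

```python
import re

link = "https://docs.python.org/3.4/tutorial/interpreter.html"

# rsplit() with maxsplit=1 splits only once, starting from the right,
# so the result has at most two elements.
file_name_rsplit = link.rsplit("/", 1)[-1]

# Hypothetical two-group pattern: group 1 greedily takes everything up to
# and including the last slash, group 2 is whatever remains before the end.
match = re.match(r"(.*/)(.*)$", link)
file_name_regex = match.group(2) if match else link

print(file_name_rsplit)  # interpreter.html
print(file_name_regex)   # interpreter.html
```

Compared with split, rsplit with a limit of 1 avoids building a list entry for every path component when only the last one is needed.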
Small note: the problem with link.rpartition('//')[-1] in your code is that you are trying to match // and not /. So remove the extra /, as in link.rpartition('/')[-1].
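To make the difference concrete, here is a quick comparison of the two separators on the URL from the question (the variable names are only for illustration):

```python
link = "https://docs.python.org/3.4/tutorial/interpreter.html"

# "//" occurs only once, right after "https:", so everything after it
# is returned, not just the file name.
with_double_slash = link.rpartition("//")[-1]
print(with_double_slash)  # docs.python.org/3.4/tutorial/interpreter.html

# "/" occurs several times; rpartition() uses the last one.
with_single_slash = link.rpartition("/")[-1]
print(with_single_slash)  # interpreter.html
```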