For example:
string = "This is a link http://www.google.com"
How could I extract 'http://www.google.com'?
(Each link will be of the same format, i.e. starting with 'http://')
Another easy way to extract URLs from text is the urlextract library. Install it via pip (pip install urlextract), then create a URLExtract object and call its find_urls() method on your text; it returns a list of the URLs it found.
You can find more info on my github page: https://github.com/lipoja/URLExtract
NOTE: It downloads a list of TLDs from iana.org to keep itself up to date. If your program does not have internet access, this library may not be the right choice for you.
In order to find a web URL in a generic string, you can use a regular expression (regex).
A simple regex for URL matching like the following should fit your case.
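The answer's original pattern is not shown here, so the following is only a minimal sketch of what such a regex might look like. The name URL_PATTERN and the pattern itself are assumptions made for illustration; it is deliberately loose and accepts any alphabetic TLD of two or more letters.

```python
import re

# Illustrative pattern (an assumption, not the answer's original regex).
URL_PATTERN = (
    r"https?://"             # scheme: http:// or https://
    r"(?:[A-Za-z0-9-]+\.)+"  # one or more host labels, each followed by a dot
    r"[A-Za-z]{2,}"          # TLD: any run of two or more letters
    r"(?::\d+)?"             # optional port, e.g. :80
    r"(?:[/?#]\S*)?"         # optional path, query string and fragment
)

# The URL from the question should satisfy the pattern as a whole.
print(bool(re.fullmatch(URL_PATTERN, "http://www.google.com")))
```

All groups are non-capturing, so the pattern can be reused with search() or findall() without getting partial groups back.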
If you want to be even more precise, in the TLD section, you should ensure that the TLD is a valid TLD (see the entire list of valid TLDs here: https://data.iana.org/TLD/tlds-alpha-by-domain.txt):
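One way to sketch the stricter variant is to build the TLD part of the regex from a list. Here VALID_TLDS is a tiny hand-picked subset used only for illustration (in practice you would load the full IANA list linked above), and STRICT_URL_PATTERN and find_url are hypothetical names.

```python
import re

# Small illustrative subset; the complete list is in the IANA file linked above.
VALID_TLDS = ["com", "org", "net", "io"]

# Join the TLDs into an alternation such as com|org|net|io.
tld_group = "|".join(re.escape(tld) for tld in VALID_TLDS)

STRICT_URL_PATTERN = re.compile(
    r"https?://(?:[A-Za-z0-9-]+\.)+"   # scheme and host labels
    r"(?:" + tld_group + r")\b"        # TLD must be one of VALID_TLDS
    r"(?::\d+)?(?:[/?#]\S*)?",         # optional port, path, query, fragment
    re.IGNORECASE,
)

def find_url(text):
    """Return the first URL whose TLD is in VALID_TLDS, or None."""
    match = STRICT_URL_PATTERN.search(text)
    return match.group(0) if match else None

print(find_url("This is a link http://www.google.com"))
```

The \b after the TLD group keeps a short TLD from matching the start of a longer word, so a host ending in ".community" is not accepted just because it begins with "com".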
Then, you can simply compile the former regex and use it to find possible matches:
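As a rough sketch of that step, assuming a simple pattern of the kind described (the pattern, URL_REGEX and extract_first_url are illustrative names, not the answer's original code):

```python
import re

# Assumed pattern: scheme, dotted host, optional port, optional path/query/fragment.
URL_REGEX = re.compile(
    r"https?://(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}(?::\d+)?(?:[/?#]\S*)?"
)

def extract_first_url(text):
    """Return the first URL found in text, or None if there is none."""
    match = URL_REGEX.search(text)
    return match.group(0) if match else None

print(extract_first_url("This is a link http://www.google.com"))
print(extract_first_url(
    "This is also a URL https://www.host.domain.com:80/path/page.php"
    "?query=value&a2=v2#foo but this is not anymore"
))
```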
For the string "This is a link http://www.google.com", this outputs:
http://www.google.com
If you change the input to one with a more complex URL, for example "This is also a URL https://www.host.domain.com:80/path/page.php?query=value&a2=v2#foo but this is not anymore", the output will be:
https://www.host.domain.com:80/path/page.php?query=value&a2=v2#foo
NOTE: If you expect several URLs in a single string, you can still use the same regex; just call findall() instead of search().
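A minimal sketch of the findall() variant, again with an assumed pattern and an illustrative sample string. Note that findall() returns the whole match only when the pattern has no capturing groups, which is why every group here is written as (?:...).

```python
import re

# Assumed pattern; every group is non-capturing so findall() yields full URLs.
URL_REGEX = re.compile(
    r"https?://(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}(?::\d+)?(?:[/?#]\S*)?"
)

text = (
    "Try http://www.google.com or "
    "https://www.host.domain.com:80/path/page.php?query=value&a2=v2#foo instead"
)

urls = URL_REGEX.findall(text)  # list of every non-overlapping match, in order
for url in urls:
    print(url)
```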
There may be a few ways to do this, but the cleanest would be to use a regex.
If there can be multiple links, you can use something similar to the below.
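The code for this answer is not shown here, so this is only a sketch of one common approach: treat everything from http:// or https:// up to the next whitespace as the link. SIMPLE_URL, first_link and all_links are hypothetical names.

```python
import re

# Assumption: a link runs from the scheme to the next whitespace character.
SIMPLE_URL = re.compile(r"https?://\S+")

def first_link(text):
    """Return the first link in text, or None."""
    match = SIMPLE_URL.search(text)
    return match.group(0) if match else None

def all_links(text):
    """Return every link in text, in order of appearance."""
    return SIMPLE_URL.findall(text)

print(first_link("This is a link http://www.google.com"))
print(all_links("a http://www.google.com b http://www.example.com/page?x=1 c"))
```

This is shorter than a structured pattern, but it will also swallow punctuation that directly follows a link, such as a trailing comma or full stop, so it fits best when links are always separated by spaces.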
This extracts all URLs, including those with parameters. For some reason, none of the examples above worked for me.