I would like to extract every link that starts with http:// (I am not sure whether the file also contains https:// links) and ends with .html from a text file using the grep command.
The problem is that the file is very big and contains a lot of links.
I tried this:
grep "/http:\/\/.*?\.html/" filename.txt > newFile.txt
but I get an empty file, just like with this:
grep -Eo "(http|https)://[a-zA-Z0-9]./(html)" filename.txt > newFile.txt
Can anyone help me?
Just to be sure that we are on the same page: I want to extract all links to a new file, one per line.
Thank you.
Best regards
You can use:
This will match 1 or more non-space characters after https:// and before .html
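A minimal sketch of a command that fits this description, assuming a grep that supports -E and -o (GNU and BSD grep both do). The sample input and the scratch directory are made up for illustration; filename.txt and newFile.txt are the names from the question.

```shell
# Work in a scratch directory so no real file gets overwritten.
cd "$(mktemp -d)"

# Hypothetical sample input; in practice filename.txt is your own file.
printf '%s\n' \
  'see http://example.com/a.html and https://example.org/b/c.html here' \
  'no link on this line' \
  'http://example.net/index.html' > filename.txt

# -E: extended regular expressions
# -o: print only the matched part, each match on its own line
# https?          matches http or https
# [^[:space:]]+   matches one or more non-space characters
# \.html          matches a literal ".html"
grep -Eo 'https?://[^[:space:]]+\.html' filename.txt > newFile.txt

cat newFile.txt
```

Because of -o, every match is written on its own line, so two links separated by whitespace on the same input line come out as two output lines.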
This works for me:
but if there are two links on one line, both of them end up on the same line of the output
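One likely cause of two links ending up on the same output line is a greedy .* in the pattern, which runs up to the last .html on the input line. This is only a guess about the command used here. The sketch below shows the difference on made-up input; greedy.txt is a hypothetical file name used just for the comparison.

```shell
# Work in a scratch directory so no real file gets overwritten.
cd "$(mktemp -d)"

# Hypothetical input with two links on one line.
printf '%s\n' 'http://example.com/a.html http://example.com/b.html' > filename.txt

# Greedy .* also matches the space, so the match extends to the last
# ".html" on the line and both links are printed as a single match.
grep -Eo 'https?://.*\.html' filename.txt > greedy.txt

# Excluding whitespace stops each match at the end of its own link,
# so -o prints the two links on separate lines.
grep -Eo 'https?://[^[:space:]]+\.html' filename.txt > newFile.txt

cat greedy.txt
cat newFile.txt
```

This only separates links that have whitespace between them. If links can sit directly next to each other, GNU grep's -P option with a lazy quantifier (for example \S+?) is another option, but -P is not available in every grep.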