Question:

I would like to extract every link that starts with http:// (I'm not sure whether the file also contains https:// links) and ends with .html from a text file using the grep command.

The problem is that the file is very big and contains a lot of links.

I tried this:

grep "/http:\/\/.*?\.html/"  filename.txt > newFile.txt

but I get an empty file, just like with this:

grep -Eo "(http|https)://[a-zA-Z0-9]./(html)" filename.txt > newFile.txt

Can anyone help me?

Just to be sure we are on the same page: I want to extract all the links into a new file, one per line.

Thank you.

Best regards

Answer 1:

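You can use:

grep -oP "https?://\S+?\.html" filename.txt > newFile.txt

This matches http:// or https://, then one or more non-space characters (as few as possible), up to the first .html. The lazy +? quantifier needs -P (Perl-compatible regex, available in GNU grep), because ERE (-E) has no lazy quantifiers. The -o option prints each match on its own line.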

Answer 2:

This works for me:

grep -oE '(http|https)://(.*)\.html' filename.txt > newFile.txt

but .* is greedy, so if a line contains two links, both come out together as a single match on one line:

http://site1.com/1.html</a>tralala<a href="http://site2.com/2.html
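
One way to keep two links on the same line apart without lazy matching is a negated character class, which also works with plain -E. This is only a sketch: the sample file and the variable names are made up for illustration, and it assumes the URLs themselves contain no whitespace, double quotes, or angle brackets.

```shell
# Hypothetical sample input: two links on a single line.
sample=$(mktemp)
printf '%s\n' 'http://site1.com/1.html</a>tralala<a href="http://site2.com/2.html">x</a>' > "$sample"

# [^[:space:]"<>]+ cannot cross whitespace, quotes, or angle brackets,
# so each match stays inside its own link. -o prints one match per line.
links=$(grep -Eo 'https?://[^[:space:]"<>]+\.html' "$sample")
printf '%s\n' "$links"

rm -f "$sample"
```

On the real file the same pattern would be run as grep -Eo 'https?://[^[:space:]"<>]+\.html' filename.txt > newFile.txt.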
