Non greedy (reluctant) regex matching in sed?

I'm trying to use sed to clean up lines of URLs and extract just the domain.

So from:

http://www.suepearson.co.uk/product/174/71/3816/

I want:

http://www.suepearson.co.uk/

(either with or without the trailing slash, it doesn't matter)

I have tried:

 sed 's|\(http:\/\/.*?\/\).*|\1|'

and (escaping the non greedy quantifier)

sed 's|\(http:\/\/.*\?\/\).*|\1|'

but I cannot seem to get the non-greedy quantifier to work, so it always ends up matching the whole string.

Tags: regex sed pcre greedy regex-greedy

20 answers

宁负流年不负卿

#2 · 2018-12-31 05:19

Another way, not using regex, is to use the field/delimiter method, e.g.:

string="http://www.suepearson.co.uk/product/174/71/3816/"
echo $string | awk -F"/" '{print $1,$2,$3}' OFS="/"
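
To see why printing the first three fields works, here is a small sketch that shows how awk splits the URL on "/". The bracketed labels are only for illustration; the empty second field comes from the two adjacent slashes after "http:".

```shell
string="http://www.suepearson.co.uk/product/174/71/3816/"

# Show the first three "/"-separated fields; field 2 is empty
# because "//" contains nothing between the two delimiters.
fields=$(printf '%s\n' "$string" |
  awk -F'/' '{ printf "1=[%s] 2=[%s] 3=[%s]\n", $1, $2, $3 }')

printf '%s\n' "$fields"
# prints: 1=[http:] 2=[] 3=[www.suepearson.co.uk]
```

Joining those three fields with OFS="/" gives back http://www.suepearson.co.uk, without the trailing slash.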


临风纵饮

#3 · 2018-12-31 05:20

sed -E 's|(http://[^/]+/).*|\1|'


旧时光的记忆

#4 · 2018-12-31 05:21

sed certainly has its place, but this is not one of them!

As Dee has pointed out: just use cut. It is far simpler and much safer in this case. Here's an example where we extract various components from the URL using Bash syntax:

url="http://www.suepearson.co.uk/product/174/71/3816/"

protocol=$(echo "$url" | cut -d':' -f1)
host=$(echo "$url" | cut -d'/' -f3)
urlhost=$(echo "$url" | cut -d'/' -f1-3)
urlpath=$(echo "$url" | cut -d'/' -f4-)

gives you:

protocol = "http"
host = "www.suepearson.co.uk"
urlhost = "http://www.suepearson.co.uk"
urlpath = "product/174/71/3816/"

As you can see, this is a much more flexible approach.

(all credit to Dee)
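
Since the question is about lines of URLs, cut can also be applied to the whole input at once rather than one variable at a time. A minimal sketch, assuming one URL per line; example.com is a made-up second URL added only for illustration:

```shell
# Keep fields 1-3 ("http:", "", host) of every input line.
hosts=$(printf '%s\n' \
  "http://www.suepearson.co.uk/product/174/71/3816/" \
  "http://example.com/a/b" |
  cut -d'/' -f1-3)

printf '%s\n' "$hosts"
```

With a real file, the same cut command can read the file directly instead of taking its input from printf.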


伤终究还是伤i

#5 · 2018-12-31 05:22

sed 's|\(http:\/\/www\.[a-z.0-9]*\/\).*|\1|' works too


其实，你不懂

#6 · 2018-12-31 05:23

Try [^/]* instead of .*?:

sed 's|\(http://[^/]*/\).*|\1|g'
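
The difference can be seen by running both patterns side by side. This is a small sketch using the URL from the question; the variable names greedy and domain are just illustrative:

```shell
url="http://www.suepearson.co.uk/product/174/71/3816/"

# Greedy: .* extends to the last "/" in the line, so the group
# captures the whole URL and nothing is stripped.
greedy=$(printf '%s\n' "$url" | sed 's|\(http://.*/\).*|\1|')

# Negated class: [^/]* cannot cross a "/", so the group ends at
# the first slash after the host name.
domain=$(printf '%s\n' "$url" | sed 's|\(http://[^/]*/\).*|\1|')

printf '%s\n' "$greedy" "$domain"
# prints:
# http://www.suepearson.co.uk/product/174/71/3816/
# http://www.suepearson.co.uk/
```

The negated character class gives the effect of a reluctant match here because the delimiter is a single character.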


明月照影归

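#7 · 2018-12-31 05:23

Here is something you can do with a two-step approach and awk (gensub requires GNU awk):

A=http://www.suepearson.co.uk/product/174/71/3816/
echo "$A" | awk '
{
  # Step 1: replace the third "/" with a "||" marker.
  var = gensub(/\//, "||", 3, $0)
  # Step 2: drop the marker and everything after it.
  sub(/\|\|.*/, "", var)
  print var
}'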

Output: http://www.suepearson.co.uk

Hope that helps!
