Remove subdomain from string in ruby-第2页回答

Remove subdomain from string in ruby

2019-01-24 12:09发布

I'm looping over a series of URLs and want to clean them up. I have the following code:

# Parse url to remove http, path and check format
o_url = URI.parse(node.attributes['href'])

# Remove www
new_url = o_url.host.gsub('www.', '').strip

How can I extend this to remove the subdomains that exist in some URLs?

标签： ruby url dns subdomain uri

8条回答

Explosion°爆炸

2楼-- · 2019-01-24 12:47

Something like:

def remove_subdomain(host)
    # Not complete. Add all root domain to regexp
    host.sub(/.*?([^.]+(\.com|\.co\.uk|\.uk|\.nl))$/, "\\1")
end

puts remove_subdomain("www.example.com") # -> example.com
puts remove_subdomain("www.company.co.uk") # -> company.co.uk
puts remove_subdomain("www.sub.domain.nl") # -> domain.nl

You still need to add all (root) domains you consider root domain. So '.uk' might be the root domain, but you probably want to keep the host just before the '.co.uk' part.

0人赞添加讨论(0) 举报

\"骚年 ilove

3楼-- · 2019-01-24 12:54

I just wrote a library to do this called Domainatrix. You can find it here: http://github.com/pauldix/domainatrix

require 'rubygems'
require 'domainatrix'

url = Domainatrix.parse("http://www.pauldix.net")
url.public_suffix       # => "net"
url.domain    # => "pauldix"
url.canonical # => "net.pauldix"

url = Domainatrix.parse("http://foo.bar.pauldix.co.uk/asdf.html?q=arg")
url.public_suffix       # => "co.uk"
url.domain    # => "pauldix"
url.subdomain # => "foo.bar"
url.path      # => "/asdf.html?q=arg"
url.canonical # => "uk.co.pauldix.bar.foo/asdf.html?q=arg"

0人赞添加讨论(0) 举报

上一页 1 2

Remove subdomain from string in ruby

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间