Question:

I want to be able to parse any URL with Ruby and get the main part of the domain without the www (just the XXXX.com).

Answer 1:

This should work with pretty much any URL:

# URL always gets parsed twice
def get_host_without_www(url)
  url = "http://#{url}" if URI.parse(url).scheme.nil?
  host = URI.parse(url).host.downcase
  host.start_with?('www.') ? host[4..-1] : host
end

Or:

# Only parses twice if url doesn't start with a scheme
def get_host_without_www(url)
  uri = URI.parse(url)
  uri = URI.parse("http://#{url}") if uri.scheme.nil?
  host = uri.host.downcase
  host.start_with?('www.') ? host[4..-1] : host
end

You may have to require 'uri'.
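
One caveat with both versions: they raise if the input cannot be parsed, or if the parsed URI has no host (for example a mailto: address). Here is a minimal sketch of a more forgiving variant. The name safe_host_without_www is made up for illustration, and returning nil on failure is an assumption about what the caller wants, not part of the original answer.

```ruby
require 'uri'

# Hypothetical helper: same idea as get_host_without_www, but returns nil
# instead of raising when no host can be extracted.
def safe_host_without_www(url)
  uri = URI.parse(url)
  uri = URI.parse("http://#{url}") if uri.scheme.nil?
  host = uri.host
  return nil if host.nil? || host.empty?

  host = host.downcase
  host.start_with?('www.') ? host[4..-1] : host
rescue URI::InvalidURIError
  # Input such as 'not a url' cannot be parsed at all.
  nil
end

safe_host_without_www('WWW.Example.com/path?x=1')   # => "example.com"
safe_host_without_www('https://blog.example.org')   # => "blog.example.org"
safe_host_without_www('not a url')                  # => nil
safe_host_without_www('mailto:someone@example.com') # => nil
```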

Answer 2:

Please note that there is no algorithmic method of finding the highest level at which a domain may be registered for a particular top-level domain, because the policies differ with each registry. The only method is to keep a list of all top-level domains and the level at which domains can be registered under each of them.

This is the reason why the Public Suffix List exists.
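
To make the idea concrete, here is a rough sketch of a list-based lookup. The six-entry suffix list and the registrable_domain helper are invented for illustration; the real Public Suffix List has thousands of entries plus wildcard and exception rules, none of which are handled here.

```ruby
# Hypothetical helper: returns the longest matching suffix plus one more
# label, or nil if no suffix in the list matches.
# The default list is a tiny hand-picked sample, NOT the real list.
def registrable_domain(host, suffixes = %w[com org uk co.uk fr com.fr])
  labels = host.downcase.split('.')
  # Walk from the longest candidate suffix to the shortest,
  # so that "co.uk" wins over "uk".
  (1...labels.length).each do |i|
    suffix = labels[i..-1].join('.')
    return labels[(i - 1)..-1].join('.') if suffixes.include?(suffix)
  end
  nil
end

registrable_domain('toolbar.google.com') # => "google.com"
registrable_domain('www.google.co.uk')   # => "google.co.uk"
registrable_domain('example.dev')        # => nil (suffix not in the sample list)
```

The point of the sketch is only that the answer depends on the data in the list, not on the shape of the host name.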

I'm the author of PublicSuffix, a Ruby library that decomposes a domain into the different parts.

Here's an example:

require 'uri'
require 'public_suffix'

uri = URI.parse("http://toolbar.google.com")
domain = PublicSuffix.parse(uri.host)
# => "toolbar.google.com"
domain.domain
# => "google.com"

uri = URI.parse("http://www.google.co.uk")
domain = PublicSuffix.parse(uri.host)
# => "www.google.co.uk"
domain.domain
# => "google.co.uk"

Answer 3:

Just a short note: to avoid parsing the URL a second time, as Mischa's second example does, you could use a string comparison instead of URI.parse.

# Only parses once
def get_host_without_www(url)
  url = "http://#{url}" unless url.start_with?('http://', 'https://')
  uri = URI.parse(url)
  host = uri.host.downcase
  host.start_with?('www.') ? host[4..-1] : host
end

The downside of this approach is that it limits you to http(s) URLs, which covers the vast majority of cases. If you want to use it more generally (e.g. for ftp links), you have to adjust the check accordingly.
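
One possible adjustment, as a sketch only: test for any "scheme://" prefix with a regular expression instead of a fixed string. The method name below is made up, and the pattern is my reading of the scheme syntax (a letter followed by letters, digits, "+", "." or "-"); URIs without "//" such as mailto: are not covered.

```ruby
require 'uri'

# Hypothetical variant: still parses only once, but accepts any
# "scheme://" prefix rather than only http and https.
def get_host_without_www_any_scheme(url)
  url = "http://#{url}" unless url.match?(%r{\A[a-z][a-z0-9+.\-]*://}i)
  host = URI.parse(url).host.downcase
  host.start_with?('www.') ? host[4..-1] : host
end

get_host_without_www_any_scheme('ftp://www.example.com/file.txt') # => "example.com"
get_host_without_www_any_scheme('https://www.example.org')        # => "example.org"
get_host_without_www_any_scheme('example.com/page')               # => "example.com"
```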

Answer 4:

Addressable is probably the right answer in 2018, especially since it uses the PublicSuffix gem to parse domains.

However, I need to do this kind of parsing in multiple places, from various data sources, and found it a bit verbose to use repeatedly. So I created a wrapper around it, Adomain:

require 'adomain'

Adomain["https://toolbar.google.com"]
# => "toolbar.google.com"

Adomain["https://www.google.com"]
# => "google.com"

Adomain["stackoverflow.com"]
# => "stackoverflow.com"

I hope this helps others.

Answer 5:

Here's one that works better with domains like .co.uk and .com.fr:

domain = uri.host[/[^.\s\/]+\.([a-z]{3,}|([a-z]{2}|com)\.[a-z]{2})$/]
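
For anyone trying this out, here is a small sketch that wraps the pattern above in a method (the name domain_via_regex is made up) and runs it on a few hosts. Note that, as far as I can tell, a plain two-letter TLD such as .de does not match at all, and the pattern expects a lowercase host.

```ruby
# The pattern from this answer, unchanged, wrapped in a hypothetical helper.
def domain_via_regex(host)
  host[/[^.\s\/]+\.([a-z]{3,}|([a-z]{2}|com)\.[a-z]{2})$/]
end

domain_via_regex('www.google.co.uk')   # => "google.co.uk"
domain_via_regex('www.example.com.fr') # => "example.com.fr"
domain_via_regex('toolbar.google.com') # => "google.com"
domain_via_regex('www.example.de')     # => nil (two-letter TLD on its own)
```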

Answer 6:

If the URL is in the format http://www.google.com, then you could do something like:

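a = 'http://www.google.com'

# Option 1: split on dots and join the second and third pieces
parts = a.split('.')
puts parts[1] + '.' + parts[2]   # prints google.com

# Option 2: capture everything after "http://www."
a =~ /http:\/\/www\.(.*?)$/
puts $1                          # prints google.com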
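
Both snippets only work for that exact shape of URL. As a sketch of a slightly more tolerant regex (the helper name host_via_regex is made up), you could allow https, make the www. optional, and stop at the first "/", ":", "?" or "#". It still assumes an http(s) URL without userinfo and does not downcase the result.

```ruby
# Hypothetical helper: returns the host without a leading "www.",
# or nil if the string is not an http(s) URL.
def host_via_regex(url)
  url[%r{\Ahttps?://(?:www\.)?([^/:?#]+)}i, 1]
end

host_via_regex('http://www.google.com')            # => "google.com"
host_via_regex('https://google.com/search?q=ruby') # => "google.com"
host_via_regex('http://www.google.com:8080/x')     # => "google.com"
host_via_regex('ftp://files.example.com')          # => nil
```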
