Remove “http://” and “https://” from a string

Posted 2020-07-06 06:47

Question:

I am new to regular expressions in Ruby. How can I remove http, https, and www from a string?

server= http://france24.miles.com
server= https://seloger.com

From these I want to remove http, https, and www, leaving:

france24.miles.com
seloger.com

I used the following code, but it is not working for me:

server = server.(/^https?\:\/\/(www.)?/,'')

Answer 1:

server = server.(/^https?\:\/\/(www.)?/,'')

This doesn't work because no method name follows the dot: server.(...) is Ruby shorthand for server.call(...), and String has no call method. Call the sub method instead:

server = server.sub(/^https?:\/\/(www\.)?/, '')

Example

> server = "http://www.stackoverflow.com"
> server = server.sub(/^https?:\/\/(www\.)?/, '')
=> "stackoverflow.com"

If it also has to handle the malformed form http:\\ (backslashes instead of slashes), use the following regex:

server.sub(/^https?:(\\\\|\/\/)(www\.)?/, '')
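
As a quick check against the hosts from the question, the sketch below wraps the substitution in a helper. The name strip_prefix and the host wwwx.example.com are hypothetical, chosen only for illustration. It also shows why the dot after www is worth escaping: an unescaped dot matches any character, so a host that merely starts with "www" followed by another letter gets damaged.

```ruby
# Hypothetical helper: removes a leading http:// or https:// and an optional "www.".
def strip_prefix(server)
  # \A anchors at the start of the string; \. matches a literal dot only.
  server.sub(/\Ahttps?:\/\/(www\.)?/, '')
end

puts strip_prefix("http://france24.miles.com")  # france24.miles.com
puts strip_prefix("https://seloger.com")        # seloger.com

# With an unescaped dot, "wwwx" is treated as "www" plus any one character.
loose = "http://wwwx.example.com".sub(/\Ahttps?:\/\/(www.)?/, '')
exact = strip_prefix("http://wwwx.example.com")
puts loose  # .example.com
puts exact  # wwwx.example.com
```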


Answer 2:

The standard library's URI module is designed for this kind of work. Using it is simpler and likely more reliable than a hand-written regex:

require 'uri'

uri = URI.parse("http://www.ruby-lang.org/")

uri.host
=> "www.ruby-lang.org"

uri.host.sub(/\Awww\./, '')
=> "ruby-lang.org"
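
A minimal sketch of how this could be wrapped for reuse, assuming you want nil rather than an exception for input that cannot be parsed. The helper name bare_host is hypothetical. Note that, unlike the regex approaches, this keeps only the host, so any path or query string is dropped, and input without a scheme has no host at all.

```ruby
require 'uri'

# Hypothetical helper: returns the host without a leading "www.",
# or nil when the input cannot be parsed or has no host component.
def bare_host(url)
  host = URI.parse(url).host
  host && host.sub(/\Awww\./, '')
rescue URI::InvalidURIError
  nil
end

p bare_host("http://france24.miles.com")        # "france24.miles.com"
p bare_host("https://www.seloger.com/path?q=1") # "seloger.com"
p bare_host("seloger.com")                      # nil (no scheme, so it parses as a path)
p bare_host("http://bad host")                  # nil (the space makes the URI invalid)
```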


Answer 3:

See the String#sub(...) method.

Also, consider using the %r{...} literal notation for Regexp objects so that forward-slashes (/) are easier to recognize:

def trim_url(str)
  str.sub %r{^https?:(//|\\\\)(www\.)?}i, ''
end

trim_url 'https://www.foo.com' # => "foo.com"
trim_url 'http://www.foo.com'  # => "foo.com"
trim_url 'http://foo.com'      # => "foo.com"
trim_url 'http:\\\\foo.com'    # => "foo.com"

Here is what each part of the regular expression means:

%r{^https?:(//|\\\\)(www\.)?}
#  │├──┘├┘│├───────┘ ├─┘├┘ └── everything in the group (...), or nothing.
#  ││   │ ││         │  └── the period character "."
#  ││   │ ││         └── the letters "www".
#  ││   │ │└── the characters "//" or "\\".
#  ││   │ └── the colon character ":".
#  ││   └── the letter "s", or nothing.
#  │└── the letters "http".
#  └── the beginning of the line.
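
One detail about that last label: in Ruby, ^ matches at the beginning of every line, not only at the beginning of the string. The sketch below contrasts it with \A, which matches only at the very start. The two-line input is a made-up example.

```ruby
# Hypothetical input where the URL starts on the second line.
text = "see\nhttp://www.foo.com"

# ^ also matches right after the newline, so the URL on line two is trimmed.
line_anchored = text.sub(%r{^https?:(//|\\\\)(www\.)?}i, '')

# \A matches only at position 0, so nothing is replaced here.
string_anchored = text.sub(%r{\Ahttps?:(//|\\\\)(www\.)?}i, '')

p line_anchored    # "see\nfoo.com"
p string_anchored  # "see\nhttp://www.foo.com"
```

If the input is always a single URL, \A is the stricter choice; if it can span several lines, ^ may be what you want.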


Answer 4:

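def strip_url(url)
  # Strip a leading "http"/"https" (colon and slashes optional), then a leading "www.".
  # sub! mutates the string in place and returns nil when nothing matched,
  # so return url explicitly.
  url.sub!(/\Ahttps?:?\/{0,2}/, '')
  url.sub!(/\Awww\./, '')
  url
end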

This strips any leading http(s) and www. from the given string in place (sub! mutates its receiver). It covers the following formats:

  • http://www.example.com
  • http:/www.example.com
  • http:www.example.com
  • https://www.example.com
  • https:/www.example.com
  • https:www.example.com
  • http://example.com
  • http:/example.com
  • http:example.com
  • https://example.com
  • https:/example.com
  • https:example.com
  • www.example.com
  • example.com

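You'll end up with example.com in every case. Note that, because the colon and slashes are optional, a host that merely begins with http (for example httpbin.org) is trimmed as well.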
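
To check the list above mechanically, here is a small sketch using a non-mutating variant. The name stripped_url is hypothetical, and returning a new string (sub instead of sub!) is a choice made here so the helper also works on frozen strings; it is not part of the answer's method.

```ruby
# Hypothetical non-mutating variant: returns a new string, leaves the argument untouched.
def stripped_url(url)
  url.to_s.sub(/\Ahttps?:?\/{0,2}/, '').sub(/\Awww\./, '')
end

formats = %w[
  http://www.example.com http:/www.example.com http:www.example.com
  https://www.example.com https:/www.example.com https:www.example.com
  http://example.com http:/example.com http:example.com
  https://example.com https:/example.com https:example.com
  www.example.com example.com
]

results = formats.map { |f| stripped_url(f) }
puts results.uniq.inspect  # ["example.com"]
```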



Answer 5:

With this regex: server\s*=\s*\Khttps?://(?:www\.)?

In Ruby 2.0+

result = subject.gsub(/server\s*=\s*\Khttps?:\/\/(?:www\.)?/, '')

Explanation

  • server\s*=\s* matches server= with optional spaces, to make sure we are looking at the right strings
  • The \K tells the engine to drop what was matched so far from the final match
  • https? matches http with an optional s
  • :// matches these literal characters
  • (?:www\.)? matches an optional www.
  • we replace the match with an empty string
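
The steps above can be tried on a small multi-line input. This is only a sketch: the docs= line is a made-up addition to show that lines without the server= prefix are left alone.

```ruby
# Hypothetical input: two "server=" lines plus one line that should stay untouched.
subject = [
  "server= http://france24.miles.com",
  "server= https://www.seloger.com",
  "docs= http://www.example.com"
].join("\n")

# \K keeps "server= " out of the replaced text, so only the scheme and "www." go.
result = subject.gsub(/server\s*=\s*\Khttps?:\/\/(?:www\.)?/, '')

puts result
# server= france24.miles.com
# server= seloger.com
# docs= http://www.example.com
```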

Earlier Versions of Ruby

\K is only supported in Ruby 2.0 and later. Earlier versions have to use a lookbehind, with one alternative per allowed amount of whitespace:

result = subject.gsub(/(?:(?<=server=)|(?<=server= ))https?:\/\/(?:www\.)?/, '')
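
The lookbehind form only accepts the exact prefixes it lists ("server=" and "server= "), whereas \s* in the \K form accepts any amount of whitespace. The following sketch compares the two on made-up hosts; the third line, with two spaces after the equals sign, is where they differ.

```ruby
# Hypothetical input; the last line has two spaces after "=".
subject = [
  "server=http://a.example",
  "server= https://www.b.example",
  "server=  http://c.example"
].join("\n")

lookbehind = subject.gsub(/(?:(?<=server=)|(?<=server= ))https?:\/\/(?:www\.)?/, '')
with_k     = subject.gsub(/server\s*=\s*\Khttps?:\/\/(?:www\.)?/, '')

puts lookbehind
# server=a.example
# server= b.example
# server=  http://c.example

puts with_k
# server=a.example
# server= b.example
# server=  c.example
```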


Tags: ruby regex