Say I have a string like this: "http://something.example.com/directory/"
What I want to do is parse this string and extract the "something" from it.
The first step, obviously, is to check that the string contains "http://"; otherwise, it should be ignored.
But how do I then extract just the "something" from that string? Assume that all the strings being evaluated will have a similar structure (i.e. I am trying to extract the subdomain of the URL, if the string being examined is indeed a valid URL, where "valid" means it starts with "http://").
Thanks.
P.S. I know how to check the first part, i.e. I can simply split the string at "http://", but that doesn't solve the full problem because it still leaves "something.example.com/directory/". All I want is the "something", nothing else.
Well, you can use regular expressions. Something like /http:\/\/([^\.]+)/ captures the first run of non-'.' characters after "http://". Check out http://rubular.com/; you can test your regular expressions against a set of test strings there, and it's great for learning this tool :)
You could use URI to parse the string and then just work on the host.
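To make the two suggestions above concrete, here is a minimal sketch of both. The method names `subdomain_by_regex` and `subdomain_by_uri` are made up for illustration, and both assume the input looks like the example URL in the question (a host with a dot in it).

```ruby
require 'uri'

# Regex approach: capture everything between "http://" and the first dot.
# Returns nil when the string does not start with "http://".
def subdomain_by_regex(url)
  url[%r{\Ahttp://([^.]+)}, 1]
end

# URI approach: parse the URL, then take the first label of the host.
# Assumes the string parses and actually has a host.
def subdomain_by_uri(url)
  URI.parse(url).host.split('.').first
end

url = 'http://something.example.com/directory/'
puts subdomain_by_regex(url)
puts subdomain_by_uri(url)
```

The regex version doubles as the "starts with http://" check, since it returns nil for anything else. The URI version needs that check done separately.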
Or there is a gem, domainatrix (mentioned in "Remove subdomain from string in ruby"), and you could just take the subdomain it gives you.
I'd do it this way:
URI is built into Ruby. It's not the most full-featured but it's plenty capable of doing this task for most URLs. If you have IRIs then look at Addressable::URI.
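A minimal sketch of a URI-based version that also handles the "ignore anything that isn't http://" requirement from the question. The helper name `extract_subdomain` is hypothetical, and the rule "a subdomain exists only when the host has more than two labels" is an assumption made for this example, not something URI provides.

```ruby
require 'uri'

# Hypothetical helper: returns the leading host label for http:// URLs,
# or nil for anything else (other schemes, unparsable strings, or hosts
# with only two labels such as "example.com").
def extract_subdomain(str)
  uri = URI.parse(str)
  return nil unless uri.scheme == 'http' && uri.host

  labels = uri.host.split('.')
  # Assumption: with two labels or fewer there is no subdomain to return.
  labels.length > 2 ? labels.first : nil
rescue URI::InvalidURIError
  # Strings that URI cannot parse are ignored, as the question asks.
  nil
end

puts extract_subdomain('http://something.example.com/directory/')
```

Counting labels is a simplification: a host like "example.co.uk" has three labels but no subdomain, so URLs under multi-part suffixes would need a public-suffix-aware tool such as the domainatrix gem mentioned above.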