I am given a String that contains a valid URL. I have to extract only the name of the website from that URL, ignoring any subdomains.
For example:
http://www.yahoo.com => yahoo
www.google.co.in => google
http://in.com => in
http://india.gov.in/ => india
https://in.yahoo.com/ => yahoo
http://philotheoristic.tumblr.com/ => tumblr
https://in.movies.yahoo.com/ => yahoo
How can I do this?
Regular expressions may help you:
A regular expression is a way to represent a set of strings: the set consists of every string that matches the expression. In the code above, the string passed as the split argument is a regular expression that matches any "." followed by alphanumeric text, or "//" followed by alphanumeric text. These "." and "//" substrings are the separators used to split the string into parts, the first of which is the site name. In "www.google.co.in", the string would be split this way: google, co, in. Since the solution uses the first element of the split array, the result is: google.
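To make the idea of splitting on a regular expression concrete, here is a rough sketch. It is not the snippet from the answer above: the class name SplitSketch, the helper firstLabel and the three patterns are made up for illustration. It drops the scheme, a leading "www." and anything after the host, then splits on an escaped dot and keeps the first part.

```java
import java.util.Arrays;

public class SplitSketch {
    // Hypothetical helper: first host label after removing the scheme and a leading "www."
    public static String firstLabel(String url) {
        String host = url
                .replaceFirst("^[a-zA-Z][a-zA-Z0-9+.-]*://", "") // drop "http://", "https://", ...
                .replaceFirst("^www\\.", "")                      // drop a leading "www."
                .replaceFirst("[/:?#].*$", "");                   // drop port, path, query, fragment
        // split takes a regular expression, so the dot must be escaped
        String[] labels = host.split("\\.");
        return labels.length > 0 ? labels[0] : "";
    }

    public static void main(String[] args) {
        // Shows the raw pieces produced by splitting on a literal dot
        System.out.println(Arrays.toString("www.google.co.in".split("\\.")));
        System.out.println(firstLabel("www.google.co.in"));      // google
        System.out.println(firstLabel("http://in.com"));         // in
        System.out.println(firstLabel("https://in.yahoo.com/")); // in, not yahoo
    }
}
```

As the last call shows, taking the first part only works when the host has no subdomain other than "www", so this covers some of the examples in the question but not all of them.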
There is no way to find out a valid website name from a URL alone. But if you are trying to cut out a particular part of the URL string, you can do this with string operations as follows:
You can make use of the URL class.
From the documentation: http://docs.oracle.com/javase/tutorial/networking/urls/urlInfo.html
Here is the output displayed by the program:
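So by using aURL.getHost() you can get the host name. To ignore subdomains you can split it on ".". Because String.split takes a regular expression, the dot has to be escaped, so it becomes aURL.getHost().split("\\.")[0]. Note that this returns the leftmost label of the host, which is the site name only when the host carries no subdomain prefix (for http://www.yahoo.com it returns www).

I found similar content, although it is somewhat different.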
Here is the code:
Making use of this class is simple:
Here is the link:
http://www.gotoquiz.com/web-coding/programming/java-programming/how-to-extract-titles-from-web-pages-in-java/
I hope this helps you.
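
Coming back to the getHost() suggestion above, here is a rough sketch of how it could be combined with a small suffix table so that subdomains are skipped. Everything named here is an assumption for illustration: the class SiteNameSketch, the method siteName and the three-entry TWO_LABEL_SUFFIXES set are made up, and java.net.URI (which also offers getHost()) is used in place of the URL class. A real program would need the full Public Suffix List rather than a hand-written set.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

public class SiteNameSketch {
    // Illustrative only: a tiny stand-in for the Public Suffix List
    private static final Set<String> TWO_LABEL_SUFFIXES = Set.of("co.in", "gov.in", "co.uk");

    public static String siteName(String url) {
        // URI.getHost() needs a scheme, so add one for inputs like "www.google.co.in"
        String withScheme = url.contains("://") ? url : "http://" + url;
        String host = URI.create(withScheme).getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in: " + url);
        }
        // split takes a regular expression, so the dot is escaped
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        int n = labels.length;
        // Assume a one-label suffix ("com") unless the last two labels are in the table
        int suffixLabels = 1;
        if (n >= 3 && TWO_LABEL_SUFFIXES.contains(labels[n - 2] + "." + labels[n - 1])) {
            suffixLabels = 2;
        }
        // The site name is the label immediately to the left of the suffix
        return labels[Math.max(0, n - suffixLabels - 1)];
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.yahoo.com", "www.google.co.in", "http://in.com",
            "http://india.gov.in/", "https://in.yahoo.com/",
            "http://philotheoristic.tumblr.com/", "https://in.movies.yahoo.com/"
        };
        for (String s : samples) {
            System.out.println(s + " => " + siteName(s));
        }
    }
}
```

With this table the seven URLs from the question give yahoo, google, in, india, yahoo, tumblr and yahoo. Any suffix missing from the table (for example "com.au") would be handled incorrectly, which is why the result is only as good as the suffix data behind it.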