I have a file whose contents are a list of URLs, and I want to extract the domain names from this list in bash.
Example:
sub1.domain.com
domain3.com
sub5.domain.ext
subof.subdomain.domainx.ex2
I want to extract just the domain names from this list. How can I do this?
Thank you
You can use grep:

grep -Eo '[^.]+\.[^.]+$' file.txt

The pattern matches the last two dot-separated labels of each line, and -o prints only the matching part.
Example:
$ cat file.txt
sub1.domain.com
sub2.domains2.com
domain3.com
sub5.domain.ext
subof.subdomain.domainx.ex2
$ grep -Eo '[^.]+\.[^.]+$' file.txt
domain.com
domains2.com
domain3.com
domain.ext
domainx.ex2
Note that this will return co.uk for www.google.co.uk, since the pattern always keeps exactly the last two labels.
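If you prefer awk, an equivalent one-liner is below; this is a sketch with the same two-label limitation, assuming file.txt holds one bare hostname per line:

awk -F. 'NF >= 2 { print $(NF-1) "." $NF }' file.txt

It splits each line on dots and prints the last two fields joined by a dot.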
A possible solution using Perl:
use strict;
use warnings;
use feature qw( say );

use Domain::PublicSuffix qw( );

# Domain::PublicSuffix consults the Public Suffix List, so multi-label
# suffixes such as co.uk are handled correctly.
my $dps = Domain::PublicSuffix->new();

for my $host (qw(
    www.google.com
    foo.bar.google.com
    www.google.co.uk
    foo.bar.google.co.uk
)) {
    my $root = $dps->get_root_domain($host)
        or die $dps->error();
    say $root;
}
Output:
google.com
google.com
google.co.uk
google.co.uk
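To run the same lookup over the original file from the shell, a one-liner along these lines should work (a sketch, assuming Domain::PublicSuffix is installed and file.txt holds one hostname per line):

perl -MDomain::PublicSuffix -nle 'BEGIN { $dps = Domain::PublicSuffix->new() } print $dps->get_root_domain($_) // $_' file.txt

Lines whose root domain cannot be determined are printed unchanged.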