Need to extract a block of text between two blank lines

Posted 2019-06-05 00:04

Question:

I have been slowly and steadily developing a bash script that can quickly fetch some basic DNS information about a domain. (Think LeafDNS or IntoDNS, but something I can run quickly from the CLI.) Today, a coworker gave me the final piece that I needed to complete the script: how to fetch the nameservers (and their IPs) that a domain is pointed to, as reported by the domain's registrar, via dig +trace +additional.

The problem, however, is that dig +trace +additional returns a lot of extra information that I neither want nor need. Of the four blocks of text (separated by blank lines) returned, I only need the third one (the first two are the root nameservers and the TLD's parent nameservers, and the fourth block is the nameservers as reported in the DNS zone). Ideally, I would also like to omit the comment that dig appends to the end of the third block, so that only the nameservers and their IPs remain.

I did find a solution that pipes the output of dig through sed, but I'm only vaguely familiar with sed. When I copy and paste that sed command directly, I get the first and third blocks. Here's an example of the output:

calyodelphi@dragonpad:~ $ dig +trace +additional dragon-architect.com | sed '/^$/,/^$/!d'

; <<>> DiG 9.7.3-P3 <<>> +trace +additional dragon-architect.com
;; global options: +cmd
.           214851  IN  NS  m.root-servers.net.
.           214851  IN  NS  a.root-servers.net.
.           214851  IN  NS  b.root-servers.net.
.           214851  IN  NS  g.root-servers.net.
.           214851  IN  NS  j.root-servers.net.
.           214851  IN  NS  d.root-servers.net.
.           214851  IN  NS  e.root-servers.net.
.           214851  IN  NS  f.root-servers.net.
.           214851  IN  NS  l.root-servers.net.
.           214851  IN  NS  c.root-servers.net.
.           214851  IN  NS  k.root-servers.net.
.           214851  IN  NS  h.root-servers.net.
.           214851  IN  NS  i.root-servers.net.
;; Received 228 bytes from 192.168.16.1#53(192.168.16.1) in 18 ms


dragon-architect.com.   172800  IN  NS  ns1.dragon-architect.com.
dragon-architect.com.   172800  IN  NS  ns2.dragon-architect.com.
ns1.dragon-architect.com. 172800 IN A   70.84.243.130
ns2.dragon-architect.com. 172800 IN A   70.84.243.131
;; Received 106 bytes from 192.33.14.30#53(b.gtld-servers.net) in 165 ms


calyodelphi@dragonpad:~ $ 

I'm pretty much lost at this point and would very much appreciate help. Gratuitous bonus points if it's simple, elegant, highly portable, easy to read, and comes with an explanation of how the sed command works so I can learn off of it. I'm open to using grep or awk as well; whichever will yield the most portable and maintainable results.

EDIT: I do know about several dig arguments (notably +nocomments and +nostats). Unfortunately, they don't work with +trace. So I have to manually remove the stats/comments with sed or awk.

EDIT 2: Also, it didn't occur to me until today that the solutions needed to consider TLDs like .co.uk or .com.au. I ran a dig +trace +additional on a couple of domains like bbc.co.uk and melbourneit.com.au to see if this changed the output, and it did not. Four blocks of output are still returned, meaning that both provided solutions still work exactly as intended.

Answer 1:

You can try awk. Set RS to the empty string so that records are split on blank lines, and set FS to a newline so that each line of a record becomes one field. Then select the third record (FNR == 3), blank out its last field ($NF, the comment line), strip the trailing whitespace, and print:

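dig +trace +additional dragon-architect.com | awk '
   # Paragraph mode: records are blank-line-separated blocks, fields are lines.
   BEGIN { RS = ""; FS = OFS = "\n" }
   # Third block: blank the last line (the ";; Received" comment), trim, print.
   FNR == 3 { $NF = ""; sub( /[[:space:]]+$/, "" ); print }
'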

It yields:

dragon-architect.com.   172800  IN  NS  ns1.dragon-architect.com.
dragon-architect.com.   172800  IN  NS  ns2.dragon-architect.com.
ns1.dragon-architect.com. 172800 IN A   70.84.243.130
ns2.dragon-architect.com. 172800 IN A   70.84.243.131
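
For trying this without a network connection, here is a minimal sketch that wraps the same awk program in a shell function and feeds it canned text. The function name registrar_ns and the sample records are illustrative assumptions, not real dig output and not part of the original answer.

```sh
#!/bin/sh
# registrar_ns: hypothetical helper name. Reads "dig +trace +additional"
# style output on stdin and prints the third blank-line-separated block
# without its last line (the ";; Received ..." comment).
registrar_ns() {
    awk '
        BEGIN { RS = ""; FS = OFS = "\n" }   # paragraph mode, one line per field
        FNR == 3 { $NF = ""; sub(/[[:space:]]+$/, ""); print }
    '
}

# Canned stand-in for dig output (made-up, abbreviated records).
sample='
; <<>> DiG <<>> +trace +additional example.com
.   1000  IN  NS  a.root-servers.net.
;; Received 1 bytes from root

com.   1000  IN  NS  a.gtld-servers.net.
;; Received 2 bytes from tld

example.com.   1000  IN  NS  ns1.example.com.
ns1.example.com.   1000  IN  A  192.0.2.1
;; Received 3 bytes from registrar

example.com.   1000  IN  NS  ns1.example.com.
;; Received 4 bytes from zone
'

printf '%s\n' "$sample" | registrar_ns
```

With real data the call would look like dig +trace +additional dragon-architect.com | registrar_ns. Note that the sketch relies on the comment being the last line of the block, exactly as the command above does.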


Answer 2:

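Setting the record separator (RS) to \n\n divides the output into four blocks, so you can simply print block 3. Note that this may only work with gawk and other awk implementations that support more than one character in RS. The trailing comment line is still included in the output.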

dig +trace +additional dragon-architect.com | awk 'NR==3' RS="\n\n"
dragon-architect.com.   172800  IN      NS      ns1.dragon-architect.com.
dragon-architect.com.   172800  IN      NS      ns2.dragon-architect.com.
ns1.dragon-architect.com. 172800 IN     A       70.84.243.130
ns2.dragon-architect.com. 172800 IN     A       70.84.243.131
;; Received 106 bytes from 192.12.94.30#53(192.12.94.30) in 60 ms

You can even remove the single quotes, but it's best to leave them there.

awk NR==3 RS="\n\n"
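
If a multi-character RS is not available, one possible alternative is to count blocks line by line, which needs only features found in POSIX awk. The following is a sketch under that assumption, not part of the original answer; the function name third_block and the test input are made up for illustration. It also drops lines beginning with ";", which removes the trailing comment.

```sh
#!/bin/sh
# third_block: hypothetical helper name. Prints the third run of
# non-blank lines from stdin, skipping lines that start with ";".
third_block() {
    awk '
        /^$/     { inblock = 0; next }      # a blank line ends the current block
        !inblock { inblock = 1; block++ }   # first non-blank line opens a new block
        block == 3 && !/^;/                 # print block 3 without comment lines
    '
}

# Made-up stand-in for dig output: four blocks separated by blank lines.
printf '%s\n' '' 'root 1' ';; c1' '' 'tld 1' ';; c2' '' \
    'ns 1' 'ns 2' ';; c3' '' 'zone 1' ';; c4' | third_block
```

With real data the call would look like dig +trace +additional dragon-architect.com | third_block. Because blocks are counted by transitions from blank to non-blank lines, a leading blank line or several consecutive blank lines do not shift the count.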


Tags: bash shell sed dns