Combine two particular lines using sed

Posted 2019-09-15 04:58

Question:

I have the following input file, which you might recognize as a Debian Packages file:

Package: nimbox-apexer-sales
Version: 1.0.0-201007241449
Architecture: i386
Maintainer: Ricardo Marimon <rmarimon@nimbox.com>
Installed-Size: 124
Depends: nimbox-apexer-root
Filename: binary/nimbox-apexer-sales_1.0.0-201007241449_i386.deb
Size: 68880
MD5sum: c4538f2913d76b57110ba73d0b87cc16
Section: base
Priority: optional
Description: Sales Application for NiMbox.

Package: nimbox-tomcat
Version: 6.0.26-5
Architecture: i386
Maintainer: Ricardo Marimon <rmarimon@nimbox.com>
Installed-Size: 6144
Depends: sun-java6-jdk
Filename: binary/nimbox-tomcat_6.0.26-5_i386.deb
Size: 5490024
MD5sum: 5f2ccbe6137af2842e1c81bc217444e3
Section: base
Priority: optional
Description: Tomcat Servlet Application Server for NiMbox
 NiMbox requires a servlet application server in order to work.  The current
 NiMbox implementation requires a Tomcat Servlet Application.

The file actually has many of these entries, and I want to get the following file:

nimbox-apexer-sales 1.0.0-201007241449
nimbox-tomcat 6.0.26-5

Here the Package and the Version are separated by a tab so that I can later use cut to get them. I'm pretty sure this can be done with sed. I went over the sed one-liners, but this is probably a bit more complex. Any ideas?

Answer 1:

When working with Debian Packages files, you might find grep-dctrl useful. It is very flexible both in how it lets you limit the data it outputs and in how it formats that output. Instead of trying to parse the Packages file format myself, I'd just ask grep-dctrl to do it for me and print only the bits of information I'm actually interested in:

$ grep-dctrl -n -s Package,Version nimbox /var/lib/apt/lists/..._Packages

That would give you something like:

nimbox-apexer-sales
1.0.0-201007241449

nimbox-tomcat
6.0.26-5

With that, it's only a matter of joining the right lines together, which is easy enough with, for example, perl:

$ ... | perl -0pe 's/(?<!^)\n(?!\n)/ /mg; s/\n\n/\n/g'
nimbox-apexer-sales 1.0.0-201007241449
nimbox-tomcat 6.0.26-5

or any set of other standard UNIX tools you happen to like.

It's certainly possible to go directly from the Packages file format to what you want, but using tools specialized for the job seems like a good idea to me.
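For comparison, here is a minimal sketch of the direct route using awk instead of grep-dctrl. The sample file is a hypothetical, abbreviated copy of the stanzas from the question, and the sketch assumes every stanza lists Package before Version. It emits a tab between the two fields, as the question asks.

```shell
# Hypothetical sample: two abbreviated stanzas from the question.
packages_file=$(mktemp)
cat > "$packages_file" <<'EOF'
Package: nimbox-apexer-sales
Version: 1.0.0-201007241449
Architecture: i386

Package: nimbox-tomcat
Version: 6.0.26-5
Description: Tomcat Servlet Application Server for NiMbox
 NiMbox requires a servlet application server in order to work.
EOF

# Remember the package name, then print it next to the version as soon
# as the Version field shows up. Assumes Package precedes Version.
direct_output=$(awk '
    $1 == "Package:" { name = $2 }
    $1 == "Version:" { printf "%s\t%s\n", name, $2 }
' "$packages_file")

printf '%s\n' "$direct_output"
rm -f "$packages_file"
```

Because the separator is a real tab, the result can be fed straight to cut -f1 or cut -f2.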



Answer 2:

Assuming that your file name is test.txt:

grep -P '^Package: |^Version:' test.txt  | awk '{ print $2 }' | sed -e 'N;s/\n/ /'

Where:

  1. grep -P '^Package: |^Version:' - greps for lines beginning with 'Package: ' or 'Version: '
  2. awk '{ print $2 }' - strips 'Package: ' and 'Version: ' substrings from the result
  3. sed -e 'N;s/\n/ /' - joins each pair of consecutive lines with a space
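The pipeline above joins the fields with a space, while the question asks for a tab. A possible variation, sketched here against a hypothetical sample file, swaps the final sed for paste, which joins its inputs with a tab by default. It also uses grep -E rather than -P, on the assumption that extended regular expressions are enough for this pattern.

```shell
# Hypothetical sample: two abbreviated stanzas from the question.
packages_file=$(mktemp)
cat > "$packages_file" <<'EOF'
Package: nimbox-apexer-sales
Version: 1.0.0-201007241449
Architecture: i386

Package: nimbox-tomcat
Version: 6.0.26-5
Architecture: i386
EOF

# Keep only the Package and Version lines, strip the field names,
# then let paste read two lines at a time and join them with a tab.
paired_output=$(grep -E '^(Package|Version): ' "$packages_file" |
    awk '{ print $2 }' |
    paste - -)

printf '%s\n' "$paired_output"
rm -f "$packages_file"
```

Like the original, this relies on Package and Version lines strictly alternating in the input.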


Answer 3:

Pure sed solution (using FreeBSD sed on Mac OS X):

# See: 
# http://sed.sourceforge.net/sedfaq3.html#s3.3: ... (6) Relentless ...
# http://sed.sourceforge.net/sed1line.txt: ... # if a line begins with ...

sed -n '/^Package:/{
:a
N
/\nVersion:/!ba
p
}' file |
sed -E -e :a -e $'$!N;s/\\nVersion: */\t/;ta' -e 'P;D' |
sed -e 's/^Package: *//'
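As an alternative sketch (not the approach above), the same result can probably be had from a single sed process by parking the package name in the hold space. The sample file and the tab variable are illustrative assumptions. The script sticks to basic sed commands and inserts a literal tab from the shell rather than relying on \t.

```shell
# Hypothetical sample: two abbreviated stanzas from the question.
packages_file=$(mktemp)
cat > "$packages_file" <<'EOF'
Package: nimbox-apexer-sales
Version: 1.0.0-201007241449
Architecture: i386

Package: nimbox-tomcat
Version: 6.0.26-5
Architecture: i386
EOF

tab=$(printf '\t')   # literal tab, since \t in a replacement is not portable

# Package line: strip the field name and copy the rest to the hold space.
# Version line: strip the field name, append it to the hold space, fetch
# "name\nversion" back, and turn the embedded newline into a tab.
hold_output=$(sed -n '
/^Package: /{
    s/^Package: //
    h
}
/^Version: /{
    s/^Version: //
    H
    g
    s/\n/'"$tab"'/p
}' "$packages_file")

printf '%s\n' "$hold_output"
rm -f "$packages_file"
```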


Answer 4:

Here is a sed version:

  sed -n -e 's/^Package: \(.*\)/\1/p' \
         -e 's/^Version: \(.*\)/\1/p' filename |
    sed 'N;s/\n/ /'


Answer 5:

Using RPMs, the solution would have been:

rpm -qa --queryformat "%{NAME}\t%{VERSION}\n"

Too bad for the sed challenge.



Answer 6:

This might work for you:

sed '/Package:/!d;N;s/^[^ ]* //mg;y/\n/\t/' filename
nimbox-apexer-sales     1.0.0-201007241449
nimbox-tomcat   6.0.26-5

Also, notice that the same information can be extracted from the Filename: line:

sed '/Filename:/!d;s,.*/\([^_]*\)_\([^_]*\).*,\1\t\2,' filename
nimbox-apexer-sales     1.0.0-201007241449
nimbox-tomcat   6.0.26-5

Both commands may be GNU sed specific: they rely on GNU extensions such as the m flag of the s command and the \t escape.
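If GNU sed is not available, a rough portable sketch of the Filename: variant could pass a literal tab in from the shell instead of using \t. The sample file and the tab variable are assumptions for illustration. Note also that, as far as I know, .deb file names usually leave out a version epoch, so the version recovered this way may differ from the Version: field for packages that have one.

```shell
# Hypothetical sample: only the fields relevant here.
packages_file=$(mktemp)
cat > "$packages_file" <<'EOF'
Package: nimbox-apexer-sales
Version: 1.0.0-201007241449
Filename: binary/nimbox-apexer-sales_1.0.0-201007241449_i386.deb

Package: nimbox-tomcat
Version: 6.0.26-5
Filename: binary/nimbox-tomcat_6.0.26-5_i386.deb
EOF

tab=$(printf '\t')   # literal tab instead of the GNU-only \t escape

# Capture the name and version from name_version_arch.deb and print them
# tab-separated; -n plus the p flag drops every line that is not Filename:.
filename_output=$(sed -n \
    "s,^Filename: .*/\([^_]*\)_\([^_]*\)_.*,\1${tab}\2,p" "$packages_file")

printf '%s\n' "$filename_output"
rm -f "$packages_file"
```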