I have the following input file, which you might recognize as a Debian Packages
file:
Package: nimbox-apexer-sales
Version: 1.0.0-201007241449
Architecture: i386
Maintainer: Ricardo Marimon <rmarimon@nimbox.com>
Installed-Size: 124
Depends: nimbox-apexer-root
Filename: binary/nimbox-apexer-sales_1.0.0-201007241449_i386.deb
Size: 68880
MD5sum: c4538f2913d76b57110ba73d0b87cc16
Section: base
Priority: optional
Description: Sales Application for NiMbox.
Package: nimbox-tomcat
Version: 6.0.26-5
Architecture: i386
Maintainer: Ricardo Marimon <rmarimon@nimbox.com>
Installed-Size: 6144
Depends: sun-java6-jdk
Filename: binary/nimbox-tomcat_6.0.26-5_i386.deb
Size: 5490024
MD5sum: 5f2ccbe6137af2842e1c81bc217444e3
Section: base
Priority: optional
Description: Tomcat Servlet Application Server for NiMbox
NiMbox requires a servlet application server in order to work. The current
NiMbox implementation requires a Tomcat Servlet Application.
The file actually has many of these entries, and I want to produce the following file from it:
nimbox-apexer-sales 1.0.0-201007241449
nimbox-tomcat 6.0.26-5
Where the Package and the Version are separated by a tab so that I can later use cut to get them. I'm pretty sure this can be done with sed. I went over the sed one-liners, but this is probably a bit more complex. Any ideas?
When working with Debian Packages files, you might find grep-dctrl useful. It's
incredibly flexible, both in how it lets you limit the data it outputs and in
how it formats that output. Instead of trying to parse the Packages file format
myself, I'd just ask grep-dctrl to do it for me and print only the bits of
information I'm actually interested in:
$ grep-dctrl -n -s Package,Version nimbox /var/lib/apt/lists/..._Packages
That would give you something like:
nimbox-apexer-sales
1.0.0-201007241449
nimbox-tomcat
6.0.26-5
With that, it's only a matter of joining the right lines together, which is easy
enough with, for example, perl:
$ ... | perl -0pe 's/(?<!^)\n(?!\n|\z)/\t/mg; s/\n\n/\n/g'
nimbox-apexer-sales 1.0.0-201007241449
nimbox-tomcat 6.0.26-5
or any set of other standard UNIX tools you happen to like.
It's certainly possible to go directly from the Packages file format to what you
want, but using tools specialized for the job seems like a good idea to me.
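For completeness, going directly from the Packages file could look something like the sketch below. It assumes every stanza lists Package: before Version:, and pkgver is just a made-up helper name, not part of any existing tool.

```shell
# Sketch: print "Package<TAB>Version" for each stanza of a Packages file.
# pkgver is a hypothetical helper; it reads the files given as arguments,
# or standard input when called without arguments.
pkgver() {
    awk '
        # Field lines start in column 1; continuation lines start with a
        # space, so multi-line descriptions never match these patterns.
        /^Package: / { pkg = $2 }
        /^Version: / { printf "%s\t%s\n", pkg, $2 }
    ' "$@"
}
```

With that in place, something like pkgver Packages | cut -f1 should list just the package names.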
Assuming that your file name is test.txt:
grep -P '^Package: |^Version:' test.txt | awk '{ print $2 }' | sed -e 'N;s/\n/ /'
Where:
- grep -P '^Package: |^Version:' - greps for lines beginning with 'Package: ' or 'Version:'
- awk '{ print $2 }' - strips the 'Package: ' and 'Version: ' prefixes from the result
- sed -e 'N;s/\n/ /' - joins each pair of consecutive lines with a space
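If the fields need to be separated by a tab, as the question asks, one option is to let paste do the joining, since it uses a tab as its default delimiter. A rough sketch, assuming each stanza has exactly one Package: and one Version: line, in that order; pkgver_paste is a hypothetical name:

```shell
# Sketch: same pipeline idea, but joined with a tab by paste.
# pkgver_paste is a hypothetical helper; pass a single file name,
# or pipe the Packages data in on standard input.
pkgver_paste() {
    grep -E '^(Package|Version): ' "$@" |   # keep only the two fields
        awk '{ print $2 }' |                # drop the field names
        paste - -                           # join each pair of lines with a tab
}
```

Because the pairing is purely positional, a stanza that lacks a Version: line would shift every following pair, so this is only safe on well-formed input.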
Pure sed solution (using FreeBSD sed on Mac OS X):
# See:
# http://sed.sourceforge.net/sedfaq3.html#s3.3: ... (6) Relentless ...
# http://sed.sourceforge.net/sed1line.txt: ... # if a line begins with ...
sed -n '/^Package:/{
:a
N
/\nVersion:/!ba
p
}' file |
sed -E -e :a -e $'$!N;s/\\nVersion: */\t/;ta' -e 'P;D' |
sed -e 's/^Package: *//'
Here is a sed version:
sed -ne 's/Package: \(.*\)/\1/p' \
    -ne 's/Version: \(.*\)/\1/p' < filename \
    | sed 'N;s/\n/ /g'
Using RPMs, the solution would have been:
rpm -qa --queryformat "%{NAME}\t%{VERSION}\n"
Too bad for the sed challenge.
This might work for you:
sed '/Package:/!d;N;s/^[^ ]* //mg;y/\n/\t/' filename
nimbox-apexer-sales 1.0.0-201007241449
nimbox-tomcat 6.0.26-5
Also, notice that the same information can be gathered from the Filename: line:
sed '/Filename:/!d;s,.*/\([^_]*\)_\([^_]*\).*,\1\t\2,' filename
nimbox-apexer-sales 1.0.0-201007241449
nimbox-tomcat 6.0.26-5
This might be GNU sed specific! (The m flag and the \t escape are GNU extensions.)
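If GNU sed isn't available, a more portable variant might pass a literal tab in through a shell variable. This is only a sketch: pkgver_from_filename is a hypothetical name, and it assumes the Filename: value contains a directory part and follows the name_version_arch.deb pattern.

```shell
# Sketch: extract name and version from the Filename: field without
# relying on GNU-only escapes. pkgver_from_filename is a hypothetical helper.
pkgver_from_filename() {
    tab=$(printf '\t')   # literal tab, since \t in the replacement is not portable
    # .*/ skips the directory part; the two groups capture name and version.
    sed -n "s,^Filename: .*/\([^_]*\)_\([^_]*\)_.*,\1${tab}\2,p" "$@"
}
```

Note that the version taken from a file name may differ from the Version: field, for example when the version carries an epoch, so the field-based approaches are probably the safer choice.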