I am querying London postcode data from geonames:
http://www.geonames.org/postalcode-search.html?q=london&country=GB
I want to turn the output into a list of just the postcode identifiers (Bethnal Green, Islington, etc.). What is the best way to extract just the names in bash?
I see the site offers (though not for free) web services with XML or JSON data... That would be the best way, since the HTML page is not meant to be parsed (easily).
Anyway, nothing is impossible; nonetheless, using strictly bash commands alone would be quite hard, if not impossible, and often several other common tools are piped together to achieve the result. But then, it sometimes turns out to be more convenient to stick to a single tool, e.g. Perl, instead of combining cat, grep, awk, sed and whatever else.
Something like the one-liner sketched below worked, extracting 200 lines, assuming a specific format for the code.
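A sketch of such a one-liner (the <td> layout and the postcode pattern are assumptions about the page's markup, so adjust them to what the HTML actually contains):

    wget -q -O - 'http://www.geonames.org/postalcode-search.html?q=london&country=GB' |
    perl -0777 -ne '
        # Assumption: each result row has the place name in a <td>
        # directly followed by a <td> that looks like a UK outward
        # postcode (e.g. E2, N1, SW1A).
        print "$1\n" while m{<td[^>]*>([^<]+)</td>\s*<td[^>]*>[A-Z]{1,2}[0-9][0-9A-Z]?\b}g;
    '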
ADD
If there's no limit to the software you can use to parse the data, then you could use a Perl one-liner that extracts places and postal codes separated by a ";", like in a CSV, as well as an awk script, based on the answer by Peter.O, that extracts the same data (both are sketched below)... and so on. But in these cases, since you are not limited to the minimal tools found on most Unix or GNU systems, I would stick to one single widespread tool, e.g. perl.
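A sketch of the Perl variant (same markup assumptions as above: the place <td> immediately followed by the code <td>):

    url='http://www.geonames.org/postalcode-search.html?q=london&country=GB'
    wget -q -O - "$url" |
    perl -0777 -ne '
        # Assumption: place name cell directly followed by the code cell;
        # print one "place;code" pair per result row.
        print "$1;$2\n" while m{<td[^>]*>([^<]+)</td>\s*<td[^>]*>([A-Z]{1,2}[0-9][0-9A-Z]?)\b}g;
    '

And a sketch of the awk variant, which works on the plain-text rendering produced by Peter.O's w3m approach (shown further down), so the column layout here is also an assumption:

    w3m -dump -cols 200 "$url" |
    awk '{
        sub(/^ +/, "")                    # drop leading indentation
        n = split($0, f, "  +")           # columns assumed padded by 2+ spaces
        if (f[1] ~ /^[0-9]+$/ && n >= 3)  # numbered result rows only
            print f[2] ";" f[3]           # place;code
    }'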
If you have access to the mojo tool from the Mojolicious project, this all becomes quite a lot easier:
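Something along these lines (the CSS selector, including the restable class and the cell position, is a guess at the page's markup):

    mojo get 'http://www.geonames.org/postalcode-search.html?q=london&country=GB' \
        'table.restable tr > td:nth-child(2)' text | grep .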
The grep at the end is just to filter out some junk results; almost (but not quite) every other line is bad, because the page structure is slightly inconsistent. Otherwise you could say tr:nth-child(even) and get nice results.

I'm not sure if you mean this \n-delimited list (or one in brackets and comma-delimited). w3m is a "WWW browsable pager with excellent tables/frames support", and dumping the rendered page with it, then trimming everything but the name column, produces exactly such a newline-delimited list.