Explaining sed, grep and cut syntax

Posted 2020-07-26 10:34

Question:

I am analyzing a batch file that contains a line which reads a text file (input) and produces another text file (output).

The batch file uses three helper executables: grep, sed and cut. I tried reading their manuals, but they were not easy to follow.

The line is:

type input.txt | sed "s#""#'#g" | grep -o "class='name[^>]*" | sed -n "/id=/p" | grep -o "surname=[^>]*" | cut -d"'" -f2 >output.txt

I want to know how this line is interpreted. What are the rules? Is there a smarter way of doing this (for example, using one tool instead of all three)?

Answer 1:

I'll add to jeb's answer, although it covers most of what you asked.
These three commands are Windows ports of tools that originate on Unix/Linux, and they do the following:

  1. sed: a stream editor for filtering and transforming text.
  2. grep: a tool for printing lines matching a pattern.
  3. cut: a tool for cutting out selected portions of each line of a file.

I recommend that you read more about these three commands by either typing man <command name> in Linux, or Googling that same string (for instance, "man grep").
Also, look up regular expressions. Though they are usually hard for beginners to read, they are a common and compact way of representing patterns.

Regarding the specific usage in the question:

sed "s#""#'#g"

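For each line, this replaces every double quotation mark (") with an apostrophe ('). The doubled "" is the batch-file way of writing one literal " inside a quoted argument, and # is used as the delimiter of the s command in place of the usual /.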
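To see the substitution on its own, here is a small sketch for a POSIX shell such as bash under Cygwin. The sample line is made up for illustration and is not taken from the asker's input.txt. Note that in a POSIX shell the double quote is escaped as \" rather than doubled.

```shell
# Hypothetical sample line; the real input.txt is not shown in the question.
line='<a class="name17" id="13" surname="Frank">'

# Replace every double quote with an apostrophe.
# In a POSIX shell the literal quote is written \" instead of "".
converted=$(printf '%s\n' "$line" | sed "s#\"#'#g")

printf '%s\n' "$converted"
```

After this step every attribute value is delimited by apostrophes, which is what the later grep and cut stages rely on.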

grep -o "class='name[^>]*"

This prints only the matching part of each line: the text class='name followed by everything up to, but not including, the next >. Lines without a match produce no output.
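A minimal sketch of what -o does, again on a made-up line. The -o option is an extension found in GNU and BSD grep rather than part of POSIX, so this assumes one of those implementations.

```shell
# Hypothetical line that already uses apostrophes around attribute values.
line="<p>text</p><a class='name17' id='13' surname='Frank'>link</a>"

# -o prints only the matched text, not the whole line.
# [^>]* matches any run of characters that are not '>'.
matched=$(printf '%s\n' "$line" | grep -o "class='name[^>]*")

printf '%s\n' "$matched"
```

Everything before class= and everything from the closing > onward is dropped.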

sed -n "/id=/p"

By default, sed prints every line. With -n that automatic printing is switched off, so sed -n "/<some pattern>/p" prints only the lines that match the specified pattern. In this case, sed prints only the lines containing id=.
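The filtering behaviour can be checked with three made-up lines, only two of which contain id=:

```shell
# Three hypothetical lines; the middle one has no id= attribute.
input="class='name1' id='7'
class='name2'
class='name3' id='9'"

# -n suppresses the default output; /id=/p prints matching lines only.
kept=$(printf '%s\n' "$input" | sed -n "/id=/p")

printf '%s\n' "$kept"
```

Used this way, sed -n "/id=/p" behaves like a plain grep "id=".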

grep -o "surname=[^>]*"

This prints only the part of the line that starts with surname= and runs up to, but not including, the next >.

cut -d"'" -f2

This parses each line as successive fields separated by an apostrophe ('), and picks the second one.
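A short sketch of the field splitting, using a made-up line with two quoted values:

```shell
# Hypothetical line; splitting on ' gives the fields
#   1: surname=   2: Frank   3:  title=   4: Dr
line="surname='Frank' title='Dr'"

# -d"'" sets the delimiter to an apostrophe, -f2 selects the second field.
field=$(printf '%s\n' "$line" | cut -d"'" -f2)

printf '%s\n' "$field"
```

The second field is the text between the first pair of apostrophes, which is why the earlier sed step converts the double quotes first.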

Everything is piped, meaning that the output of each command serves as input for the next command to its right. The contents of "input.txt" are fed into the sed command, the output of which is then fed into the grep command, and so on. The final output is written to a new file named "output.txt".

And yes, as jeb mentioned, this looks like an awkward solution, because everything here can be done with sed alone, presumably in only one or two commands.
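One possible sed-only version is sketched below. It is not an exact replacement for the pipeline: it assumes at most one matching tag per line and that the attributes appear in the order class, id, surname inside the same tag. The sample input is made up.

```shell
# Hypothetical input: the second line has no id= and should be skipped.
input='<a class="name17" id="13" surname="Frank">link</a>
<a class="name18" surname="Smith">no id here</a>'

# 1st command: turn every " into '.
# 2nd command: if the line has class='name ... id= ... surname='X' inside
#              one tag, replace the whole line with X and print it.
surnames=$(printf '%s\n' "$input" |
  sed -n "s/\"/'/g; s/.*class='name[^>]*id=[^>]*surname='\([^']*\)'.*/\1/p")

printf '%s\n' "$surnames"
```

If the attribute order varies or several tags share a line, the original multi-stage pipeline (or a real HTML parser) is the safer choice.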



Answer 2:

It's more or less easy.

Splitting it to single commands:

sed "s#""#'#g" is equivalent to sed "s/""/'/g", which will replace each double quote (") with a ' character.

grep -o "class='name[^>]*" will catch only the lines containing the text class='name, and the -o switch makes grep print only the matching part of each line (from class='name up to the next >) instead of the whole line.

sed -n "/id=/p" will catch only lines containing the text id=.

grep -o "surname=[^>]*" will catch only the lines with the text surname= and, because of -o, print only the part from surname= up to the next >.

cut -d"'" -f2 will cut the line into parts. The parts are separated by ' (-d"'") and you get the second field (-f2).

Yes, this looks like a quick hack; it could be solved with sed alone.
Especially when the individual pieces of text appear in a fixed order, like:
<class="name17" id=13> <surname=Frank>



Answer 3:

The | character is the pipe character. It is used to pipe the output of one command to the input of another.

The > character is the redirect character. It redirects the standard output to a file.

So in your example the process starts with the type command:

type input.txt

This sends the input.txt to standard output which is then piped into the input of the next command:

sed "s#""#'#g"

and so on and so on through the other piped grep and sed commands.

The final cut command uses the > character to redirect its output to the output.txt file.

cut -d"'" -f2 >output.txt
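For anyone trying this in bash or Cygwin rather than cmd.exe, the whole chain can be sketched as follows. Two things differ from the batch version: cat plays the role of type, and the literal quote is written \" instead of "". The file contents and the temporary directory are made up for the demonstration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Hypothetical one-line input file.
printf '%s\n' '<a class="name17" id="13" surname="Frank">link</a>' > "$workdir/input.txt"

# Each | feeds one command's output to the next; > writes the final
# output of cut into output.txt.
cat "$workdir/input.txt" \
  | sed "s#\"#'#g" \
  | grep -o "class='name[^>]*" \
  | sed -n "/id=/p" \
  | grep -o "surname=[^>]*" \
  | cut -d"'" -f2 > "$workdir/output.txt"

result=$(cat "$workdir/output.txt")
printf '%s\n' "$result"

rm -rf "$workdir"
```

Only the last command in the chain is redirected; the intermediate results never touch the disk.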


Tags: bash cygwin