I am analyzing a batch file, and it contains a line that edits a text file (input) and produces another text file (output).
The batch file uses three helper tools (.exe): grep, sed and cut. I tried to read their manuals, but it wasn't easy.
The line is:
type input.txt | sed "s#""#'#g" | grep -o "class='name[^>]*" | sed -n "/id=/p" | grep -o "surname=[^>]*" | cut -d"'" -f2 >output.txt
I want to know how the line is interpreted. What are the rules? Is there a smarter way of doing this (for example, using one tool instead of all three)?
The | character is the pipe character. It is used to pipe the output of one command to the input of another.
The > character is the redirect character. It redirects the standard output to a file.
So in your example the process starts with the type command, type input.txt.
This sends the contents of input.txt to standard output, which is then piped into the input of the next command, sed "s#""#'#g",
and so on through the other piped grep and sed commands.
The final cut command uses the > character to redirect its output to the output.txt file.
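To see the pipe and the redirect on their own, here is a minimal sketch for a POSIX shell, where cat plays the role that type plays in a Windows batch file. The file names demo_input.txt and demo_output.txt and their contents are made up for illustration.

```shell
# Create a small sample file (hypothetical contents).
printf '%s\n' 'apple' 'banana' 'cherry' > demo_input.txt

# cat writes the file to standard output; the pipe (|) feeds that into grep,
# which keeps only the lines containing "an"; the redirect (>) writes grep's
# output to demo_output.txt instead of the screen.
cat demo_input.txt | grep "an" > demo_output.txt

# Show what ended up in the output file.
cat demo_output.txt
```

Only banana contains "an", so that is the single line that reaches demo_output.txt.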
I'll add to jeb's answer, although it covers most of what you asked.
These three commands are emulated commands ported from Linux, and they do the following:
sed: a stream editor for filtering and transforming text.
grep: a tool for printing lines matching a pattern.
cut: a tool for cutting out selected portions of each line of a file.
I recommend that you read more about these three commands by either typing man <command name> in Linux, or Googling that same string (for instance, "man grep").
Also, look up regular expressions. Though they are usually unclear for beginners, they are a common and compact way of representing patterns.
Regarding the specific usage in the question:
sed "s#""#'#g": For each line, this replaces every quotation mark (", written as "" on this command line) with an apostrophe (').
grep -o "class='name[^>]*": This prints only the part of the line that starts with class='name and runs up to, but not including, the next >.
sed -n "/id=/p": By default sed prints every line. On the other hand, sed -n "/<some pattern>/p" prints only the lines that match the specified pattern. In this case, sed prints only the lines containing id=.
grep -o "surname=[^>]*": This prints only the part of the line that starts with surname= and runs up to, but not including, the next >.
cut -d"'" -f2: This parses each line as successive fields separated by an apostrophe ('), and picks the second one.
Everything is piped, meaning that the output of each command serves as input for the next command to the right. The contents of "input.txt" are fed into the sed command, the output of which is then fed into the grep command, and so on. The final output is written to a new file named "output.txt".
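The steps above can be traced one at a time. The following is a sketch for a POSIX shell with a grep that supports -o (GNU and BSD grep do); the sample line is hypothetical and not taken from the question's input.txt. Note one difference from the batch file: in a POSIX shell the quotation mark inside the sed script is written \" rather than "".

```shell
# Hypothetical input line; the real input.txt is not shown in the question.
line='<td class="name1" id="42" surname="Frank">text</td>'

# Step 1: turn every double quote into an apostrophe.
step1=$(printf '%s\n' "$line" | sed "s#\"#'#g")
# Step 2: keep only the text from class='name up to (not including) the next >.
step2=$(printf '%s\n' "$step1" | grep -o "class='name[^>]*")
# Step 3: keep the line only if it contains id=.
step3=$(printf '%s\n' "$step2" | sed -n "/id=/p")
# Step 4: keep only the text from surname= up to (not including) the next >.
step4=$(printf '%s\n' "$step3" | grep -o "surname=[^>]*")
# Step 5: split on apostrophes and take the second field.
step5=$(printf '%s\n' "$step4" | cut -d"'" -f2)

# Print each intermediate result on its own line.
printf '%s\n' "$step1" "$step2" "$step3" "$step4" "$step5"
```

For this sample line the last step leaves just Frank, the value of the surname attribute.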
And yes, like jeb mentioned, this looks like an awkward solution, because everything here can be done with sed alone, presumably by only one or two commands.
It's more or less easy.
Splitting it to single commands:
sed "s#""#'#g" is equivalent to sed "s/""/'/g", which will replace each quote with a ' character.
grep -o "class='name[^>]*" will catch only the lines with the text class='name, and the -o switch makes grep print only the matching part of each line instead of the whole line.
sed -n "/id=/p" will catch only lines containing the text id=.
grep -o "surname=[^>]*" will catch only the lines with the text surname=.
cut -d"'" -f2 will cut the line into parts. The parts are separated by ' (-d"'") and you get the second field (-f2).
Yes, this looks like a fast hack solution; this could be solved with sed alone.
Especially when the individual pieces of text appear in a fixed order, like:
<class="name17" id=13> <surname=Frank>
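As a rough sketch of what "sed alone" could look like, the command below folds the five steps into one sed call. It is not the original author's solution, and it rests on assumptions that the question does not confirm: class, id and a quoted surname all sit inside the same tag, in that order, and there is at most one such tag per line. The function name extract_surname and the sample lines are made up for illustration, and the quoting is for a POSIX shell (in a batch file the \" would be written "").

```shell
# Hypothetical helper: reads lines on standard input, prints the surname value.
extract_surname() {
  # First expression: normalise double quotes to apostrophes.
  # Second expression: if the line has class='name..., then id=, then
  # surname='...' with no > in between, replace the whole line with the
  # surname value and print it. -n suppresses all other lines.
  sed -n -e "s/\"/'/g" \
         -e "s/.*class='name[^>]*id=[^>]*surname='\([^'>]*\)'.*/\1/p"
}

# Two hypothetical input lines; only the first has class="name...".
printf '%s\n' \
  '<td class="name1" id="42" surname="Frank">text</td>' \
  '<td class="other" id="7" surname="Nobody">text</td>' \
  | extract_surname
```

Unlike the original pipeline, this version depends on the attribute order and yields at most one result per line, so it is only a drop-in replacement if the input really is that regular.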