I have a file that contains words such as
abciuf.com abdbhj.co.in abcshjkl.org.in.2 abciuf zasdg cbhjk asjk
along with other content. I want to print only the words that start with abci, abdb, abcs or abai,
like - abciuf.com abdbhj.co.in abcshjkl.org.in.2 abciuf
I tried the grep command, but it did not give me the result I need:
cat /etc/xyz.txt|egrep -o "abdb*|abci*|abcs*|abai*"
cat /etc/xyz.txt|egrep -Eow "abdb*|abci*|abcs*|abai*"
grep -Eo '\<(abdb|abci|abcs|abai)\S*' </etc/xyz.txt
\< (or \b) matches the start of a "word" (or a "word" boundary)
(A|B) matches A or B
\S* matches zero or more non-space characters (it stops at the first whitespace character)
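- it was a good idea to try grep's -w option, but its definition of "word" is too strict (matching stops when it encounters .)
- the shell meaning of * is not the same as grep's: in the shell * matches any string, while in a regexp it repeats the preceding item zero or more times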
- you can make the regexp shorter but it becomes harder to read
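The notes above can be checked with a small sketch. It assumes GNU grep (for \< and \S), and the variable names are only illustrative. It runs the pattern from the question, the anchored pattern, and one possible shorter spelling against the sample line.

```shell
# Sample line copied from the question.
line='abciuf.com abdbhj.co.in abcshjkl.org.in.2 abciuf zasdg cbhjk asjk'

# In a regexp, "abci*" means "abc" followed by zero or more "i",
# so the match ends at the prefix and the rest of the word is lost.
prefix_only=$(printf '%s\n' "$line" | grep -Eo 'abdb*|abci*|abcs*|abai*')
printf '%s\n' "$prefix_only"

# Anchor at a word start, then take every non-space character that follows.
whole_words=$(printf '%s\n' "$line" | grep -Eo '\<(abdb|abci|abcs|abai)\S*')
printf '%s\n' "$whole_words"

# A shorter spelling of the same four prefixes, arguably harder to read.
short_form=$(printf '%s\n' "$line" | grep -Eo '\<ab(db|c[is]|ai)\S*')
printf '%s\n' "$short_form"
```

The second and third commands should print the same four words; the first one only prints fragments around the prefixes.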
You can also try Perl:
perl -ne ' while(/(\b(abdb|abci|abcs|abai)\S+)/g) { print "$1 \n" } '
With your input:
$ cat sin15.txt
abciuf.com abdbhj.co.in abcshjkl.org.in.2 abciuf zasdg cbhjk asjk
$ perl -ne ' while(/(\b(abdb|abci|abcs|abai)\S+)/g) { print "$1 \n" } ' sin15.txt
abciuf.com
abdbhj.co.in
abcshjkl.org.in.2
abciuf
$
With GNU awk for multi-char RS and RT:
$ awk -v RS='\\<(abdb|abci|abcs|abai)\\S*' 'RT{print RT}' file
abciuf.com
abdbhj.co.in
abcshjkl.org.in.2
abciuf
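If the GNU extensions used above (\<, \S, a regexp RS) are not available, one possible fallback is to split the input on whitespace first and then filter with a plain anchored pattern. This is a different technique from the answers above, shown only as a sketch: it matches a prefix only at the start of a whitespace-separated token, whereas \< can also match after punctuation inside a token. The temporary file is a hypothetical stand-in for the real input file.

```shell
# Hypothetical sample file reproducing the line from the question.
sample=$(mktemp)
printf '%s\n' 'abciuf.com abdbhj.co.in abcshjkl.org.in.2 abciuf zasdg cbhjk asjk' > "$sample"

# Put every whitespace-separated token on its own line,
# then keep the tokens that begin with one of the prefixes.
result=$(tr -s '[:space:]' '\n' < "$sample" | grep -E '^(abdb|abci|abcs|abai)')
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$sample"
```

For the sample line this should give the same four words as the grep, Perl and awk answers.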