Shell scripting to find the delimiter

Posted 2019-10-05 05:05

I have a file with three columns that uses a pipe as the delimiter. Due to some error, some lines in the file can have a "," instead of a "|". I want to output all such erroneous rows.

Tags: bash shell awk
2 answers
Melony?
#2 · 2019-10-05 05:47

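You can also use grep, although it is more complicated. A valid line has exactly two pipes, so the erroneous lines are those with no pipe, one pipe, or three or more pipes:

# Simple form: three or more pipes anywhere on the line
grep -E '\|.*\|.*\|' input
echo No pipe
grep -E '^[^|]*$' input
echo One pipe
grep -E '^[^|]*\|[^|]*$' input
echo 3+ pipe
grep -E '\|[^|]*\|[^|]*\|' input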
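To see which of these cases a given line falls into, it can help to print the pipe count for every line first. This is only a minimal sketch: the sample data and the variable names sample and counts are made up for illustration and are not from the question.

```shell
# Hypothetical sample data with 0, 1, 2 and 3 pipes per line.
sample=$(mktemp)
printf '%s\n' 'a,b,c' 'a|b,c' 'a|b|c' 'a|b|c|d' > "$sample"

# gsub returns how many pipes it replaced; "&" puts each one back unchanged,
# so the line itself is printed as it was.
counts=$(awk '{ n = gsub(/\|/, "&"); print n " pipes: " $0 }' "$sample")
printf '%s\n' "$counts"
rm -f "$sample"
```

Only the lines reported with exactly 2 pipes have the expected three columns.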

Before combining the greps, first introduce two variables, p (pipe) and n (no pipe):

p='\|'
n='[^|]*'
echo "p=$p, n=$n"
echo No pipe
grep -E "^$n$" input
echo One pipe
grep -E "^$n$p$n$" input
echo 3+ pipe
grep -E "$p$n$p$n$p" input

Now bring it all together:

grep -E "^$n$|^$n$p$n$|$p$n$p$n$p" input
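The same rows can probably be found more simply by describing the one valid shape (exactly two pipes) and inverting the match with grep -v. This is an alternative sketch, not part of the approach above; the sample data and the names sample and bad_rows are assumptions for illustration.

```shell
# Hypothetical sample data: only the first line has exactly two pipes.
sample=$(mktemp)
printf '%s\n' 'ABC|12345|EAR' 'ssdf|fdas,sdfsf' 'a,b,c' 'a|b|c|d' > "$sample"

# A valid line is: non-pipes, pipe, non-pipes, pipe, non-pipes.
# -v prints every line that does NOT have that shape.
bad_rows=$(grep -vE '^[^|]*\|[^|]*\|[^|]*$' "$sample")
printf '%s\n' "$bad_rows"
rm -f "$sample"
```

This keeps a single pattern to maintain, at the cost of no longer telling the three error cases apart.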

Edit: The comments and variable names were about "slashes", but they are pipes (with backslashes). That was a bit confusing.

We Are One
#3 · 2019-10-05 05:58

To count the number of columns with awk you can use the NF variable:

$ cat file
ABC|12345|EAR
PQRST|123|TWOEYES
ssdf|fdas,sdfsf
$ awk -F\| 'NF!=3' file
ssdf|fdas,sdfsf
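If it helps to locate the bad rows in a large file, the same NF test can also print the line number (NR) and the field count. A small sketch using the sample data from above; the names sample and report are illustrative only.

```shell
sample=$(mktemp)
printf '%s\n' 'ABC|12345|EAR' 'PQRST|123|TWOEYES' 'ssdf|fdas,sdfsf' > "$sample"

# NR is the current line number, NF the number of pipe-separated fields.
report=$(awk -F'|' 'NF != 3 { printf "line %d: %d fields: %s\n", NR, NF, $0 }' "$sample")
printf '%s\n' "$report"
rm -f "$sample"
```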

However, this does not seem to cover all the possible ways the data could be corrupted based on the various revisions of the question and the comments.

A better approach would be to define the exact format that the data must follow. For example, assuming that a line is "correct" if it has three columns, with the first and third containing letters only and the second being numeric, you could write the following script to match all non-conforming lines:

awk -F\| '!(NF==3 && $1$3 ~ /^[a-zA-Z]+$/ && $2+0==$2)' file

Test (note that only the second line, which is the conforming one, does not get printed):

$ cat file
A,BC|12345|EAR
PQRST|123|TWOEYES
ssdf|fdas,sdfsf
ABC|3983|MAKE,
sf dl lfsdklf |kldsamfklmadkfmask |mfkmadskfmdslafmka
ABC|abs|EWE
sdf|123|123
$ awk -F\| '!(NF==3&&$1$3~/^[a-zA-Z]+$/&&$2+0==$2)' file
A,BC|12345|EAR
ssdf|fdas,sdfsf
ABC|3983|MAKE,
sf dl lfsdklf |kldsamfklmadkfmask |mfkmadskfmdslafmka
ABC|abs|EWE
sdf|123|123
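One thing to keep in mind is that $2+0==$2 accepts anything awk treats as a number, so values such as 1.5 or -3 pass as well. If "numeric" is meant to be digits only (an assumption on my part, the question does not say), a regex test on $2 is stricter. A sketch comparing the two on made-up data:

```shell
sample=$(mktemp)
printf '%s\n' 'PQRST|123|TWOEYES' 'ABC|1.5|EAR' 'ABC|-3|EAR' 'ABC|abs|EWE' > "$sample"

# Original test: 1.5 and -3 count as numeric, so only "abs" is flagged.
loose=$(awk -F'|' '!(NF==3 && $1$3 ~ /^[a-zA-Z]+$/ && $2+0==$2)' "$sample")

# Assumed stricter rule: the second column must consist of digits only.
strict=$(awk -F'|' '!(NF==3 && $1$3 ~ /^[a-zA-Z]+$/ && $2 ~ /^[0-9]+$/)' "$sample")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -f "$sample"
```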

You can adapt the above command to your specific needs, based on what you consider valid input. For example, if you also wanted to reject lines of 50 characters or more, you could do:

awk -F\| '!(NF==3 && $1$3 ~ /^[a-zA-Z]+$/ && $2+0==$2 && length($0)<50)' file
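
Once there are several conditions, it may be easier to split them into separate rules so that each bad line also says which check it failed. A rough sketch under the same assumed format (three columns, letters / numeric / letters); the messages and the names sample and reasons are my own, not part of the original command.

```shell
sample=$(mktemp)
printf '%s\n' 'PQRST|123|TWOEYES' 'ssdf|fdas,sdfsf' 'ABC|abs|EWE' 'sdf|123|123' > "$sample"

# Each rule reports the first check that fails, then skips to the next line.
reasons=$(awk -F'|' '
  NF != 3               { print NR ": expected 3 columns, got " NF; next }
  $1$3 !~ /^[a-zA-Z]+$/ { print NR ": columns 1 and 3 must be letters only"; next }
  $2+0 != $2            { print NR ": column 2 must be numeric"; next }
' "$sample")
printf '%s\n' "$reasons"
rm -f "$sample"
```

Lines that pass every rule produce no output, as with the one-liner.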