Comparing numbers in a text file to a list of numb

Posted 2019-08-30 05:15

Question:

I have .txt files containing 4-digit numbers.

Sometimes a file contains only one 4-digit number, sometimes several, and sometimes it is empty.

example1.txt file:

6304
6204

example2.txt file:

6308

example3.txt file:

6305

example4.txt file:

6300
6204
6301

example5.txt file:

6302
6234
6345

What I need to do is check whether the numbers inside an example file are in a list of numbers that I have in another text file.

The list looks something like this (but with more numbers):

6300 
6301 
6302 
6303 
6304 
6305

For the 'example1.txt' file:

The number '6204' should be deleted from the file (because it's not in the list). The number '6304' must stay in the example file (it is in the list).

For the 'example2.txt' file:

The number should be deleted and the file should be empty.

For the 'example3.txt' file:

The number stays in the example file.

For the 'example4.txt' file:

There is more than one match in the example file, so everything should be deleted.

For the 'example5.txt' file:

Only 6302 should be in the file. The other two should be deleted because they are not in the list.


So basically I want to keep the files that have exactly one match, and those files should contain only the number that matches a number in the list. If there is more than one match, the file should be empty. If there are no matches, the file should also be empty.

On top of all this, I would like to do it in a sh script.

Now my question is:

Is this even possible, and if so, how? Or do I need to work with a database and another programming language?

Thanks in advance.

Answer 1:

I think I have understood your logic now. I assume your list is stored in file list.txt and that you save the following as marksscript:

#!/bin/bash
#
# First count total number of matches and store in variable MATCHES
#
MATCHES=0
# The "|| [ -n ... ]" part also handles a last line without a trailing newline
while read -r WORD || [ -n "$WORD" ]
do
   # Skip blank lines: an empty pattern would match every line of the list
   [ -z "$WORD" ] && continue
   # Count the list lines containing this word (-w whole word, -F fixed string)
   N=$(grep -cwF -- "$WORD" list.txt)
   [ "$N" -eq 1 ] && MATCHEDWORD=$WORD
   echo "DEBUG: $WORD $N"
   ((MATCHES+=N))
done < "$1"

#
# Now we know total number of matches, decide what to do
#
echo "DEBUG: Total matches $MATCHES"

if [ "$MATCHES" -ne 1 ]; then
    echo "DEBUG: Zero out file - not exactly ONE match"
    > "$1"
else
    echo "DEBUG: $MATCHEDWORD remains as singleton match"
    echo "$MATCHEDWORD" > "$1"
fi

Run like this:

chmod +x marksscript
./marksscript example1.txt
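
If several example files need processing, a small wrapper could call the script once per file. This is only a sketch: process_all is a hypothetical helper name (not part of the answer), and it assumes marksscript and list.txt both sit in the current directory.

```shell
#!/bin/bash
# Sketch: run a per-file command on several files.
# process_all is a hypothetical helper; the command to run is passed in
# as the first argument, followed by the files to process.
process_all() {
    local script="$1"
    shift
    local f
    for f in "$@"; do
        # Skip unmatched globs and anything that is not a regular file.
        [ -f "$f" ] || continue
        "$script" "$f"
    done
}
```

For example, calling process_all ./marksscript example*.txt would run the script on every example file in the current directory.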

OUTPUT

./marksscript example1.txt
DEBUG: 6204 0
DEBUG: 6304 1
DEBUG: Total matches 1
DEBUG: 6304 remains as singleton match

./marksscript example2.txt
DEBUG: Total matches 0
DEBUG: Zero out file - not exactly ONE match

./marksscript example3.txt
DEBUG: 6305 1
DEBUG: Total matches 1
DEBUG: 6305 remains as singleton match

./marksscript example4.txt
DEBUG: 6300 1
DEBUG: 6204 0
DEBUG: 6301 1
DEBUG: Total matches 2
DEBUG: Zero out file - not exactly ONE match


Answer 2:

This is certainly not the fastest solution but works:

while read -r line
do
    sed -i "s/$line//" example1.txt
done < list_textfile.txt

It deletes the first occurrence of each listed string from every line of your "numbers to check" text file.

Update: This did not do what was asked. The loop above removes the strings listed in list_textfile.txt instead of keeping them.

This should do the right thing:

grep -o -f list_textfile.txt example1.txt
  • -o prints only the matching part of each line
  • -f names a file containing the strings to grep for

Note that this only prints the matches; it does not rewrite example1.txt.
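
Building on the grep -o -f idea, the full rule from the question (keep the file only when exactly one number matches, otherwise empty it) might be sketched as below. The function name keep_single_match is hypothetical, and the sketch assumes one number per line in the list; trailing whitespace and blank lines in the list are stripped first so they cannot act as patterns.

```shell
#!/bin/bash
# Sketch: keep a file only if exactly one of its numbers appears in the list.
# keep_single_match is a hypothetical helper name, not from the answers above.
# Usage: keep_single_match LIST_FILE EXAMPLE_FILE
keep_single_match() {
    local list="$1" file="$2" patterns matches count
    patterns=$(mktemp) || return 1
    # Clean the list: drop trailing whitespace and blank lines.
    sed -e 's/[[:space:]]*$//' -e '/^$/d' "$list" > "$patterns"
    # -o prints only the matched text, -w matches whole words,
    # -F treats the patterns as fixed strings, -f reads them from a file.
    matches=$(grep -owF -f "$patterns" -- "$file")
    rm -f "$patterns"
    # Count the non-empty lines of grep's output (0 when nothing matched).
    count=$(printf '%s' "$matches" | grep -c .)
    if [ "$count" -eq 1 ]; then
        # Exactly one match: the file keeps only that number.
        printf '%s\n' "$matches" > "$file"
    else
        # Zero or several matches: empty the file.
        : > "$file"
    fi
}
```

Calling keep_single_match list_textfile.txt example1.txt would then rewrite example1.txt in place. A number that appears twice in the example file counts as two matches here, which may or may not be what is wanted.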