I have .txt files containing 4-digit numbers.
Sometimes a file contains only one 4-digit number, sometimes several, and sometimes it is empty.
example1.txt file:
6304
6204
example2.txt file:
6308
example3.txt file:
6305
example4.txt file:
6300
6204
6301
example5.txt file:
6302
6234
6345
What I need to do is check whether the numbers inside each example file appear in a list of numbers I have in another text file.
The list looks something like this (but with more numbers):
6300
6301
6302
6303
6304
6305
For the 'example1.txt' file:
The number '6204' should be deleted from the file (because it is not in the list).
The number '6304' must stay in the example file (it is in the list).
For the 'example2.txt' file:
The number should be deleted and the file should be empty.
For the 'example3.txt' file:
The number stays in the example file.
For the 'example4.txt' file:
There is more than one match in the example file, so everything should be deleted.
For the 'example5.txt' file:
Only 6302 should be in the file. The other two should be deleted because they are not in the list.
So basically I want to keep the files that have exactly one match, and those files should contain only the number that matches a number in the list. If there is more than one match, the file should be empty. If there are no matches, the file should also be empty.
On top of all this, I would like to do it in a sh script.
Now my question is:
Is this even possible, and how? Or do I need to work with a database and another programming language?
Thanks in advance.
I think I have understood your logic now. I assume your list is stored in a file called list.txt and that you save the following as marksscript:
#!/bin/bash
#
# First count total number of matches and store in variable MATCHES
#
MATCHES=0
MATCHEDWORD=
while read -r WORD || [ -n "$WORD" ]
do
   # Skip blank lines: an empty pattern would match every line of the list
   [ -z "$WORD" ] && continue
   # Count list lines equal to this word (-x whole line, -F literal string)
   N=$(grep -cxF -- "$WORD" list.txt)
   [ "$N" -eq 1 ] && MATCHEDWORD=$WORD
   echo "DEBUG: $WORD $N"
   ((MATCHES+=N))
done < "$1"
#
# Now we know total number of matches, decide what to do
#
echo "DEBUG: Total matches $MATCHES"
if [ "$MATCHES" -ne 1 ]; then
   echo "DEBUG: Zero out file - not exactly ONE match"
   > "$1"
else
   echo "DEBUG: $MATCHEDWORD remains as singleton match"
   echo "$MATCHEDWORD" > "$1"
fi
Run like this:
chmod +x marksscript
./marksscript example1.txt
OUTPUT
./marksscript example1.txt
DEBUG: 6204 0
DEBUG: 6304 1
DEBUG: Total matches 1
DEBUG: 6304 remains as singleton match
./marksscript example2.txt
DEBUG: Total matches 0
DEBUG: Zero out file - not exactly ONE match
./marksscript example3.txt
DEBUG: 6305 1
DEBUG: Total matches 1
DEBUG: 6305 remains as singleton match
./marksscript example4.txt
DEBUG: 6300 1
DEBUG: 6204 0
DEBUG: 6301 1
DEBUG: Total matches 2
DEBUG: Zero out file - not exactly ONE match
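Since the question mentions several example files and the script handles one file per call, a small wrapper can run it over all of them. The following is only a sketch: run_on_all is a hypothetical helper name, not part of the script above, and it assumes the script takes the file to process as its only argument.

```shell
#!/bin/bash
# run_on_all SCRIPT FILE...
# Hypothetical wrapper: calls SCRIPT once for every regular file given.
run_on_all() {
    local script=$1 f
    shift
    for f in "$@"; do
        # An unmatched glob is passed through literally, so skip non-files
        [ -f "$f" ] || continue
        "$script" "$f"
    done
}
```

With marksscript and list.txt in the current directory, the call would look like: run_on_all ./marksscript example*.txt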
This is certainly not the fastest solution, but it works:
while read line
do
sed -i "s/$line//" example1.txt
done < list_textfile.txt
It deletes every occurrence of each string in the list from your "numbers to check" text file.
Update:
This did not do what was asked: the loop above removes the strings listed in list_textfile.txt instead of keeping them.
This should do the right thing:
grep -o -f list_textfile.txt example1.txt
- -o prints only the matching part of each line
- -f names a file containing the strings to grep for
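Note that grep only prints the matches to standard output. It does not change example1.txt, and it does not apply the "exactly one match" rule from the question. A minimal sketch of how the grep approach might be extended to do both follows. The function name keep_single_match is hypothetical, -x and -F are swapped in for -o so that only whole lines are compared as literal strings, and the list is assumed to contain no blank lines.

```shell
#!/bin/bash
# keep_single_match LIST FILE
# Hypothetical helper: rewrites FILE so that it holds the one number that
# also appears in LIST, or nothing when there are zero or several matches.
keep_single_match() {
    local list=$1 file=$2 matches count
    # -x: whole-line match, -F: literal strings, -f: read patterns from LIST
    matches=$(grep -xFf "$list" -- "$file")
    # grep -c . counts non-empty lines, so "no matches" yields 0
    count=$(printf '%s\n' "$matches" | grep -c .)
    if [ "$count" -eq 1 ]; then
        printf '%s\n' "$matches" > "$file"
    else
        # Zero or several matches: truncate the file
        : > "$file"
    fi
}
```

It would be called as keep_single_match list_textfile.txt example1.txt. A number that appears twice in the example file counts as two matches here, so the file is emptied in that case.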