I am writing a program that opens two files for reading. The first file contains 20 names, which I store in an array of strings of the form Names[0] = John\0. The second file is a large text file that contains many occurrences of each of the 20 names.
I need my program to scan the entirety of the second file and increment a variable Count each time it finds one of the names, so that when the program finishes, Count holds the total number of times any of the names appears in the text.
Here is my loop, which searches for and counts the name occurrences:
char LineOfText[85];
char *TempName;
while(fgets(LineOfText, sizeof(LineOfText), fpn)){
    for(a = 0; a<NumOfNames; a++){
        TempName = strstr(LineOfText, Names[a]);
        if(TempName != NULL){
            Count++;
        }
    }
}
No matter what I do, this loop doesn't work as I would expect it to, but I think I have discovered what is wrong. Each name in the array is null-terminated, but when a name appears in the text file it is not null-terminated unless it occurs as the last word of a line. Therefore, this while loop is only counting the number of times any of the names appears at the end of a line, rather than the number of appearances anywhere in the text file. How can I adjust this loop to fix this problem?
Thank you for any advice in advance.
The issue here is probably your use of fgets, which does not trim the newline from the line it reads.
If you are creating your names array by reading lines with fgets, then all the names will be terminated with a newline character. The lines in the file being read with fgets will also be terminated with a newline character, so the names will only match at the end of the lines.
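If that is how the names are loaded, one common fix is to cut each name at its first newline before storing it. The following is only a sketch under that assumption: strip_newline, read_names and NAME_LEN are hypothetical names that do not appear in the question, and the original program may load its names differently.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 32   /* assumed maximum name length, including the terminator */

/* Hypothetical helper: remove a trailing "\n" or "\r\n" left by fgets. */
void strip_newline(char *s)
{
    /* strcspn returns the index of the first '\r' or '\n',
       or strlen(s) if neither character is present. */
    s[strcspn(s, "\r\n")] = '\0';
}

/* Hypothetical helper: read up to max_names names, one per line.
   Returns the number of names stored. Lines longer than
   NAME_LEN - 1 characters would be split across several entries. */
int read_names(FILE *fp, char names[][NAME_LEN], int max_names)
{
    int n = 0;

    while (n < max_names && fgets(names[n], NAME_LEN, fp) != NULL) {
        strip_newline(names[n]);
        if (names[n][0] != '\0')   /* skip blank lines */
            n++;
    }
    return n;
}
```

With the newline removed, strstr can match a name anywhere in a line of text, not just immediately before the line break.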
strstr does not compare the NUL byte that terminates the pattern string, for obvious reasons. If it did, it would only match suffix strings, which would make it a very different function.
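To see both effects in isolation, here is a small sketch. The helper name find_offset is made up for illustration; it simply reports where strstr found the pattern, or -1 if it did not.

```c
#include <string.h>

/* Hypothetical helper: offset of the first occurrence of name in line,
   or -1 if name does not occur. */
long find_offset(const char *line, const char *name)
{
    const char *hit = strstr(line, name);

    return hit != NULL ? (long)(hit - line) : -1L;
}
```

For example, find_offset("Then John left.\n", "John") finds the name in the middle of the line, while find_offset("Then John left.\n", "John\n") returns -1 because the pattern now includes a newline that only appears at the end of the line.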
Also, you will only find a maximum of one instance of each name in each line. If you think that a name might appear more than once in the same line, you should replace:
TempName = strstr(LineOfText, Names[a]);
if(TempName != NULL){
    Count++;
}
with something like:
for (TempName = LineOfText;
     (TempName = strstr(TempName, Names[a])) != NULL;
     ++Count, ++TempName) {
}
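The same loop can be wrapped in a function so it is easier to test on its own. This is a minimal sketch rather than a drop-in replacement: count_in_line is a hypothetical name, and the check for an empty name is an added assumption, since strstr matches an empty pattern at every position.

```c
#include <string.h>

/* Hypothetical helper: count occurrences of name in line.
   Overlapping matches are counted, because the search resumes
   one character after the start of each hit. */
int count_in_line(const char *line, const char *name)
{
    int count = 0;
    const char *p;

    if (name[0] == '\0')   /* an empty pattern would match everywhere */
        return 0;

    for (p = line; (p = strstr(p, name)) != NULL; ++p)
        ++count;

    return count;
}
```

Inside the fgets loop this would be used as Count += count_in_line(LineOfText, Names[a]);. Note that strstr matches substrings, so "Johnny" also counts as an occurrence of "John".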
For reference, here is the definition of fgets from the C standard (emphasis added):
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
This is different from gets, which does not retain the new-line character.
I think the null termination of the names array is not an issue (see the strstr function reference). The strstr function is not going to compare the terminator. You do have the possibility of missing additional names on each line. See my adjustment below for an example of how you could count multiple names on each line.
char LineOfText[85];
char *TempName;
while(fgets(LineOfText, sizeof(LineOfText), fpn)){
    for(a = 0; a<NumOfNames; a++){
        TempName = strstr(LineOfText, Names[a]);
        /* Iterate through line for multiple occurrences of each name */
        while(TempName != NULL){
            Count++;
            /* Get next occurrence of name on line. TempName points at a
               match, so as long as the name is not an empty string,
               TempName + 1 is still inside the LineOfText string (at
               worst it points at the terminating null character) */
            TempName = strstr(TempName + 1, Names[a]);
        }
    }
}