I am working in C. What's the best way to search a file for a specific line (or multiple lines)? Can someone please give me an example? I have two files and I would like to check whether these two files are 80% identical. I thought about searching one of the files for specific lines from the other file. Thanks!
I need an example in C. Here is a small example of what I have so far; it only tells me whether the two files are completely identical:
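```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if both streams have byte-for-byte identical contents,
   0 if they differ (or if the buffers cannot be allocated). */
int compareFile(FILE* file_compared, FILE* file_checked)
{
    bool diff = false;
    const size_t N = 65536;
    char* b1 = malloc(N);
    char* b2 = malloc(N);
    if (b1 == NULL || b2 == NULL) {
        free(b1);
        free(b2);
        return 0;
    }
    size_t s1, s2;
    do {
        s1 = fread(b1, 1, N, file_compared);
        s2 = fread(b2, 1, N, file_checked);
        /* Different chunk lengths or different bytes: the files differ. */
        if (s1 != s2 || memcmp(b1, b2, s1) != 0) {
            diff = true;
            break;
        }
    } while (s1 > 0); /* both streams exhausted (a read error also ends the loop) */
    free(b1);
    free(b2);
    return diff ? 0 : 1;
}
```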
How can I return the percentage of identical lines instead?
The real problem with diff algorithms is that you cannot simply compare line by line. Say the files are virtually identical, but one of them has an additional line at the beginning. A naive line-by-line (memcmp) implementation would then report 100% difference, because every line is shifted by one... You probably have a lot of reading to do; the link above might give you a starting point.
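A minimal sketch of the usual way around that problem: score the files by the longest common subsequence (LCS) of their lines, the idea classic diff tools build on, so one inserted line costs one line instead of shifting everything. It assumes both files are already split into arrays of lines (loading them is left out), and the score 2 * LCS / (lines in A + lines in B) is just one possible definition of "percentage identical", chosen for illustration. The names lcs_lines and line_similarity are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Length of the longest common subsequence of two arrays of lines.
   O(n*m) time, two rows of O(m) memory. Returns 0 if allocation fails. */
static size_t lcs_lines(char **a, size_t n, char **b, size_t m)
{
    size_t *prev = calloc(m + 1, sizeof *prev);
    size_t *cur = calloc(m + 1, sizeof *cur);
    if (prev == NULL || cur == NULL) {
        free(prev);
        free(cur);
        return 0;
    }
    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            if (strcmp(a[i - 1], b[j - 1]) == 0)
                cur[j] = prev[j - 1] + 1;          /* lines match: extend */
            else
                cur[j] = prev[j] > cur[j - 1] ? prev[j] : cur[j - 1];
        }
        size_t *tmp = prev; /* the row just computed becomes "previous" */
        prev = cur;
        cur = tmp;
    }
    size_t result = prev[m];
    free(prev);
    free(cur);
    return result;
}

/* Percentage of shared lines: 200 * LCS / (n + m). */
double line_similarity(char **a, size_t n, char **b, size_t m)
{
    if (n + m == 0)
        return 100.0; /* two empty files are identical */
    return 200.0 * (double)lcs_lines(a, n, b, m) / (double)(n + m);
}
```

With {"x", "y", "z"} against {"new", "x", "y", "z"} the LCS is 3 lines, so the score is 600/7, about 85.7%, whereas a positional comparison would match none of the four lines. The O(n*m) table is fine for files of a few thousand lines; real diff tools use faster algorithms such as Myers'.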
Then again, if this isn't a homework / reinvent-the-wheel style assignment, you might want to build on existing work. For example, run the two files through

diff -y --suppress-common-lines file1 file2 | wc -l

and plain wc -l on one of the files, gather the output of those two calls, and calculate the percentage. Yes, this looks crude, but it's much easier and quicker than writing your own diff algorithm. You'll also benefit from future improvements of the diff tool, whose maintainers spend all their time on this stuff.

Then again, I'd do this in bash, not in C. ;)
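If you do want to stay in C but reuse diff, here is a rough sketch of that approach using POSIX popen(). It assumes GNU diff and wc are on the PATH and that the file names contain no single quotes, since they are pasted into a shell command. The helper names run_count and diff_similarity are hypothetical. It treats every line diff prints as one differing line and divides by the line count of the first file.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Runs a shell command and parses the first integer it prints; -1 on failure. */
static long run_count(const char *cmd)
{
    FILE *p = popen(cmd, "r");
    if (p == NULL)
        return -1;
    long n;
    if (fscanf(p, "%ld", &n) != 1)
        n = -1;
    pclose(p);
    return n;
}

/* Rough percentage of lines of file_a that diff reports as unchanged in file_b.
   Assumption: file names contain no single quotes. Returns -1.0 on error. */
double diff_similarity(const char *file_a, const char *file_b)
{
    char cmd[4096];
    int len = snprintf(cmd, sizeof cmd,
                       "diff -y --suppress-common-lines '%s' '%s' | wc -l",
                       file_a, file_b);
    if (len < 0 || (size_t)len >= sizeof cmd)
        return -1.0;
    long differing = run_count(cmd);

    /* wc -l counts newline characters, so an unterminated last line is not counted */
    len = snprintf(cmd, sizeof cmd, "wc -l < '%s'", file_a);
    if (len < 0 || (size_t)len >= sizeof cmd)
        return -1.0;
    long total = run_count(cmd);

    if (differing < 0 || total < 0)
        return -1.0;
    if (total == 0)
        return differing == 0 ? 100.0 : 0.0;
    double pct = 100.0 * (1.0 - (double)differing / (double)total);
    return pct < 0.0 ? 0.0 : pct; /* lines added in file_b can push this below 0 */
}
```

For a four-line file where only the last line changed, diff prints a single line, so this returns 75%. Lines that exist only in the second file also count as differing, which is why the result is clamped at 0.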
Have you tried http://www.text-compare.com/ yet? It's an easy way to compare two files and find the differences.
If you really need an implementation in C, why not open two file handles, read both files line by line, and compare each pair of lines? If they match, keep them; if not, walk through the characters to find where they differ.
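A sketch of that two-handle approach, assuming lines shorter than 4096 bytes (longer ones are read in pieces by fgets and counted as several lines). compare_lines and first_difference are hypothetical names. Note that it compares lines at the same position only, so it has exactly the weakness described in the first answer: one extra line at the top makes every following line count as different.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 4096 /* assumption: longer lines are split into several reads */

/* Index of the first character at which two strings differ. */
static size_t first_difference(const char *x, const char *y)
{
    size_t i = 0;
    while (x[i] != '\0' && x[i] == y[i])
        i++;
    return i;
}

/* Compares the files line by line at the same positions. Returns the
   percentage of identical lines relative to the longer file and prints
   where each mismatching line first differs. */
double compare_lines(FILE *f1, FILE *f2)
{
    char l1[MAX_LINE], l2[MAX_LINE];
    long same = 0, total = 0;
    for (;;) {
        char *r1 = fgets(l1, sizeof l1, f1);
        char *r2 = fgets(l2, sizeof l2, f2);
        if (r1 == NULL && r2 == NULL)
            break; /* both files exhausted */
        total++;
        if (r1 != NULL && r2 != NULL) {
            if (strcmp(l1, l2) == 0)
                same++;
            else
                printf("line %ld differs at column %zu\n",
                       total, first_difference(l1, l2) + 1);
        } else {
            printf("line %ld exists in only one file\n", total);
        }
    }
    return total == 0 ? 100.0 : 100.0 * (double)same / (double)total;
}
```

Files containing alpha/beta/gamma/delta and alpha/beta/gamma/Delta give 75%, with line 4 reported as differing at column 1.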
Alternatively, you could load the master file into memory, compare each line of the other file against every line of the master file, and report any line that matches with more than 75% similarity, displaying the changes.
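A sketch of the master-file approach. The answer doesn't say how to score two lines, so this assumes a Levenshtein (edit distance) ratio: 100 * (1 - distance / length of the longer line). Both files are assumed to be already loaded into arrays of lines, and all function names are hypothetical. Every line is compared with every master line, so it suits small files.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Levenshtein edit distance (two-row DP). On allocation failure it returns
   the worst case, the length of the longer string. */
static size_t edit_distance(const char *s, const char *t)
{
    size_t n = strlen(s), m = strlen(t);
    size_t *prev = malloc((m + 1) * sizeof *prev);
    size_t *cur = malloc((m + 1) * sizeof *cur);
    if (prev == NULL || cur == NULL) {
        free(prev);
        free(cur);
        return n > m ? n : m;
    }
    for (size_t j = 0; j <= m; j++)
        prev[j] = j;
    for (size_t i = 1; i <= n; i++) {
        cur[0] = i;
        for (size_t j = 1; j <= m; j++) {
            size_t del = prev[j] + 1, ins = cur[j - 1] + 1;
            size_t sub = prev[j - 1] + (s[i - 1] != t[j - 1]);
            size_t best = del < ins ? del : ins;
            cur[j] = best < sub ? best : sub;
        }
        size_t *tmp = prev;
        prev = cur;
        cur = tmp;
    }
    size_t d = prev[m];
    free(prev);
    free(cur);
    return d;
}

/* Similarity of two lines in percent (assumed metric, see above). */
static double line_match(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t longest = la > lb ? la : lb;
    if (longest == 0)
        return 100.0;
    return 100.0 * (1.0 - (double)edit_distance(a, b) / (double)longest);
}

/* For each line of `other`, finds the most similar master line. Lines whose
   best match is above `threshold` percent count as matched; near matches are
   printed as changes. Returns the percentage of matched lines. */
double match_against_master(char **master, size_t n_master,
                            char **other, size_t n_other, double threshold)
{
    if (n_other == 0)
        return 100.0;
    size_t matched = 0;
    for (size_t i = 0; i < n_other; i++) {
        double best = 0.0;
        size_t best_j = 0;
        for (size_t j = 0; j < n_master; j++) {
            double s = line_match(other[i], master[j]);
            if (s > best) {
                best = s;
                best_j = j;
            }
        }
        if (best > threshold) {
            matched++;
            if (best < 100.0)
                printf("\"%s\" ~ \"%s\" (%.1f%%)\n", other[i], master[best_j], best);
        }
    }
    return 100.0 * (double)matched / (double)n_other;
}
```

Called with a threshold of 75, "int x = 2;" matches "int x = 1;" at 90% and is printed as a change, while a line with no close counterpart in the master file lowers the overall score.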
Can you show what you have done so far?