How to determine if two files are identical in C us

Posted 2019-09-25 07:08

I need to see if 2 files are identical, so I used struct stat:

    fdin = open(argv[0],O_RDONLY);
    statos= fstat(fdin, &stat);
    close(fdin);
    fdin = open(argv[1],O_RDONLY);
    statos1= fstat(fdin, &stat1);
    close(fdin);
    printf("file 1 is in size: %lu\n",stat1.st_ino);
    printf("file 2 is in size: %lu\n",stat.st_ino);

The result:

file 1 is in size: 9569486
file 2 is in size: 9569479

Why aren't the st_ino values identical for the same file with the same path? And if two different files are identical, how can I check that with a system call?

3 answers
何必那么认真
#2 · 2019-09-25 07:13

It's because you're opening two different files:

    ./a.out ab.txt ab.txt

argv[0] is the executable, argv[1] is 'ab.txt'.

If you put error checks into your code, it would be clear.

You're also printing the inodes as "size", for some reason.
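
As a minimal sketch of what those error checks might look like, the helper below opens one path, calls fstat(2), and prints the inode number and the size under the right labels. The name print_file_info is made up for this illustration and is not from the question.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from the question): open `path`, fstat it, and
 * print its inode number and size. Returns 0 on success, -1 on failure
 * after reporting the error with perror. */
int print_file_info(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror(path);          /* e.g. "ab.txt: No such file or directory" */
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) == -1) {
        perror("fstat");
        close(fd);
        return -1;
    }
    close(fd);

    /* ino_t and off_t have platform-dependent widths, so cast for printf. */
    printf("%s: inode %ju, size %jd bytes\n",
           path, (uintmax_t)st.st_ino, (intmax_t)st.st_size);
    return 0;
}
```

From main, after checking that argc is at least 3, you would call it with argv[1] and argv[2]. Since the path is printed on every line, passing argv[0] by mistake would show up immediately.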

我欲成王,谁敢阻挡
#3 · 2019-09-25 07:23

You did not define what identical files means to you.

On Unix or Linux (or POSIX in general), on POSIX-compliant file systems (such as ext4 or btrfs, but not FAT32), a file can have zero, one, or several file paths, because a path is just a name that refers to an inode. Two file paths therefore refer to the same underlying file if they point to the same inode on the same file system.

You can use the stat(2) syscall to get the device (i.e. file system) st_dev and the inode number st_ino of some file path. You should compare both of them, since inode numbers are only unique within one file system.
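
A minimal sketch of that comparison, assuming a POSIX system. The function name same_inode is invented for this example.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if both paths name the same underlying
 * file (same device and same inode), 0 if they do not, and -1 if either
 * stat(2) call fails. */
int same_inode(const char *path1, const char *path2)
{
    struct stat st1, st2;

    if (stat(path1, &st1) == -1) { perror(path1); return -1; }
    if (stat(path2, &st2) == -1) { perror(path2); return -1; }

    /* Both fields must match: inode numbers can repeat across devices. */
    return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}
```

Because stat(2) follows symbolic links, a symlink and its target should compare as the same file here, as should two hard links to one inode. Two separate files with equal content should not.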

Alternatively, you may mean that identical files are files with the same content. That is ill-defined: a file's content may change while you are reading it, because some other process is writing to it at the same time (with write(2), through an mmap(2) mapping, etc.). So, strictly speaking, comparing file contents (which is an expensive operation) cannot give a reliable answer: the content can be changed by some other process during the comparison.

If you (wrongly) ignore the fact that a file's content can change because another process is writing into it while you are reading it, you could read both files and compare them byte by byte, also taking the end-of-file into account. Something as simple as:

    /* Needs <stdio.h>, <stdlib.h> and <stdbool.h>. */
    FILE *f1 = fopen(path1, "rb");
    if (!f1) { perror(path1); exit(EXIT_FAILURE); }
    FILE *f2 = fopen(path2, "rb");
    if (!f2) { perror(path2); exit(EXIT_FAILURE); }
    bool samefile = true;
    int c1, c2;
    do {
        /* Read one byte from each file; EOF must also be reached together. */
        c1 = getc(f1);
        c2 = getc(f2);
        if (c1 != c2) samefile = false;
    } while (samefile && c1 != EOF);
    fclose(f1);
    fclose(f2);

You could optimize the above code by first comparing the sizes of f1 and f2 using fseek(3) and ftell(3): files of different sizes cannot have the same content.
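
One way the size check might look, as a sketch that assumes regular files opened in binary mode on a POSIX system. The helper name stream_size is invented here.

```c
#include <stdio.h>

/* Hypothetical helper: returns the size in bytes of an open stream, or -1
 * on error. The stream is positioned back at the start afterwards, so it
 * can still be read from the beginning. */
long stream_size(FILE *f)
{
    if (fseek(f, 0L, SEEK_END) != 0) return -1;
    long size = ftell(f);               /* -1 if ftell fails */
    if (fseek(f, 0L, SEEK_SET) != 0) return -1;
    return size;
}
```

If stream_size(f1) and stream_size(f2) differ, the byte-by-byte loop can be skipped. Note that long may be too narrow for very large files on some platforms. POSIX offers fseeko(3) and ftello(3) for that case, and the st_size field from fstat(2) is another option.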

You could also use the plain open(2), read(2) and close(2) syscalls. Be sure to read in chunks (of e.g. 4 KiB) into a buffer (one for each file), to handle the end-of-file condition, and to check for errors.
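
A sketch of the chunked approach with raw syscalls, under the same caveat that the files are assumed not to change during the comparison. The names read_full, same_content and CHUNK are made up for this example, and retrying after EINTR is left out for brevity.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 4096   /* illustrative buffer size */

/* Read up to `len` bytes, looping over short reads. Returns the number of
 * bytes read (less than `len` only at end-of-file), or -1 on error. */
static ssize_t read_full(int fd, char *buf, size_t len)
{
    size_t total = 0;
    while (total < len) {
        ssize_t n = read(fd, buf + total, len - total);
        if (n == -1) return -1;
        if (n == 0) break;              /* end of file */
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Returns 1 if the contents are equal, 0 if they differ, -1 on error. */
int same_content(const char *path1, const char *path2)
{
    int fd1 = open(path1, O_RDONLY);
    if (fd1 == -1) return -1;
    int fd2 = open(path2, O_RDONLY);
    if (fd2 == -1) { close(fd1); return -1; }

    char buf1[CHUNK], buf2[CHUNK];
    int result = 1;
    for (;;) {
        ssize_t n1 = read_full(fd1, buf1, CHUNK);
        ssize_t n2 = read_full(fd2, buf2, CHUNK);
        if (n1 == -1 || n2 == -1) { result = -1; break; }
        /* Different chunk lengths mean one file ended before the other. */
        if (n1 != n2 || memcmp(buf1, buf2, (size_t)n1) != 0) { result = 0; break; }
        if (n1 < CHUNK) break;          /* both files ended at the same offset */
    }
    close(fd1);
    close(fd2);
    return result;
}
```

Looping in read_full matters because read(2) may return fewer bytes than requested even before end-of-file, so two raw reads of the same size are not guaranteed to line up.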

As others pointed out, your program is wrong notably because argv[0] is the program's command name.

叛逆
#4 · 2019-09-25 07:35

st_ino is the field for the file's inode number. The inode number uniquely identifies a file's inode within its file system, and the inode holds the information that the stat call returns about the file.

The field you want is st_size.
