I would like a pure R way to test whether two arbitrary files are different: the equivalent of diff -q in Unix, but it should work on Windows and without external dependencies.
I'm aware of tools::Rdiff, but it seems to only want to deal with R output files and complains loudly if I feed it something else.
To avoid loading the files into memory, which matters if they are large, compare their MD5 checksums:
library(tools)
md5sum("file_1.txt") == md5sum("file_2.txt")
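If you would rather compare the bytes themselves than their checksums, here is a minimal base R sketch that reads both files in fixed-size chunks and stops at the first mismatch. The function name files_differ and the chunk_size default are my own choices for illustration, not part of any package.

```r
# Hypothetical helper: TRUE if the two files differ, FALSE if they are
# byte-for-byte identical. Only chunk_size bytes per file are held in memory.
files_differ <- function(path_a, path_b, chunk_size = 65536L) {
  stopifnot(file.exists(path_a), file.exists(path_b))

  # Files of different sizes cannot be identical, so skip reading them.
  if (file.size(path_a) != file.size(path_b)) return(TRUE)

  con_a <- file(path_a, open = "rb")
  on.exit(close(con_a), add = TRUE)
  con_b <- file(path_b, open = "rb")
  on.exit(close(con_b), add = TRUE)

  repeat {
    chunk_a <- readBin(con_a, what = "raw", n = chunk_size)
    chunk_b <- readBin(con_b, what = "raw", n = chunk_size)
    if (!identical(chunk_a, chunk_b)) return(TRUE)  # first mismatch found
    if (length(chunk_a) == 0L) return(FALSE)        # both files exhausted
  }
}
```

Like diff -q, this only tells you whether the files differ, not where.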
I realize this is not exactly what you're asking for, but I'm posting it for the benefit of others who run into this question wanting to see the full diff and willing to tolerate external dependencies. In that case, diffobj will show the differences with a real diff that works on Windows, using the same algorithm as GNU diff. In this example, we compare the Moby Dick text to a version of it with 5 lines modified:
library(diffobj)
diffFile(mob.1.txt, mob.2.txt) # or `diffChr` if you have the data in R already
Produces: [screenshot of the diff output]
If you want something faster while still getting the locations of the differences, you can get the shortest edit script from the same package:
ses(readLines(mob.1.txt), readLines(mob.2.txt))
# [1] "1127c1127" "2435c2435" "6417c6417" "13919c13919"
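If you need the locations as numbers rather than strings, a small base R sketch like the one below can pull them out. It assumes the entries follow the GNU diff style shown above (a line number or range, one of a/c/d, then a line number or range); ses_locations is a hypothetical name, not a diffobj function.

```r
# Hypothetical helper: extract the starting line (in the first file) and the
# operation from diff-style edit commands such as "1127c1127" or "5,7d4".
ses_locations <- function(script) {
  data.frame(
    line = as.integer(sub("^([0-9]+).*$", "\\1", script)),
    op   = sub("^[0-9,]+([acd]).*$", "\\1", script),
    stringsAsFactors = FALSE
  )
}

ses_locations(c("1127c1127", "2435c2435", "6417c6417", "13919c13919"))
```

For the output above, the line column comes out as 1127, 2435, 6417 and 13919, and every op is "c" (changed).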
Code to get the Moby Dick data (note that I didn't set a seed, so you'll get different lines):
moby.dick.url <- 'http://www.gutenberg.org/files/2701/2701-0.txt'
moby.dick.raw <- moby.dick.UC <- readLines(moby.dick.url)
to.UC <- sample(length(moby.dick.raw), 5)
moby.dick.UC[to.UC] <- toupper(moby.dick.UC[to.UC])
mob.1.txt <- tempfile()
mob.2.txt <- tempfile()
writeLines(moby.dick.raw, mob.1.txt)
writeLines(moby.dick.UC, mob.2.txt)
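If you want a reproducible test case that doesn't need a download, the same idea can be sketched with a small generated text and a fixed seed. The file contents, the seed value and the variable names here are made up for illustration.

```r
# Build two small temp files that differ in exactly five known lines.
set.seed(42)  # arbitrary seed, only so the same lines are picked every run
original <- sprintf("line %d of the sample text", 1:20)
modified <- original

changed <- sample(length(original), 5)          # five distinct line numbers
modified[changed] <- toupper(modified[changed])

file_a <- tempfile(fileext = ".txt")
file_b <- tempfile(fileext = ".txt")
writeLines(original, file_a)
writeLines(modified, file_b)

# Line numbers where the two files disagree.
differing <- which(readLines(file_a) != readLines(file_b))
```

Here differing should contain the same five line numbers as changed, which gives you something to check any of the diff approaches against.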
The closest to the Unix command is diffr: it shows a really nice side-by-side window with all the differing lines marked in color.
library(diffr)
diffr(filename1, filename2)
shows: [screenshot of the diffr window]
Example solution (using the all.equal utility, documented at https://stat.ethz.ch/R-manual/R-devel/library/base/html/all.equal.html):
filenameForA <- "my_file_A.txt"
filenameForB <- "my_file_B.txt"
all.equal(readLines(filenameForA), readLines(filenameForB))
Note that readLines(filename) reads all the lines from the file specified by filename, so all.equal can then figure out whether the files differ.
Make sure to read the documentation linked above to understand it fully.
I have to admit that if the files are very large, this might not be the best option.
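One thing to watch: all.equal returns either TRUE or a character vector describing the differences, never FALSE, so it should be wrapped in isTRUE() if you want a plain yes/no answer. A minimal sketch, with same_lines as a made-up name:

```r
# Hypothetical helper: TRUE if both files contain the same lines,
# FALSE otherwise. Both files are read fully into memory.
same_lines <- function(path_a, path_b) {
  isTRUE(all.equal(readLines(path_a), readLines(path_b)))
}
```

Because readLines treats LF, CRLF and CR as line terminators, files that differ only in their line endings will count as the same here, which may or may not be what you want.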
all.equal(readLines(f1), readLines(f2))