The problem is simple. Consider the following example:
m <- head(iris)
write.csv(m, file = 'm.csv')
m1 <- read.csv('m.csv')
The result is that m1 differs from the original object m: it has a new first column named "X". To make them equal, I have to pass additional arguments, as in these two examples:
write.csv(m, file = 'm.csv', row.names = FALSE)
# and then
m1 <- read.csv('m.csv')
or
write.csv(m, file = 'm.csv')
m1 <- read.csv('m.csv', row.names = 1)
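A quick way to check both workarounds, sketched here with temporary files (the names f1, f2, a, b and d are only for illustration), is to compare the column names after each round trip:

```r
m <- head(iris)

# Hypothetical temporary files, used only for this check
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")

# Workaround 1: don't write the row names at all
write.csv(m, file = f1, row.names = FALSE)
a <- read.csv(f1)

# Workaround 2: write the row names, then read the first column back as row names
write.csv(m, file = f2)
b <- read.csv(f2, row.names = 1)

# Default read of the second file, which keeps the row names as a regular column
d <- read.csv(f2)

identical(names(a), names(m))  # same column names as the original
identical(names(b), names(m))  # same column names as the original
names(d)[1]                    # the extra column, named "X"

unlink(c(f1, f2))
```

This only compares column names. Depending on the R version and the stringsAsFactors setting, Species may come back as character rather than factor, so the objects can still differ in column types.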
The question is: what is the reason for this difference? In particular, if write.csv and read.csv are supposedly intended to stick to the Excel convention, why don't they import the same object that was exported in the first place? To me this is very counterintuitive and highly undesirable behavior.
(The same thing happens if I use the csv2 variants of these functions.)
Thanks in advance!
These are the data.frames m and m1, if you prefer not to run R to see the example:
> m
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> m1
  X Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 1          5.1         3.5          1.4         0.2  setosa
2 2          4.9         3.0          1.4         0.2  setosa
3 3          4.7         3.2          1.3         0.2  setosa
4 4          4.6         3.1          1.5         0.2  setosa
5 5          5.0         3.6          1.4         0.2  setosa
6 6          5.4         3.9          1.7         0.4  setosa
Here's my guess...
write.table writes a data.frame to a file, and data.frames always have row names, so not writing row names by default would throw away information. (Yes, write.table will also write a matrix, and matrices don't have to have row names, but data.frames are probably used much more often than matrices.)
read.table returns a data.frame, but CSV files don't have any concept of row names, so someone could argue that it's counterintuitive to assume, by default, that the first column of a CSV file holds row names.
Now, there may be a way to make these two functions consistent, but I would argue that writing to a text file isn't the best way to pass data from one R session to another. It's much safer and faster to use save, load, saveRDS, readRDS, etc.
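As a minimal sketch of that last suggestion (the temporary file path and the name m2 are only for illustration), saveRDS and readRDS round-trip the object, row names and column types included, without any extra arguments:

```r
m <- head(iris)

# Hypothetical temporary file, used only for this example
f <- tempfile(fileext = ".rds")

saveRDS(m, f)      # serialize the data.frame as-is
m2 <- readRDS(f)   # read it back in this (or another) R session

identical(m, m2)   # compare the restored object with the original

unlink(f)
```

Here identical(m, m2) should return TRUE, since the file stores the R object itself rather than a text rendering of it.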