I want to duplicate the full contents of a data frame that has been read in from a *.csv file. I don't believe it is a true duplicate if I do copyOfFirstFrame <- firstFrame. So what do I need to do?
firstFrame <- read_csv("fileName.csv")
copyOfFirstFrame <- ?????
If I do the following, the memory address remains the same.
copyOfFirstFrame <- firstFrame
tracemem(firstFrame) == tracemem(copyOfFirstFrame)
[1] TRUE
The copy must result in two unique memory addresses. See "In R, how can I check if two variable names reference the same underlying object?" for details.
Let DATA be a pre-existing data frame object.
I create a new object, COPY, which is an exact copy of DATA but occupies a different memory location and hence doesn't point to the original data frame.
I use the function data.frame() like this:
> COPY<-data.frame(DATA)
I check whether the memory addresses are the same using tracemem():
> tracemem(COPY)==tracemem(DATA)
[1] FALSE
Lame enough, I think.
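As a rough sanity check beyond comparing addresses, one can modify COPY and confirm that DATA is left alone. The sketch below assumes a small made-up DATA frame (the column names id and value are only for illustration) and uses ordinary base R assignment. Keep in mind that with base R assignment, copy-on-modify would protect DATA even after a plain COPY <- DATA, so this shows independence under base R operations only.

```r
# Hypothetical example data; any data frame would do.
DATA <- data.frame(id = 1:3, value = c(10, 20, 30))

# Build the copy the same way as above.
COPY <- data.frame(DATA)

# Change one cell in the copy using ordinary base R assignment.
COPY$value[2] <- 99

# The original keeps its old values; only the copy changed.
DATA$value
COPY$value
```

If DATA$value still shows the original numbers, the two objects behave independently for base R modifications.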
Using cbind() with a single data.frame as its only argument will give you a copy:
> df <- cbind(NA, NA)
> df2 <- cbind(df)
> df2
[,1] [,2]
[1,] NA NA
> df2[,1] <- 1
> df
[,1] [,2]
[1,] NA NA
> df2
[,1] [,2]
[1,] 1 NA
>
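Note that cbind(NA, NA) in the transcript above builds a matrix rather than a data frame. Here is a minimal sketch of the same idea with an actual data.frame, assuming made-up column names a and b:

```r
# Hypothetical one-row data frame with two NA columns.
df <- data.frame(a = NA, b = NA)

# cbind() with a single data frame argument returns a data frame.
df2 <- cbind(df)

# Modify the first column of the copy with base R assignment.
df2[, 1] <- 1

# df is unchanged, df2 carries the new value.
df
df2
```

This only demonstrates that base R assignment on df2 leaves df alone; it says nothing about functions that modify objects by reference.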
Alternatively, we can use data.table::copy().
df.1 <- data.frame(1)
library(data.table)
df.2 <- copy(df.1)
> tracemem(df.1) == tracemem(df.2)
[1] FALSE
Neither cbind() nor data.frame(), even with variables added or variable names changed, insulates the original data frame from modifications that data.table's set() function makes to the copy of the data frame.
> library(data.table)
> # changing name of variable in copy doesn't work, emp modified
> (emp <- data.frame(type=c('a','b','c'),amt=as.numeric(c(1,2,3))))
type amt
1 a 1
2 b 2
3 c 3
> (dd <- cbind(emp,dv=''))
type amt dv
1 a 1
2 b 2
3 c 3
> names(dd)[names(dd)=='type'] <- 'tp'
> i <- which(dd$tp=='a'); set(dd,i,'tp','alpha')
> i <- which(dd$tp=='b'); set(dd,i,'tp','beta')
> i <- which(dd$tp=='c'); set(dd,i,'tp','chi')
> dd
tp amt dv
1 alpha 1
2 beta 2
3 chi 3
> emp
type amt
1 alpha 1
2 beta 2
3 chi 3
> dd$dv <- factor(dd$dv)
> table(dd$dv)
> table(emp$type)
a b c alpha beta chi
0 0 0 1 1 1
> tracemem(dd)==tracemem(emp)
[1] FALSE
>
> # same w/ data.frame doesn't work, emp still modified
> (emp <- data.frame(type=c('a','b','c'),amt=as.numeric(c(1,2,3))))
type amt
1 a 1
2 b 2
3 c 3
> (dd <- data.frame(emp,dv=1))
type amt dv
1 a 1 1
2 b 2 1
3 c 3 1
> names(dd)[names(dd)=='type'] <- 'tp'
> i <- which(dd$tp=='a'); set(dd,i,'tp','alpha')
> i <- which(dd$tp=='b'); set(dd,i,'tp','beta')
> i <- which(dd$tp=='c'); set(dd,i,'tp','chi')
> dd$tp <- factor(dd$tp)
> table(dd$tp)
alpha beta chi
1 1 1
> table(emp$type)
a b c alpha beta chi
0 0 0 1 1 1
> tracemem(dd)==tracemem(emp)
[1] FALSE
>
> # only modifying new variable insulates emp
> (emp <- data.frame(type=c('a','b','c'),amt=as.numeric(c(1,2,3))))
type amt
1 a 1
2 b 2
3 c 3
> (dd <- cbind(emp,dv=''))
type amt dv
1 a 1
2 b 2
3 c 3
> names(dd)[names(dd)=='type'] <- 'tp'
> i <- which(dd$tp=='a'); set(dd,i,'dv','alpha')
> i <- which(dd$tp=='b'); set(dd,i,'dv','beta')
> i <- which(dd$tp=='c'); set(dd,i,'dv','chi')
> dd
tp amt dv
1 a 1 alpha
2 b 2 beta
3 c 3 chi
> emp
type amt
1 a 1
2 b 2
3 c 3
> table(emp$type)
a b c
1 1 1
> tracemem(dd)==tracemem(emp)
[1] FALSE
>
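For comparison, here is a minimal sketch of the same kind of test using data.table::copy() to make the copy before calling set(). It assumes the data.table package is installed and uses stringsAsFactors = FALSE so that type is a plain character column; both choices are mine, not part of the transcripts above.

```r
library(data.table)

# Same toy data as above, with type kept as character.
emp <- data.frame(type = c("a", "b", "c"),
                  amt = c(1, 2, 3),
                  stringsAsFactors = FALSE)

# copy() makes a deep copy, so dd does not share columns with emp.
dd <- copy(emp)

# Modify the copy by reference with set().
i <- which(dd$type == "a")
set(dd, i = i, j = "type", value = "alpha")

# The copy is changed, the original should keep "a", "b", "c".
dd$type
emp$type
```

If emp$type still prints the original values, the deep copy has insulated emp from set().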