How do I create a copy of a data frame in R?

Posted 2020-07-01 05:45

Question:

I want to duplicate the full contents of a data frame that was read in from a *.csv file. I don't believe that copyOfFirstFrame <- firstFrame actually duplicates it. So what do I need to do?

firstFrame <- read_csv("fileName.csv")
copyOfFirstFrame <- ?????

If I do the following, the memory address remains the same.

copyOfFirstFrame <- firstFrame
tracemem(firstFrame) == tracemem(copyOfFirstFrame)
[1] TRUE

The copy must result in two unique memory addresses. See the question "In R, how can I check if two variable names reference the same underlying object?" for details.

Answer 1:

Let DATA be a pre-existing data frame object. I create a new object, COPY, which is an exact copy of DATA but occupies a different memory location and therefore does not point to the original data frame.

I use the function data.frame() like this:

> COPY <- data.frame(DATA)

I check whether the memory addresses are the same using tracemem():

> tracemem(COPY) == tracemem(DATA)
[1] FALSE

Lame enough, I think.
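
To see the effect on a concrete case, here is a minimal sketch. The data frame contents (columns id and value) are made up for illustration and stand in for whatever was read from the CSV file. The check uses an ordinary base R modification rather than tracemem().

```r
# DATA is a small, hypothetical stand-in for the data frame read from the file.
DATA <- data.frame(id = 1:3, value = c(10, 20, 30))

# Build the copy with data.frame(), as described in this answer.
COPY <- data.frame(DATA)

# The copy starts out with the same column names and values as the original.
same_contents <- identical(as.list(COPY), as.list(DATA))

# Modify the copy with ordinary base R assignment ...
COPY$value[1] <- 99

# ... and confirm that the original still holds its old value.
original_unchanged <- DATA$value[1] == 10

print(same_contents)
print(original_unchanged)
```

Note that this only exercises base R assignment. Functions that modify by reference, such as set() from data.table, are a separate matter.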



Answer 2:

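Calling cbind() on a single data.frame will ensure you have a copy. The session below demonstrates the idea with a matrix built by cbind(NA, NA) rather than an actual data.frame: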

> df <- cbind(NA, NA)
> df2 <- cbind(df)
> df2
     [,1] [,2]
[1,]   NA   NA
> df2[,1] <- 1
> df
     [,1] [,2]
[1,]   NA   NA
> df2
     [,1] [,2]
[1,]    1   NA
> 
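
For what it's worth, the same experiment can be run on an actual data frame. This is only a sketch, and the column names x and y are invented for illustration.

```r
# A hypothetical two-column data frame filled with NA, mirroring the matrix above.
df <- data.frame(x = c(NA, NA), y = c(NA, NA))

# cbind() with a single data frame argument returns a data frame.
df2 <- cbind(df)

# Overwrite the first column of the copy using base R assignment.
df2[, 1] <- 1

print(is.data.frame(df2))
print(df)   # the original, for comparison
print(df2)  # the modified copy
```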


Answer 3:

Alternatively, we can use data.table::copy():

df.1 <- data.frame(1)

library(data.table)
df.2 <- copy(df.1)

> tracemem(df.1) == tracemem(df.2)
[1] FALSE


Answer 4:

Neither cbind() nor data.frame(), even when variables are added or variable names are changed, insulates the original data frame from modifications that data.table's set() function makes to the copy.

> library(data.table)
> # changing name of variable in copy doesn't work, emp modified
> (emp <- data.frame(type=c('a','b','c'),amt=as.numeric(c(1,2,3))))
  type amt
1    a   1
2    b   2
3    c   3
> (dd <- cbind(emp,dv=''))
  type amt dv
1    a   1   
2    b   2   
3    c   3   
> names(dd)[names(dd)=='type'] <- 'tp'
> i <- which(dd$tp=='a'); set(dd,i,'tp','alpha')
> i <- which(dd$tp=='b'); set(dd,i,'tp','beta')
> i <- which(dd$tp=='c'); set(dd,i,'tp','chi')
> dd
     tp amt dv
1 alpha   1   
2  beta   2   
3   chi   3   
> emp
   type amt
1 alpha   1
2  beta   2
3   chi   3
> dd$dv <- factor(dd$dv)
> table(dd$dv)
> table(emp$type)

    a     b     c alpha  beta   chi 
    0     0     0     1     1     1 
> tracemem(dd)==tracemem(emp)
[1] FALSE
> 
> # same w/ data.frame doesn't work, emp still modified
> (emp <- data.frame(type=c('a','b','c'),amt=as.numeric(c(1,2,3))))
  type amt
1    a   1
2    b   2
3    c   3
> (dd <- data.frame(emp,dv=1))
  type amt dv
1    a   1  1
2    b   2  1
3    c   3  1
> names(dd)[names(dd)=='type'] <- 'tp'
> i <- which(dd$tp=='a'); set(dd,i,'tp','alpha')
> i <- which(dd$tp=='b'); set(dd,i,'tp','beta')
> i <- which(dd$tp=='c'); set(dd,i,'tp','chi')
> dd$tp <- factor(dd$tp)
> table(dd$tp)

alpha  beta   chi 
    1     1     1 
> table(emp$type)

    a     b     c alpha  beta   chi 
    0     0     0     1     1     1 
> tracemem(dd)==tracemem(emp)
[1] FALSE
> 
> # only modifying new variable insulates emp
> (emp <- data.frame(type=c('a','b','c'),amt=as.numeric(c(1,2,3))))
  type amt
1    a   1
2    b   2
3    c   3
> (dd <- cbind(emp,dv=''))
  type amt dv
1    a   1   
2    b   2   
3    c   3   
> names(dd)[names(dd)=='type'] <- 'tp'
> i <- which(dd$tp=='a'); set(dd,i,'dv','alpha')
> i <- which(dd$tp=='b'); set(dd,i,'dv','beta')
> i <- which(dd$tp=='c'); set(dd,i,'dv','chi')
> dd
  tp amt    dv
1  a   1 alpha
2  b   2  beta
3  c   3   chi
> emp
  type amt
1    a   1
2    b   2
3    c   3
> table(emp$type)

a b c 
1 1 1 
> tracemem(dd)==tracemem(emp)
[1] FALSE
> 
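
One way that should sidestep this is data.table::copy(), which makes a deep copy so that the columns are no longer shared with the original. A minimal sketch, assuming data.table is installed and reusing the emp data from the transcripts above (stringsAsFactors = FALSE is my addition so that type stays a character column):

```r
library(data.table)  # provides copy() and set()

emp <- data.frame(type = c("a", "b", "c"), amt = c(1, 2, 3),
                  stringsAsFactors = FALSE)

# Deep copy: the columns of dd are new vectors, not shared with emp.
dd <- copy(emp)

# Modify the copy by reference with set(), as in the transcripts above.
i <- which(dd$type == "a"); set(dd, i, "type", "alpha")
i <- which(dd$type == "b"); set(dd, i, "type", "beta")
i <- which(dd$type == "c"); set(dd, i, "type", "chi")

print(dd$type)   # values in the modified copy
print(emp$type)  # values in the original
```

If emp$type still shows a, b, c after the set() calls, the copy is insulated in the sense this answer is testing for.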


Tags: r dataframe