I produced a large data frame (1700+obs,159 variables) with a function that collects info from a website. Usually, the function finds numeric values for some columns, and thus they're numeric. Sometimes, however, it finds some text, and converts the whole column to text. I have one df whose column classes are correct, and I would like to "paste" those classes to a new, incorrect df. Say, for example:
dfCorrect<-data.frame(x=c(1,2,3,4),y=as.factor(c("a","b","c","d")),z=c("bar","foo","dat","dot"),stringsAsFactors = F)
str(dfCorrect)
'data.frame': 4 obs. of 3 variables:
$ x: num 1 2 3 4
$ y: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
$ z: chr "bar" "foo" "dat" "dot"
## now I have my "wrong" data frame:
dfWrong<-as.data.frame(sapply(dfCorrect,paste,sep=""))
str(dfWrong)
'data.frame': 4 obs. of 3 variables:
$ x: Factor w/ 4 levels "1","2","3","4": 1 2 3 4
$ y: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
$ z: Factor w/ 4 levels "bar","dat","dot",..: 1 4 2 3
I wanted to copy the classes of each column of dfCorrect
into dfWrong
, but haven't found how to do it properly.
I've tested:
dfWrong1<-dfWrong
dfWrong1[0,]<-dfCorrect[0,]
str(dfWrong1) ## bad result
'data.frame': 4 obs. of 3 variables:
$ x: Factor w/ 4 levels "1","2","3","4": 1 2 3 4
$ y: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
$ z: Factor w/ 4 levels "bar","dat","dot",..: 1 4 2 3
dfWrong1<-dfWrong
str(dfWrong1)<-str(dfCorrect)
'data.frame': 4 obs. of 3 variables:
$ x: num 1 2 3 4
$ y: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
$ z: chr "bar" "foo" "dat" "dot"
Error in str(dfWrong1) <- str(dfCorrect) :
could not find function "str<-"
With this small matrix I could go by hand, but what about larger ones? Is there a way to "copy" the classes from one df to another without having to know the individual classes (and indexes) of each column?
Expected final result (after properly "pasting" classes):
all.equal(sapply(dfCorrect,class),sapply(dfWrong,class))
[1] TRUE
Thanks,
You could try this:
...although my first instinct is to agree with Oliver that if it were me I'd try to ensure the correct class at the point you're reading the data.