How to convert the values in a dataframe to a dumm

Posted 2019-07-18 17:58

Question:

The given data is

SNP1 <- c("AA","GG","AG")
SNP2 <- c("AA","CC","AC")
SNP3 <- c("GG","AA","AG")
df<- data.frame(SNP1, SNP2, SNP3)
colnames(df)<- c('rs10000438', 'rs10000500','rs1000055')

I defined a function called dominant_dummy. When I run the code, it fails with this error:

Error in if (!check) { : argument is of length zero 

While debugging, I found that the argument x is a data frame. I need levels(x) to inspect the levels of x and then assign levels(x) <- c(0,1,1), but levels() returns NULL. My goal is to convert the values in the data frame df to dummy values based on these conditions.

  SNP_lib<- NCBI_snp_query(names(x))
  NCBI_snp_query(names(x))
  SNP_min<- SNP_lib$Minor
  SNP_name<- SNP_lib$Query
  SNP_min ="A"
  SNPs <- x

  check<-substr(levels(SNPs)[2],1,1)==SNP_min

I need to assign dummy values to this data frame, for example with levels(x) <- c(0,1,1). How can I do that?

library(rsnps)
dominant_dummy<- function(x){

  SNP_lib<- NCBI_snp_query(names(x))
  NCBI_snp_query(names(x))

  SNP_min<- SNP_lib$Minor
  SNP_name<- SNP_lib$Query
  SNP_min ="A"
  SNPs <- x

  check<-substr(levels(SNPs)[2],1,1)==SNP_min
  if(!check){
    levels(SNPs)<-c(0,1,1)
    SNPs<-as.numeric(as.character(SNP))
  }else {levels(SNPs)<-c(1,1,0)
  SNPs<-as.numeric(as.character(SNP))}
}

df_3levels<-sapply(1:ncol(df), function(i) dominant_dummy(df[,i, drop=FALSE]))

Answer 1:

You cannot call levels() on a data frame. Use levels(SNPs[[1]]) instead to check the levels of the first column. But there are other errors as well.
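
A minimal sketch of why the check fails, assuming a toy data frame `toy` (a hypothetical name, not from the question) built with stringsAsFactors = TRUE so that its column is a factor regardless of R version:

```r
# Toy data (hypothetical). stringsAsFactors = TRUE forces a factor column,
# which R >= 4.0 no longer creates by default.
toy <- data.frame(rs1 = c("AA", "GG", "AG"), stringsAsFactors = TRUE)

one_col_df <- toy[, 1, drop = FALSE]  # still a data frame, as in the question
one_col    <- toy[[1]]                # the factor itself

lv_df  <- levels(one_col_df)          # NULL: a data frame has no levels
lv_col <- levels(one_col)             # levels of the factor column

# With NULL levels the comparison has length zero, which is what makes
# if (!check) stop with "argument is of length zero".
check_df  <- substr(lv_df[2], 1, 1) == "A"
check_col <- substr(lv_col[2], 1, 1) == "A"
```

Here check_df is a zero-length logical, while check_col is a single TRUE or FALSE that if() can use.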



Answer 2:

With three changes to your code, I could run it without an error message. The most significant change is in the last line.

SNP1 <- c("AA", "GG", "AG")
SNP2 <- c("AA", "CC", "AC")
SNP3 <- c("GG", "AA", "AG")
df <- data.frame(SNP1, SNP2, SNP3, stringsAsFactors = TRUE) # levels() needs factors; R >= 4.0 defaults to FALSE
colnames(df) <- c('rs10000438', 'rs10000500', 'rs1000055')

library(rsnps)
dominant_dummy <- function(x) {
  SNP_lib <- NCBI_snp_query(names(x))
  NCBI_snp_query(names(x))

  SNP_min <- SNP_lib$Minor
  SNP_name <- SNP_lib$Query
  SNP_min = "A"
  SNPs <- x

  check <- substr(levels(SNPs)[2], 1, 1) == SNP_min
  if (!check) {
    levels(SNPs) <- c(0, 1, 1)
    SNPs <- as.numeric(as.character(SNPs)) # fixed
  } else {
    levels(SNPs) <- c(1, 1, 0)
    SNPs <- as.numeric(as.character(SNPs)) # fixed
  }
  SNPs # return the recoded vector explicitly
}

df_3levels <- sapply(df, dominant_dummy) # fixed

df_3levels
     rs10000438 rs10000500 rs1000055
[1,]          1          1         0
[2,]          0          0         1
[3,]          1          1         1

Please, let me know if this is the expected result.
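
For comparison, the same dummy coding can be sketched without factor levels or a network lookup. This assumes, as the code above does by hard-coding SNP_min = "A", that the minor allele is "A" for every SNP. dominant_dummy_simple is a hypothetical helper written for illustration and is not part of rsnps.

```r
# Hypothetical helper: 1 if the genotype carries at least one copy of the
# minor allele (dominant model), 0 otherwise.
dominant_dummy_simple <- function(genotypes, minor = "A") {
  # as.character() makes this work for both factor and character columns
  as.integer(grepl(minor, as.character(genotypes), fixed = TRUE))
}

df <- data.frame(
  rs10000438 = c("AA", "GG", "AG"),
  rs10000500 = c("AA", "CC", "AC"),
  rs1000055  = c("GG", "AA", "AG")
)

# sapply() passes each column as a vector and returns a matrix
df_dummy <- sapply(df, dominant_dummy_simple)
```

On the example data this gives the same 0/1 values as the matrix printed above. If the minor allele differs between SNPs, a different value would have to be passed as `minor` for each column.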



Tags: r levels