Issue with lapply using biomaRt

Posted 2019-07-29 14:21

Question:

I am trying to use lapply to loop over species names so that the same gene query runs against each species' mart, not only the human one.

I'm still learning how to use lapply, and I can't work out what I'm doing wrong.

So far I have:

library(biomaRt)

I create the marts:

ensembl_hsapiens <- useMart("ensembl", 
                        dataset = "hsapiens_gene_ensembl")
ensembl_mmusculus <- useMart("ensembl", 
                     dataset = "mmusculus_gene_ensembl")
ensembl_ggallus <- useMart("ensembl",
                       dataset = "ggallus_gene_ensembl")

Set the species:

species <- c("hsapiens", "mmusculus", "ggallus")

I then try to use lapply:

species_genes <- lapply(species, function(s) getBM(attributes = c("ensembl_gene_id", 
                                                  "external_gene_name"), 
                                   filters = "biotype", 
                                   values = "protein_coding", 
                                   mart = paste0(s, "_ensembl")))

It gives me an error message saying:

Error in martCheck(mart) : You must provide a valid Mart object. To create a Mart object use the function: useMart. Check ?useMart for more information.

Answer 1:

This should do the trick:

species_genes <- lapply(species, function(s) getBM(attributes = c("ensembl_gene_id", 
                                                                  "external_gene_name"), 
                                                   filters = "biotype", 
                                                   values = "protein_coding", 
                                                   mart = get(paste0("ensembl_", s))))

Explanation:

The mart argument of getBM expects an object of class Mart, not a character string:

class(ensembl_ggallus)
#output
[1] "Mart"
attr(,"package")
[1] "biomaRt"

By using

paste0("ensembl_", s)

you get a string such as:

"ensembl_hsapiens"

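The base function get looks up an object by its name, given as a string. It searches the environment it is called from and then the enclosing environments, and returns the object itself: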

get("ensembl_hsapiens") 

identical(get("ensembl_hsapiens"), ensembl_hsapiens)
#output
[1] TRUE
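
As an alternative to looking objects up by name with get, the marts can be kept in a named list and indexed by species. The sketch below is only an illustration of that idea: the "FakeMart" objects and the fake_query function are hypothetical stand-ins (not part of biomaRt) so that the code runs without the package or a network connection.

```r
# Stand-in objects so the sketch runs without biomaRt or a network connection.
# In the real script these would be the Mart objects returned by useMart().
ensembl_hsapiens  <- structure(list(dataset = "hsapiens_gene_ensembl"),  class = "FakeMart")
ensembl_mmusculus <- structure(list(dataset = "mmusculus_gene_ensembl"), class = "FakeMart")
ensembl_ggallus   <- structure(list(dataset = "ggallus_gene_ensembl"),   class = "FakeMart")

species <- c("hsapiens", "mmusculus", "ggallus")

# Collect the objects into one list keyed by species name.
marts <- list(
  hsapiens  = ensembl_hsapiens,
  mmusculus = ensembl_mmusculus,
  ggallus   = ensembl_ggallus
)

# Hypothetical stand-in for getBM(): it only reports which dataset it was given.
fake_query <- function(mart) mart$dataset

# marts[[s]] returns the object itself, so no name lookup with get() is needed.
species_datasets <- lapply(species, function(s) fake_query(marts[[s]]))
names(species_datasets) <- species
```

In the real script, the same pattern would presumably mean passing mart = marts[[s]] to the getBM call from the answer in place of get(paste0("ensembl_", s)). Naming the result with names(species_datasets) <- species also makes each element retrievable by species afterwards.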