Using tbl and src_monetdblite to access data

2019-08-24 18:53发布

问题:

Sorry if this question has been asked elsewhere, I can't find it. I'm working through some basic examples in MonetDBLite.

> dbGetQuery(dbcon, "SELECT MAX(mpg) FROM mtcars WHERE cyl = 8")
L3
1 19.2

works, but

> ms <- MonetDBLite::src_monetdblite("./DB")
> t <- tbl(ms, "mtcars")
Error in UseMethod("tbl") : 
no applicable method for 'tbl' applied to an object of class
"c('src_monetdb', 'src_sql', 'src')"

It seems that it's trying to assign the db to t not the table.

Any suggestions would be greatly appreciated.

I've been perusing resources and found a useR2016 presentation and noticed a difference here:

> ms
src:  MonetDBEmbeddedConnection
tbls: mtcars

Curious...

回答1:

I'm a huge fan of using MonetDBLite together with dplyr. My addition to Hannes Mühleisen's (thanks for the package!) answer would be that it appears that the order you load the packages can matter. Loading MonetDBLite after dplyr and dbplyr seems to be the key for me. Loading MonetDBLite first causes errors similar to the one nzgwynn noted.

Sometimes I could connect to the database with no problems. Other times I would get error messages like:

Error in UseMethod("db_query_fields") : no applicable method for 'db_query_fields' applied to an object of class "MonetDBEmbeddedConnection"

Like nzgwynn, I was puzzled about why it would work sometimes but not others. Restarting and reinstalling wouldn't necessarily fix it for me.

This clue, from an issue filed about sparklyr, lead me to explore the package loading order:

https://github.com/rstudio/sparklyr/issues/38

Like noted there with sparklyr, and I've noticed with other R database packages, MonetDBLite will load and attach automatically if the Global Environment already contains a connection object. My problem was that I had an src_monetdb object in my workspace, which was causing MonetDBLite to load upon starting RStudio. So I while I thought I was loading it after dplyr and dbplyr, it was really loading first. If I clear the workspace and then restart, I can load the packages in the preferred order. So far, this method has worked.

I've seen starting with a clean workspace advised as good practice generally, e.g.: https://twitter.com/hadleywickham/status/561146907519500288. Starting with a fresh workspace loses you no time either given MonetDBLite's speedy query ability.

Lastly, I would put a enthusiastic pitch in for using MonetDBLite. I saw it mentioned on RStudio's database page and was immediately impressed on how easy it was to setup and how fast it is. It's the best way I've found for working with a ~2 GB dataset in R. When exploring the data interactively, the dplyr queries run so quickly that it feels like I'm working with the data in memory. And if all I want to do is load the whole dataset into memory, MonetDBLite is as fast or faster than other methods I've tried like read.fst() from the fst package.



回答2:

I closed R and opened it again and the same coding worked fine...



回答3:

You need to call library("dplyr") before using tbl and friends. Also make sure you have dbplyr installed.

Update: Also, please make sure there is no connection object (src) in a stored workspace loaded at startup. Loading connections from .Rdata files does not work! Instead, create the connection/src from scratch every time you run a script.