I have a large data.frame
(15 columns and 100,000 rows) in an existing R session that I want to send to a Q/KDB instance. From KDB's cookbook, the possible solutions are:
RServer for Q: use KDB to create new R instance which shares memory space. This doesn't work because my data is in an existing instance of R.
RServe: run an R server and use TCP/IP to communicate with Q/KDB client. This does not work, because as per RServe's documentation, "every connection has a separate workspace and working directory" and so i presume does not see my existing data.
R Math Library: access R's functionality via a math library without needing an instance of R. This does not work because my data is already in an instance of R.
So any other ideas on how to send data from R to Q/KDB?
open a port in Q. I start Q with a batch file:
@echo off
c:\q\w32\q -p 5001
load qserver.dll
tryCatch({
dyn.load("c:/q/qserver.dll")}
,error = function(f){
print("can't load qserver.dll")
})
Then use these
open_connection <- function(host="localhost", port=5001, user=NULL) {
parameters <- list(host, as.integer(port), user)
h <- .Call("kx_r_open_connection", parameters)
assign(".k.h", h, envir = .GlobalEnv)
return(h)
}
close_connection <- function(connection) {
.Call("kx_r_close_connection", as.integer(connection))
}
execute <- function(connection, query) {
.Call("kx_r_execute", as.integer(connection), query)
}
d<<-open_connection(host="localhost",port=thePort)
ex2 <- function(...)
{
query <- list(...)
theResult <- NULL
for(i in query) theResult <- paste0(theResult,i)
return(execute(d,paste0(theResult)))
}
then ex2 can take multiple arguments so you can build queries with R variables and strings
Edit: thats for R from Q, heres R to Q
2nd Edit: improved algo:
library(stringr)
RToQTable <- function(Rtable,Qname,withColNames=TRUE,withRowNames=TRUE,colSuffix = NULL)
{
theColnames <- if(!withColNames || length(colnames(Rtable))==0) paste0("col",as.character(1:length(Rtable[1,])),colSuffix) else colnames(Rtable)
if(!withRowNames || length(rownames(Rtable))==0) withRowNames <- FALSE
Rtable <- rbind(Rtable,"linesep")
charnum <- as.integer(nchar(thestr <- paste(paste0(theColnames,':("',str_split(paste(Rtable,collapse='";"'),';\"linesep\";\"')[[1]],');'),collapse="")) - 11)
if(withRowNames)
ex2(Qname,":([]",Qname,str_replace_all(paste0("`",paste(rownames(Rtable),collapse="`"))," ","_"),";",.Internal(substr(thestr,1L,charnum)),"))") else
ex2(Qname,":([]",.Internal(substr(thestr,1L,charnum)),"))")
}
> bigMat <- matrix(runif(1500000),nrow=100000,ncol=15)
> microbenchmark(RToQTable(bigMat,"Qmat"),times=3)
Unit: seconds
expr min lq mean median uq max neval
RToQTable(bigMat, "Qmat") 10.29171 10.315 10.32766 10.33829 10.34563 10.35298 3
This will work for a matrix, so for a data frame just save a vector containing the types of each column, then convert the dataframe to a matrix, import the matrix to Q, and cast the types
Note that this algo is approx O(rows * cols^1.1) so you'll need to chop the columns up into multiple matricies if you have any more than 20 to get O(rows * cols)
but for your example 150,000 rows and 15 columns takes 10 seconds so further optimization may not be necessary.