R: using RJDBC to write a table to Hive

Posted 2020-04-08 13:11

Question:

I have successfully connected local R 3.1.2 (Win7 64-bit, RStudio) to a remote Hive server using RJDBC:

library(RJDBC)
# JVM options must be set before .jinit() starts the JVM, or they are ignored
options(java.parameters = "-Xmx8g")
.jinit()
# add every jar in the Hive driver directory to the class path
dir <- "E:/xxx/jars/hive/"
for (l in list.files(dir)) {
  .jaddClassPath(paste0(dir, l))
}
drv <- JDBC("org.apache.hadoop.hive.jdbc.HiveDriver",
            "E:/xxx/jars/hive/hive-jdbc-0.11.0.jar")

conn <- dbConnect(drv,  "jdbc:hive://10.127.130.162:10002/default", "", "" ) 
dbGetQuery(conn, "select * from test.test limit 10 ")

This successfully reads data from Hive, but I cannot write an R data frame using dbWriteTable:

data(iris)
dbWriteTable(conn, iris , "test.dc_test")

The error returned:

Error in .jcall(md, "Ljava/sql/ResultSet;", "getTables", .jnull("java/lang/String"), : method getTables with signature (Ljava/lang/String;Ljava/lang/String;[Ljava/lang/String;)Ljava/sql/ResultSet; not found

Is this my misuse, or is some other method needed?

Answer 1:

I have a partial answer. Your arguments to dbWriteTable are reversed: the pattern is dbWriteTable(connection, tableName, data), and the docs read dbWriteTable(conn, name, value, ...). That being said, I don't find that the 'correct' form works either, as shown below.
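With the arguments in the documented order (using "iris" as the table name, which matches the CREATE TABLE in the error below), the call would be:

dbWriteTable(conn, "iris", iris)

Running that instead yields the following error message: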

Error in .local(conn, statement, ...) : 
  execute JDBC update query failed in dbSendUpdate ([Simba][HiveJDBCDriver](500051) ERROR processing query/statement. Error Code: 40000, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:42000, errorCode:40000, errorMessage:Error while compiling statement: FAILED: ParseException line 1:41 mismatched input 'PRECISION' expecting ) near 'DOUBLE' in create table statement), Query: CREATE TABLE iris (`Sepal.Length` DOUBLE PRECISION,`Sepal.Width` DOUBLE PRECISION,`Petal.Length` DOUBLE PRECISION,`Petal.Width` DOUBLE PRECISION,Species VARCHAR(255)).)

(at least when using Amazon's JDBC driver for Hive). That error at least seems self-apparent: the CREATE TABLE statement generated for the data didn't parse in HiveQL, with the ParseException tripping over the DOUBLE PRECISION column type. I'm not sure of a fix, other than creating the table manually.
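For what it's worth, a sketch of the manual route might look like this (the renamed columns and the dc_test table name are my own choices; dbSendUpdate is RJDBC's call for statements that return no result set):

# Hypothetical manual workaround: give the columns Hive-friendly names
# (no dots) and create the table with DOUBLE rather than DOUBLE PRECISION
iris2 <- iris
names(iris2) <- c("sepal_length", "sepal_width",
                  "petal_length", "petal_width", "species")
dbSendUpdate(conn, "CREATE TABLE dc_test (
  sepal_length DOUBLE, sepal_width DOUBLE,
  petal_length DOUBLE, petal_width DOUBLE,
  species VARCHAR(255))")

Whether a subsequent dbWriteTable(conn, "dc_test", iris2, append = TRUE) then succeeds will depend on the driver; if not, the row-by-row INSERT in the next answer is the fallback.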



Answer 2:

After all these years I still haven't found a full solution... but here is another partial one. It only works for writing a small data.frame, and how small varies with 32/64-bit and Mac/Windows...

First, convert the data frame into a single character string of value tuples:

# Collapse each row into a quoted tuple ('v1', 'v2', ...) and join all
# the tuples into one comma-separated string
data2hodoop <- paste0(
  apply(dataframe, 1,
        function(x) paste0("('", paste0(x, collapse = "', '"), "')")),
  collapse = ", ")
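For iris, for example, the resulting string looks roughly like this (every value becomes a quoted string):

('5.1', '3.5', '1.4', '0.2', 'setosa'), ('4.9', '3.0', '1.4', '0.2', 'setosa'), ...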

Then use INSERT to write the rows into Hadoop:

# tbname holds the target table name as a string
dbSendQuery(conn, paste("INSERT INTO ", tbname, " VALUES ", data2hodoop, ";"))

On my PC (Win7 64-bit, 16 GB), if the string 'data2hodoop' grows beyond about 50 MB, there is an error: "C stack usage xxx is too close to the limit".

On my Mac the limit is even lower, and I cannot find a way to raise it.
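One way to stay under that limit is to batch the INSERTs. Here is a sketch (the helper name insert_chunked and the 1000-row default are my own choices, and values are assumed to contain no quotes that would need escaping):

# Hypothetical chunked insert: build the value-tuple string per batch
# of rows so no single statement grows large enough to blow the C stack
insert_chunked <- function(conn, tbname, dataframe, chunk_size = 1000) {
  # split the row indices into batches of at most chunk_size
  idx <- seq_len(nrow(dataframe))
  for (rows in split(idx, ceiling(idx / chunk_size))) {
    # build the tuple string for this batch only
    tuples <- paste0(
      apply(dataframe[rows, , drop = FALSE], 1,
            function(x) paste0("('", paste0(x, collapse = "', '"), "')")),
      collapse = ", ")
    dbSendUpdate(conn, paste("INSERT INTO", tbname, "VALUES", tuples))
  }
}

# e.g. insert_chunked(conn, "dc_test", iris)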



Tags: r jdbc hive