-->

R and cassandra connection error

Posted 2020-07-30 03:14

Question:

    library(RJDBC)

    cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver",
                    list.files("/home/beyhan/Downloads/jars/", pattern = "jar$", full.names = TRUE))

    casscon <- dbConnect(cassdrv, "jdbc:cassandra://localhost:9042")

Output

> cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver",
+ list.files("/home/beyhan/Downloads/jars/",pattern="jar$",full.names=T))
> casscon <- dbConnect(cassdrv, "jdbc:cassandra://localhost:9042")

Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: org/apache/thrift/transport/TTransportException

Answer 1:

Okay, the JDBC driver you are using is based on the Thrift protocol, and the Thrift connection to Cassandra is deprecated. I think the Python-based solution is the best approach for you. Here is an example: How to read data from Cassandra with R?

And here is a blog post about Thrift vs. CQL: http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster
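
If you still want to see why the original call fails: the NoClassDefFoundError in the question means the class org.apache.thrift.transport.TTransportException was not found on the classpath passed to JDBC(). As a rough diagnostic sketch, the helper below lists which jars in a directory contain a given class. The name `jars_containing_class` is hypothetical (it is not part of RJDBC), and it assumes the jars are ordinary zip archives that base R's `unzip()` can list.

```r
# Hypothetical helper (not part of RJDBC): report which jars in a directory
# contain a given class. A jar is a zip archive, so unzip(list = TRUE) can
# read its table of contents without extracting anything.
jars_containing_class <- function(jar_dir, class_name) {
  # Convert "a.b.C" to the entry path used inside a jar: "a/b/C.class"
  entry <- paste0(gsub(".", "/", class_name, fixed = TRUE), ".class")
  jars <- list.files(jar_dir, pattern = "\\.jar$", full.names = TRUE)
  hits <- vapply(jars, function(jar) {
    # Treat unreadable or corrupt archives as containing nothing
    entries <- tryCatch(unzip(jar, list = TRUE)$Name,
                        error = function(e) character(0))
    entry %in% entries
  }, logical(1))
  jars[hits]
}

# Usage (directory taken from the question):
# jars_containing_class("/home/beyhan/Downloads/jars/",
#                       "org.apache.thrift.transport.TTransportException")
```

An empty result would suggest that no jar in that directory provides the Thrift class, which is consistent with the error shown in the question.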



Answer 2:

Our JDBC Driver for Cassandra allows you to access your Cassandra data in R. To be clear, our driver creates a relational interface to your Cassandra data, so you can submit SQL queries to Cassandra through it (internally, we translate the SQL to CQL, send the request, and return the results in relational form).

We have an article in our Knowledge Base for connecting, but I'll transcribe it here as well.

  1. Load the RJDBC Package:

    library(RJDBC)
    
  2. Set the driver class and classpath:

    # Use forward slashes: in an R string, "\l" and "\c" are invalid escape sequences
    driver <- JDBC(driverClass = "cdata.jdbc.cassandra.CassandraDriver", classPath = "MyInstallationDir/lib/cdata.jdbc.cassandra.jar", identifier.quote = "'")
    
  3. Initialize the JDBC connection:

    conn <- dbConnect(driver,"Database=MyCassandraDB;Port=7000;Server=127.0.0.1;")
    

    (Set the Server, Port, and Database connection properties to connect to Cassandra.)

At this point, you can perform the standard actions available in R, like:

  • Listing the tables:

    dbListTables(conn)
    
  • Executing any SQL query supported by the Cassandra API:

    customer <- dbGetQuery(conn,"SELECT City, SUM(TotalDue) FROM Customer GROUP BY City")
    
  • Viewing the results:

    View(customer)
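
The steps above can also be wrapped in one function that closes the connection when it is done. This is only a sketch: `run_cassandra_query` is a hypothetical name, and the driver class, jar path, and connection string are the placeholders used in this answer, so substitute your own values. It calls `RJDBC::JDBC()` and the DBI generics `dbConnect()`, `dbGetQuery()`, and `dbDisconnect()`.

```r
# Hypothetical wrapper around the steps above. All argument values are
# placeholders; nothing here has been verified against a live cluster.
run_cassandra_query <- function(sql,
                                class_path,
                                url,
                                driver_class = "cdata.jdbc.cassandra.CassandraDriver") {
  if (!requireNamespace("RJDBC", quietly = TRUE)) {
    stop("The RJDBC package is required.")
  }
  # Fail early with a clear message instead of a Java class-loading error
  if (!file.exists(class_path)) {
    stop("Driver jar not found: ", class_path)
  }
  driver <- RJDBC::JDBC(driverClass = driver_class,
                        classPath = class_path,
                        identifier.quote = "'")
  conn <- DBI::dbConnect(driver, url)
  # Close the connection when the function exits, even after an error
  on.exit(DBI::dbDisconnect(conn), add = TRUE)
  DBI::dbGetQuery(conn, sql)
}

# Usage (placeholders from the steps above):
# customer <- run_cassandra_query(
#   "SELECT City, SUM(TotalDue) FROM Customer GROUP BY City",
#   class_path = "MyInstallationDir/lib/cdata.jdbc.cassandra.jar",
#   url = "Database=MyCassandraDB;Port=7000;Server=127.0.0.1;")
```

Checking for the jar up front is a design choice: a missing file otherwise tends to surface later as a less obvious Java error.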
    

Feel free to download a free Beta of the driver! If you have any questions, please let us know.