
Best practices for upgrading Cassandra


Question:

I am getting errors from pyspark when connecting to Cassandra, apparently because the Cassandra version I am running is too old:

[idf@node1 python]$ nodetool -h localhost version
ReleaseVersion: 2.0.17
[idf@node1 python]$ 

[idf@node1 cassandra]$ java --version
Unrecognized option: --version
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

[idf@node1 cassandra]$ java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
[idf@node1 cassandra]$ 

I want to upgrade to the latest version. However, I have already collected quite a bit of data and I don't want to lose it. I am using CentOS 7.2 with a single Cassandra node. My questions are:

  • where is the Cassandra data stored on the local file system?
  • is it correct to assume that I can compress this directory and move it?

Then, once I have the data backed up, what is the correct way to upgrade Cassandra? Is it:

  • remove old version completely
  • install new version
  • copy data back

What is the best practice to do this?

Answer 1:

I'm guessing you are using the OSS version. The default data location is /var/lib/cassandra, and you can back that directory up first if you want.
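A minimal backup sketch, assuming the default /var/lib/cassandra data directory; run it after draining and stopping the node (steps 1-2 below). The /backup target path is just a placeholder:

sudo tar czf /backup/cassandra-data.tar.gz /var/lib/cassandra   # archive the data directory while the node is stopped

nodetool snapshot is another option if you want a backup while the node is still running.

The upgrade procedure itself is simple: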

  1. run nodetool drain
  2. stop cassandra
  3. save your cassandra.yaml
  4. remove old and install new version
  5. update new cassandra.yaml with your settings
  6. start cassandra
  7. run nodetool upgradesstables
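On CentOS 7 with the RPM packages, the steps above might look roughly like this; the package name, service name, and config path are assumptions that depend on how Cassandra was installed, so adjust them for your setup:

nodetool drain                                                # flush memtables and stop accepting writes
sudo systemctl stop cassandra                                 # stop the old version
cp /etc/cassandra/conf/cassandra.yaml ~/cassandra.yaml.bak    # keep your current settings
sudo yum remove cassandra                                     # remove the old package (data under /var/lib/cassandra normally stays, but keep your backup)
sudo yum install cassandra                                    # install the new version from your configured repo
# merge your saved settings into the new cassandra.yaml by hand; don't just overwrite it
sudo systemctl start cassandra
nodetool upgradesstables                                      # rewrite sstables in the new on-disk format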

This should leave your node running the new version of Cassandra with all of your schema and data intact. Be careful if you are upgrading past 2.1, because 2.2 and later require Java 8. Everything else is the same.
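Once the node is back up, a quick sanity check with the same tools used above:

nodetool version     # should now report the new ReleaseVersion
nodetool status      # the node should show as UN (Up/Normal)
java -version        # confirm you are on Java 8 before moving past 2.1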