Your Cassandra cluster failed to deploy. Replica S

Posted 2019-08-10 20:50

Question:

I tried to deploy a Cassandra cluster using Google Compute Engine, without success. I tried several times, and the error was always the same:

module: DEPLOYMENT_FAILED
Replica module-1234 failed with status PERMANENTLY_FAILING: Replica State
changed to PERMANENTLY_FAILING. Replica was unhealthy 2 consecutive times.

After following these short troubleshooting guidelines: https://cloud.google.com/solutions/cassandra/click-to-deploy#troubleshooting, I got the following log:

antoniogallo88_gmail_com@cassandra-coord-v8ip:/gagent/metaOutput$ tail $(ls -1tr /gagent/metaOutput/stderr.*.txt | tail -n 1)
Still waiting for resourceview cassandranode-4da4e to have 3 members ...
Still waiting for resourceview cassandranode-4da4e to have 3 members ...
Still waiting for resourceview cassandranode-4da4e to have 3 members ...
Still waiting for resourceview cassandranode-4da4e to have 3 members ...
Still waiting for resourceview cassandranode-4da4e to have 3 members ...
Still waiting for resourceview cassandranode-4da4e to have 3 members ...
Still waiting for resourceview cassandranode-4da4e to have 3 members ...
Still waiting for resourceview cassandranode-4da4e to have 3 members ...
Still waiting for resourceview cassandranode-4da4e to have 3 members ...
[ERROR] resourceview cassandranode-4da4e does not have 3 members after 60 attempts.

Do you have any idea how to fix this?

Thanks.

Antonio

Answer 1:

Can you check that the instance type you've chosen (in number of cores) multiplied by the number of cluster members does not exceed the CPU quota for the project you're using? Also check the disk capacity value against your overall disk quota.

You can check max allowable disk and CPU quota in the console under Compute Engine > Quotas.

This sounds like a quota issue even though the console is not surfacing a quota error.
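The arithmetic behind that check can be sketched as a small shell helper. This is only an illustration: `quota_check` is a hypothetical function, and the numbers in the example calls (4 cores per node, a CPU quota of 8, 500 GB disks, a 2048 GB disk quota) are assumptions, not values from the deployment above. Only the 3 members come from the log. If the gcloud CLI is available, `gcloud compute regions describe <region>` should list the regional quotas with their limit and current usage, which can be plugged in by hand.

```shell
#!/bin/sh
# Hypothetical helper: compares what a deployment needs with what the quota leaves free.
# Arguments: per-node amount (cores or GB), node count, quota limit, current usage.
quota_check() {
  per_node=$1
  nodes=$2
  limit=$3
  usage=$4
  needed=$((per_node * nodes))      # total requested by the cluster
  available=$((limit - usage))      # what the project can still allocate
  if [ "$needed" -le "$available" ]; then
    echo "OK: need $needed, $available available"
  else
    echo "EXCEEDED: need $needed, only $available available"
  fi
}

# Assumed values: 3 nodes with 4 cores each against a CPU quota of 8, nothing in use.
quota_check 4 3 8 0        # prints "EXCEEDED: need 12, only 8 available"

# Assumed values: 3 disks of 500 GB against a 2048 GB disk quota, 100 GB in use.
quota_check 500 3 2048 100 # prints "OK: need 1500, 1948 available"
```

If either check comes out as EXCEEDED, a smaller instance type, fewer members, or a quota increase request would be the things to try.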

Another thing you can do is create another deployment, then quickly switch over to the instance list page and look for an instance called "Cassandra-coord-foo", which is a short-lived instance that manages disk creation. If you SSH into that node during deployment and run the following command, you may see a disk or CPU quota warning:

tail -f /gagent/metaOutput/*
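Since the coordinator instance is short-lived, it may be easier to filter the logs than to watch them scroll. The sketch below wraps `grep` in a hypothetical `scan_logs` function. The search terms "quota" and "error" are guesses at what a warning would contain, not a documented message format.

```shell
#!/bin/sh
# Hypothetical helper: prints log lines that mention "quota" or "error" (case-insensitive),
# prefixed with file name and line number. Defaults to the directory used above.
scan_logs() {
  dir=${1:-/gagent/metaOutput}
  grep -iHn -e 'quota' -e 'error' "$dir"/* 2>/dev/null \
    || echo "no quota or error lines found in $dir"
}
```

Running `scan_logs` with no argument on the coordinator node searches /gagent/metaOutput; a different directory can be passed as the first argument.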

Chris