Can't backup to S3 with OpsCenter 5.2.1

Posted 2020-04-01 07:38

Question:

I upgraded OpsCenter from 5.1.3 to 5.2.0 (and then to 5.2.1). Before the upgrade I had a scheduled backup to a local server and an S3 location configured, and it worked fine with OpsCenter 5.1.3. I made no changes to the scheduled backup during or after the upgrade.

The day after the upgrade, the S3 backup failed. In opscenterd.log, I see these errors:

2015-09-28 17:00:00+0000 [local] INFO: Instructing agents to start backups at Mon, 28 Sep 2015 17:00:00 +0000
2015-09-28 17:00:00+0000 [local] INFO: Scheduled job 458459d6-d038-41b4-9094-7d450e4bac6f finished
2015-09-28 17:00:00+0000 [local] INFO: Snapshots started on all nodes
2015-09-28 17:00:08+0000 [] WARN: Marking request d960ad7b-2ccd-40a4-be7e-8351ac038c53 as failed: {'sstables': {u'solr_admin': {u'solr_resources': {'total_size': 155313, 'total_files': 12, 'done_files': 0, 'errors': [u'{:type :opsagent.backups.destinations/destination-not-found, :message "Destination missing: 62f5a26abce7463bad9deb7380979c4a"}', u'{:type :opsagent.backups.destinations/destination-not-found, :message "Destination missing: 62f5a26abce7463bad9deb7380979c4a"}', ... (repeated error, shortened for brevity)

The S3 location no longer appears in OpsCenter when I edit the scheduled backup job. When I try to re-add the S3 location, using the same bucket and credentials as before, I get the following error:

Location validation error: Call to /local/backups/destination_validate timed out.

Also, I don't know if this is related, but for completeness: I see errors like the following in opscenterd.log as well:

WARN: No http agent exists for definition file update. This is likely due to SSL import failure.

I get this behavior with either DataStax Enterprise 4.5.1 or 4.7.3.

Answer 1:

I have been having the exact same problem since updating to OpsCenter 5.2.x, and I was finally able to get it working properly.

I removed all the settings suggested in the previous answer and then created new buckets in us-west-1, us-west-2, and us-standard. After this I was able to add all of those as destinations quickly and easily.

It appears to me that OpsCenter may be trying to list the objects in the bucket when you first configure it; in my case, the two existing buckets we were using held 11 TB and 19 GB of data respectively.

This could explain why increasing the timeout worked for some people and not others.
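
If that hypothesis is right, you can gauge how much listing work destination validation would have to do by summarizing the bucket yourself. A minimal sketch using the AWS CLI (the bucket name my-backup-bucket is a placeholder):

    # Recursively list the bucket and print only the trailing summary lines
    # (object count and total size). On a multi-terabyte bucket this can
    # take a long time, which is exactly the point.
    aws s3 ls s3://my-backup-bucket --recursive --summarize | tail -n 2

If that command takes minutes on its own, a validation call that walks the same listing would plausibly hit the timeout above.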

Hope this helps.



Answer 2:

Try adding the remote_backup_region property to the cluster configuration file, under the [agents] heading in "cluster-name".conf. Valid values are: us-standard, us-west-1, us-west-2, eu-west-1, ap-northeast-1, and ap-southeast-1.
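
For example, assuming a cluster named "MyCluster" whose configuration file lives at /etc/opscenter/clusters/MyCluster.conf (the usual location for package installs; adjust the path, file name, and region to match your setup), the stanza would look like:

    [agents]
    remote_backup_region = us-west-2

You will likely need to restart opscenterd for the change to take effect.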

Does that help?



Answer 3:

The problem was resolved by a combination of two things.

  1. Delete the entire contents of the existing S3 bucket (or create a new bucket, as previously suggested by @kaveh-nowroozi).
  2. Edit /etc/datastax-agent/datastax-agent-env.sh and increase the heap size to 512M, as suggested by a DataStax engineer. The default was set at 128M, and I kept doubling it until backups became successful. (A sketch of both steps follows this list.)
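
For reference, here is a rough sketch of both steps. The bucket names are placeholders, and the exact JVM_OPTS line may differ between agent versions, so check what your copy of datastax-agent-env.sh actually sets:

    # Step 1 (destructive!): empty the existing backup bucket...
    aws s3 rm s3://my-backup-bucket --recursive
    # ...or create a fresh bucket instead:
    aws s3 mb s3://my-new-backup-bucket --region us-west-2

    # Step 2: in /etc/datastax-agent/datastax-agent-env.sh, raise the agent
    # heap from the 128M default, e.g.:
    JVM_OPTS="$JVM_OPTS -Xmx512M"

    # Then restart the agent on each node so the new heap size is picked up:
    sudo service datastax-agent restart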