I have created a new DataStax Enterprise cluster that is managed using OpsCenter. All versions used are the latest available from the package repository. The agents have been installed and everything is working perfectly, including RAM usage, CPU load, etc. I have added over 90 GB to this cluster without a problem, and the hosts can support a lot more.
It is clearly an OpsCenter / DataStax-Agent issue from what I can see. I do not see a relevant line in the log files of either OpsCenter or DSA. Other clusters in the same OpsCenter instance work without a problem.
Any idea on what might be the problem?
Update #1:
The df(1) output on one host is:
Filesystem Type 1K-blocks Used Available Use% Mounted on
udev devtmpfs 16440732 4 16440728 1% /dev
tmpfs tmpfs 3290304 652 3289652 1% /run
/dev/sda6 ext4 921095148 33460384 840822760 4% /
none tmpfs 4 0 4 0% /sys/fs/cgroup
none tmpfs 5120 0 5120 0% /run/lock
none tmpfs 16451516 0 16451516 0% /run/shm
none tmpfs 102400 0 102400 0% /run/user
/dev/sda1 ext2 240972 67121 161410 30% /boot
and on another host is:
Filesystem Type 1K-blocks Used Available Use% Mounted on
udev devtmpfs 16367904 4 16367900 1% /dev
tmpfs tmpfs 3275852 728 3275124 1% /run
/dev/md1 ext4 958985688 92799452 817449468 11% /
none tmpfs 4 0 4 0% /sys/fs/cgroup
none tmpfs 5120 0 5120 0% /run/lock
none tmpfs 16379256 0 16379256 0% /run/shm
none tmpfs 102400 0 102400 0% /run/user
/dev/md0 ext3 1014680 105884 856420 12% /boot
Output of https://<host>:<port>/<Cluster-Name>/storage-capacity:
{"free_gb": 0, "used_gb": 0, "reporting_nodes": 3}
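To make the mismatch concrete, here is a minimal Python sketch that compares the reported capacity with what df shows for the root filesystem of the first host. The helper names (kib_to_gb, capacity_looks_wrong) are hypothetical, and the use of 1024-based gigabytes is an assumption, since the unit OpsCenter uses for free_gb is not stated here.

```python
import json


def kib_to_gb(kib_blocks: int) -> float:
    """Convert a df "1K-blocks" figure to gigabytes (1024-based, an assumption)."""
    return kib_blocks / (1024 ** 2)


def capacity_looks_wrong(payload: str, df_available_kib: int) -> bool:
    """Return True when the API reports zero free space but df shows at least 1 GB."""
    report = json.loads(payload)
    return report["free_gb"] == 0 and kib_to_gb(df_available_kib) >= 1


# Response body copied from the storage-capacity output above.
payload = '{"free_gb": 0, "used_gb": 0, "reporting_nodes": 3}'

# "Available" column for /dev/sda6 on the first host, in 1K-blocks.
available_kib = 840822760

print(round(kib_to_gb(available_kib), 1))
print(capacity_looks_wrong(payload, available_kib))
```

With the numbers above, df shows roughly 800 GB available on / while the endpoint reports 0.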
The Data Size metric is the value returned as the node's load (the same as under "Load:" when running nodetool info). Storage capacity actually checks the disk usage, on Linux using df (this probably doesn't work at all in some versions of Windows, so if you are using Windows, that's your issue). There have been a number of issues with this, but the most recent versions have some fixes, so make sure you're on a new version. Check the agent's logs (/var/log/datastax-agent/agent.log) for something along the lines of Process failed
which may give more details.

There's a bug in the agent. If you run df <file>, you should get a different filesystem than if you run df --print-type --no-sync --local. In my case, where I'm able to replicate, df /home/<user>/random-folder yields /dev/disk/by-uuid/<uuid> under the filesystem column.

This is due to mounting your drive (to boot with grub/lilo) using by-uuid instead of a label. The filesystem names in both df outputs must match. It will be fixed in the next release.
For a temporary fix, while we fix this for the next release, make sure you mount the drive used for the data using a label instead of a UUID, and verify that the two df outputs match.
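The check described above can be sketched in Python. This is only an illustration of comparing the two df outputs, not the agent's actual code; the function names are hypothetical, and the parsing assumes df prints one row per filesystem without wrapping long device names.

```python
import subprocess
from typing import List


def filesystem_column(df_output: str) -> List[str]:
    """Return the first column (Filesystem) of every data row in df output."""
    rows = df_output.strip().splitlines()[1:]  # skip the header line
    return [row.split()[0] for row in rows if row.split()]


def device_is_listed(path_output: str, local_output: str) -> bool:
    """True if the device df reports for a path also appears in the local listing."""
    devices = filesystem_column(path_output)
    return bool(devices) and devices[0] in filesystem_column(local_output)


def run_df(*args: str) -> str:
    """Run df with the given arguments and return its standard output."""
    result = subprocess.run(
        ["df", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def check_path(path: str) -> bool:
    """Compare `df <path>` against `df --print-type --no-sync --local`."""
    return device_is_listed(
        run_df(path),
        run_df("--print-type", "--no-sync", "--local"),
    )
```

Calling check_path on your Cassandra data directory (for example check_path("/var/lib/cassandra"), where the path is an assumption about your layout) should return True once the drive is mounted by label. A False result suggests the by-uuid mismatch described above.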