I have write performance issues with Cassandra 3.
I'm trying out Cassandra 3.3 using the official Docker image from here: https://github.com/docker-library/cassandra
I start it as follows:
docker run --net=host --rm cassandra:3.3
Then run cassandra-stress against it:
cassandra-stress write
This gives me the following results for four threads executing traffic:
op rate : 1913 [WRITE:1913]
partition rate : 1913 [WRITE:1913]
row rate : 1913 [WRITE:1913]
latency mean : 2.1 [WRITE:2.1]
latency median : 1.6 [WRITE:1.6]
latency 95th percentile : 4.1 [WRITE:4.1]
latency 99th percentile : 8.4 [WRITE:8.4]
latency 99.9th percentile : 20.5 [WRITE:20.5]
latency max : 155.4 [WRITE:155.4]
Total partitions : 154607 [WRITE:154607]
Total errors : 0 [WRITE:0]
total gc count : 13
total gc mb : 1951
total gc time (s) : 1
avg gc time(ms) : 59
stdev gc time(ms) : 28
Total operation time : 00:01:20
Doing the exact same thing for Cassandra 2.2 using the official image:
docker run --net=host --rm cassandra:2.2
Gives me the following result with four threads:
op rate : 2248 [WRITE:2248]
partition rate : 2248 [WRITE:2248]
row rate : 2248 [WRITE:2248]
latency mean : 1.8 [WRITE:1.8]
latency median : 1.4 [WRITE:1.4]
latency 95th percentile : 3.5 [WRITE:3.5]
latency 99th percentile : 7.2 [WRITE:7.2]
latency 99.9th percentile : 16.4 [WRITE:16.4]
latency max : 129.5 [WRITE:129.5]
Total partitions : 195461 [WRITE:195461]
Total errors : 0 [WRITE:0]
total gc count : 11
total gc mb : 1612
total gc time (s) : 1
avg gc time(ms) : 62
stdev gc time(ms) : 21
Total operation time : 00:01:26
The write rate is almost 15 % lower on 3.3 than on 2.2 (1913 vs. 2248 ops/s). What could the reason be? I've tried changing various parameters, including running Cassandra on the Oracle JDK instead of OpenJDK, with no significant difference. I've also tried different versions of Cassandra 3, again with no real difference. I know that this is a single node and that the results of this basic test cannot be transferred to a production setting. Still, I'm curious whether anybody has an explanation or can reproduce the behaviour.
Any input is welcome!
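For reference, the "almost 15 %" figure can be checked from the op rates and mean latencies reported in the two runs. This is only a small arithmetic sketch; the function and variable names are made up for illustration.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Return (candidate - baseline) / baseline, i.e. the change relative to baseline."""
    return (candidate - baseline) / baseline


# Values copied from the cassandra-stress summaries in the question.
OP_RATE_22 = 2248       # op rate on Cassandra 2.2
OP_RATE_33 = 1913       # op rate on Cassandra 3.3
LATENCY_MEAN_22 = 1.8   # latency mean on 2.2
LATENCY_MEAN_33 = 2.1   # latency mean on 3.3

op_rate_change = relative_change(OP_RATE_22, OP_RATE_33)
latency_change = relative_change(LATENCY_MEAN_22, LATENCY_MEAN_33)

print(f"op rate change on 3.3 vs 2.2:      {op_rate_change:+.1%}")
print(f"mean latency change on 3.3 vs 2.2: {latency_change:+.1%}")
```

The op rate comes out roughly 15 % lower and the mean latency roughly 17 % higher on 3.3, which is consistent with a fixed number of client threads waiting slightly longer per request.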
Update 2016-04-13, JVM parameter differences:
diff jvm-param22-sorted jvm-param33-sorted
> -XX:+AlwaysPreTouch
# Before removing this from the 3.3 config:
# USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
# cassand+ 1 19.0 30.3 2876304 1229136 ? Ssl 04:18 0:17 java
#
# After removing it:
# USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
# cassand+ 177 17.0 9.4 2885596 383972 ? Sl+ 04:59 0:17 java -
#
# The above could actually explain some of the issues that I've observed in
# low-memory environments with multiple containers running, since I guess it
# means Cassandra will grab the whole heap from the OS at startup
# regardless of the actual need.
# This was present twice in the 2.2 config. Should not matter.
< -XX:CMSWaitDuration=10000
# Removing this from the 3.3 config did not have any significant impact
> -XX:+ResizeTLAB
# Removing this from the 3.3 config did not have any significant impact
> -XX:-UseBiasedLocking
# Adding this one to the 3.3 config did not have a significant impact
< -XX:+UseCondCardMark
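To put the two `ps` lines from the -XX:+AlwaysPreTouch note into more readable units, here is a small sketch that converts the RSS column to MiB. It assumes `ps` reports RSS in KiB, which is what procps on Linux does; the variable names are illustrative only.

```python
KIB_PER_MIB = 1024


def kib_to_mib(kib: int) -> float:
    """Convert a size in KiB (assumed unit of the ps RSS column) to MiB."""
    return kib / KIB_PER_MIB


# RSS values copied from the two ps outputs in the update.
rss_with_pretouch_kib = 1229136    # 3.3 with -XX:+AlwaysPreTouch
rss_without_pretouch_kib = 383972  # 3.3 after removing the flag

rss_with_mib = kib_to_mib(rss_with_pretouch_kib)
rss_without_mib = kib_to_mib(rss_without_pretouch_kib)
saved_mib = rss_with_mib - rss_without_mib

print(f"RSS with AlwaysPreTouch:    {rss_with_mib:.0f} MiB")
print(f"RSS without AlwaysPreTouch: {rss_without_mib:.0f} MiB")
print(f"difference:                 {saved_mib:.0f} MiB")
```

Under that assumption the resident set shortly after startup drops from about 1.2 GiB to under 400 MiB, which matches the %MEM column going from 30.3 to 9.4 and fits the guess that the flag makes the JVM touch the whole heap up front.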