-->

Why is there performance degradation after ~6 hours?

Posted 2020-02-23 05:29

Question:

I switched one instance (2 vCPU, 2 GB RAM, load ~4k req/sec) from the latest Java 8 to Java 9. For a while everything was fine and CPU usage was the same as before. However, after ~6 hours CPU consumption increased by 4 percentage points (from 21% to 25%) for no apparent reason. There were no traffic spikes, no increase in memory consumption, and no changes in any metric (I have counters for every method in the code). Nothing.

I left this instance untouched for ~12 hours, expecting it to revert. But nothing changed; it simply kept consuming more CPU.

The top command showed that the Java server process had more CPU spikes than usual. I had recently read that G1 is not well suited to high-throughput workloads, so I concluded that G1 could be the cause.
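One way to test that suspicion before switching collectors is to log how much time the JVM itself reports spending in garbage collection. The following is a minimal sketch, not part of the original setup: the class name `GcTimeProbe` and its methods are hypothetical, and it only uses the standard `java.lang.management` API. Note that `getCollectionTime()` reports approximate elapsed collection time, so CPU used by G1's concurrent background threads may not be fully reflected; treat the number as a rough indicator only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: samples accumulated GC time of the JVM it runs in.
public class GcTimeProbe {

    /** Sums the accumulated collection time (ms) over all collectors of this JVM. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** GC time as a percentage of the wall-clock interval; 0 for an empty interval. */
    public static double gcTimePercent(long gcMillisDelta, long wallMillisDelta) {
        if (wallMillisDelta <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillisDelta / wallMillisDelta;
    }

    /** Starts a daemon thread that prints the GC share once per interval. */
    public static Thread startSampler(long intervalMs) {
        Thread sampler = new Thread(() -> {
            long prevGc = totalGcMillis();
            long prevWall = System.nanoTime();
            try {
                while (true) {
                    Thread.sleep(intervalMs);
                    long gc = totalGcMillis();
                    long wall = System.nanoTime();
                    long wallMs = (wall - prevWall) / 1_000_000;
                    System.out.printf("GC time in last interval: %d ms (%.2f%% of wall time)%n",
                            gc - prevGc, gcTimePercent(gc - prevGc, wallMs));
                    prevGc = gc;
                    prevWall = wall;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop sampling when interrupted
            }
        }, "gc-time-probe");
        sampler.setDaemon(true); // do not keep the JVM alive just for sampling
        sampler.start();
        return sampler;
    }

    public static void main(String[] args) throws InterruptedException {
        startSampler(1_000);
        Thread.sleep(3_500); // demo only: let the sampler print a few lines
    }
}
```

Because the MXBeans describe the JVM they are read from, `GcTimeProbe.startSampler(...)` would have to be called from inside the server process (for example at startup). If the reported GC share stays flat while overall CPU rises, the extra load is more likely coming from somewhere other than stop-the-world collections.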

I restarted the instance with:

java -XX:+UseParallelGC -jar server-0.28.0.jar

After ~20 hours of monitoring, everything is fine, as before. CPU consumption is at 21%, the level it held for many days before the switch.

CPU usage right after Java 9 deployment (6h scale):

CPU increase after 7 hours + 12 hours "untouched" (7d scale):

CPU after -XX:+UseParallelGC (24h scale):

So my question is: is this expected behavior for G1? Has anyone else seen something similar?

Ubuntu 16.04 x64

java version "9"
Java(TM) SE Runtime Environment (build 9+181)
Java HotSpot(TM) 64-Bit Server VM (build 9+181, mixed mode)

EDIT 03.01.2019

I tried running the same server with G1 on Java 10.0.2:

java version "10.0.2" 2018-07-17
Java(TM) SE Runtime Environment 18.3 (build 10.0.2+13)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.2+13, mixed mode)

G1 consumes 40% more CPU than UseParallelGC right after the server restart.

Answer 1:

(Note that GC tuning is extremely dependent on the environment, so there is no magic recipe.)

I had a very similar issue with G1. By default, it seems to be rather poorly suited to REST endpoints (again, this is only what I experienced in my own environment). What helped me was experimenting with the GC flags, as described here.

For us, the biggest improvements came from -XX:G1NewSizePercent=25 and -XX:MaxGCPauseMillis=50. G1 also auto-tunes itself over time, so the maximum GC pause limit has a significant effect on all other parameters.
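When experimenting with flags like these, it helps to confirm that the running JVM actually received them. Keep in mind that `-XX:G1NewSizePercent` is an experimental HotSpot option, so it has to be preceded by `-XX:+UnlockExperimentalVMOptions` on the command line. Below is a small sketch that lists the GC-related `-XX:` arguments the current JVM was started with. The class name `GcFlagCheck` and the simple substring filter are assumptions made for illustration; only the standard `RuntimeMXBean.getInputArguments()` call is used.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: shows which GC-related -XX: flags this JVM was started with.
public class GcFlagCheck {

    /** Keeps only -XX: arguments that mention "GC" or "G1" (a deliberately crude filter). */
    public static List<String> gcFlags(List<String> jvmArgs) {
        List<String> result = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:") && (arg.contains("GC") || arg.contains("G1"))) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> flags = gcFlags(jvmArgs);
        if (flags.isEmpty()) {
            System.out.println("No GC-related -XX: flags were passed; the JVM is using its defaults.");
        } else {
            for (String flag : flags) {
                System.out.println(flag);
            }
        }
    }
}
```

Started with `-XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=25 -XX:MaxGCPauseMillis=50`, this should list the last two flags; the unlock flag itself is not matched by the filter.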