We recently decided to enable GC logging for Apache Storm workers on a number of clusters (exact version varies) as a aid to looking into topology-related memory and garbage collection problems. We want to do that for workers, but we also want to avoid two problems we know might happen:
- overwriting of the log file when a worker restarts for any reason
- the logs using too much disk space, leading to disks getting filled (if you keep the cluster running long enough, log files will fill up disk unless managed)
When Java GC logging starts for a process it seems to replace the content of any file that has the same name. This means that unless you are careful, you will lose the GC logging, perhaps when you are most likely to need it.
You can set JVM options for Storm workers via the worker.childopts property in storm.yaml (if you are managing Storm through Apache Ambari, look under Storm service > configs > advanced storm-site > worker.childopts). You will be adding additional JVM properties to that.
To enable GC logging to a file, you will need to add -verbose:gc -Xloggc:<log-file-location>
.
You need to give the log file name special consideration to prevent overwrites. It seems like you need to have a unique name for every invocation. To achieve this, take advantage of some of special "%" string replacements mentioned in the Storm code documentation. For uniqueness, %WORKER-ID%
is sufficient it is (quite likely) unique for each worker process. You may also want to be able to easily tell what topology the GC log is for. In that case add in %TOPOLOGY-ID%
(you may need to say %ID%
some older versions of Storm); it may be long but will provide the name of the topology.
So far the JVM options are -verbose:gc -Xloggc:/var/log/storm/storm-worker-%TOPOLOGY-ID%-%WORKER-ID%-gc.log
(the -%TOPOLOGY-ID%
is optional, the path should match your Storm logging directory, and you can name the log file differently if you prefer).
Now onto managing use of disk space. I'll be happy if there is a simpler way that what I have.
First, take advantage of Java's built-in GC log file rotation. -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
is an example of enabling this rotation, having up to 10 GC log files from the JVM, each of which is no more than 10MB in size. 10 x 10MB is 100MB max usage. Note that this is per worker instance.
With the GC log file rotation in place with up to 10 files, '.0', '.1', ... '.9' will be added to the file name you gave in Xloggc. .0 will be first and after it reaches .9 it will replace .0 and continue on in a round robin manner. In some versions of Java '.current' will be additionally put on the end of the name of the log file currently being written to.
Due to the unique file naming we apparently have to add to avoid overwrites, this means you can have 100MB per worker process invocation, so this is not a total solution to managing disk space used by storm worker child GC logs. You will end up with a set of up to 10 GC log files for each process -- this can add up. The best solution (under *nix) to that would seem to be to use the logrotate utility
to periodically clean up worker GC logs that have not been modified in the last N days.
Be sure to do the math and make sure you will have enough disk space.
People frequently want more details and context in their GC logs than the default, so consider adding in -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
.
All together, you will be adding something like the following to worker.childopts: -verbose:gc -Xloggc:/var/log/storm/storm-worker-%TOPOLOGY-ID%-%WORKER-ID%-gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
plus configure logrotate.
Finally, I should mention a couple other options for naming log files, though I don't see the advantage, at least for my use case:
- in some versions of Java you can put %t in GC log file naming and Java will replace that with the current timestamp formatted as
<YYYY>-<MM>-<DD>_<HH>-<MM>-<SS>
. You can also out %p to get the current process ID.
- Somebody told me that in some cases you can put backticked expressions such as `date +'%Y%m%d%H%M'` in some combinations of Storm and Java, at least if you use Ambari. He reported that that worked with Storm 0.10.0 and Java 1.7.0_95, but I was unable to get that behavior with storm 0.9.3.2.2.0.0-2041 and java 1.7.0_75.