I seem to be running into a scenario where the JVM is stuck indefinitely in trying to get to a safe point after a few hours. However, if I do a jstack with -F option it seems to get out of that wait and continues with the execution.
jdk1.8.0_45/bin/jstack -F 39924 >a.out
I am using jdk1.8.0_45 on Centos
My questions are:
i) It seems that the JVM can come out of that safepoint indefinite wait when sent an interrupt from jstack. How come it doesnt come out without jstack. Is there some jvm option I can use to avoid that indefinite wait.
ii) Can I get a more definite thread dump of the the thread that's causing the issue. The output from the safepoint log seems imprecise.
The options I am using are:.
-server
-XX:+AggressiveOpts
-XX:+UseG1GC
-XX:+UnlockExperimentalVMOptions
-XX:G1MixedGCLiveThresholdPercent=85
-XX:InitiatingHeapOccupancyPercent=30
-XX:G1HeapWastePercent=5
-XX:MaxGCPauseMillis=1000
-XX:G1HeapRegionSize=4M
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+UnlockExperimentalVMOptions
-XX:G1LogLevel=finest
-Xmx6000m
-Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=999
-XX:+SafepointTimeout
-XX:+UnlockDiagnosticVMOptions
-XX:SafepointTimeoutDelay=20000
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1
The safepoint log
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17771.115: G1IncCollectionPause [ 170 0 0 ] [ 0 0 0 0 8 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17771.125: RevokeBias [ 170 1 2 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17771.127: RevokeBias [ 170 1 1 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17771.131: RevokeBias [ 170 1 2 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17771.955: RevokeBias [ 169 0 2 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17772.160: BulkRevokeBias [ 171 0 2 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17772.352: RevokeBias [ 170 1 3 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17773.596: RevokeBias [ 169 0 1 ] [ 0 0 0 0 0 ] 0
# SafepointSynchronize::begin: Timeout detected:
# SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
# SafepointSynchronize::begin: Threads which did not reach the safepoint:
# "Thread-14" #115 prio=5 os_prio=0 tid=0x00007f20c8029000 nid=0x9cd0 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE
# SafepointSynchronize::begin: (End of list)
After the jstack interrupt this is what I see from the safepoint log
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17779.826: G1IncCollectionPause [ 169 1 1 ] [3315603 03315603 0 8 ] 1
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
21095.439: RevokeBias [ 169 2 13 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
21095.439: RevokeBias [ 169 1 2 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
21095.441: RevokeBias [ 184 3 4 ] [ 0 0 3 0 1 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
21095.447: RevokeBias [ 190 0 2 ] [ 0 0 4 0 2 ] 0