Java garbage collector G1GC taking long for 'O

I'm not a Java newby but I know only a tiny bit about garbage collection. Now I would like to change that with some practical experiences. My goal is a latency of under 0.3 seconds, or in extreme cases 0.5 is okay too.

I have an application with -Xmx50gb (-Xms50gb) and set the other following GC options:

-XX:+UseG1GC -Xloggc:somewhere.gc.log -XX:+PrintGCDateStamps

But now I'm occasionally having long pauses over 5sec due to garbage collections and that although there seems to be enough memory available. One reason I found:

[GC pause (G1 Evacuation Pause) (young) 42G->40G(48G), 5.9409662 secs]

Why is GCG1 still doing a "stop the world" for this? (Or at least I see that it stops my application at exactly this time) And why does it do such a negative cleanup if it is not really necessary as there is over 12% of the available RAM free. Also I thought that the default value for -XX:MaxGCPauseMillis is 200 milliseconds, why is this value violated by a factor of 29 or even 50 (see below)?

Another reason for a delay was:

[GC pause (Metadata GC Threshold) (young) (initial-mark) 40G->39G(48G), 10.4667233 secs]

This will be probably solved via this answer e.g. just increasing the metadata space -XX:MetaspaceSize=100M

BTW: Using JSE 1.8.0_91-b14

Update: detailed GC log of such an event

2016-08-12T09:20:31.589+0200: 1178.312: [GC pause (G1 Evacuation Pause) (young) 1178.312: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 3159, predicted base time: 1.52 ms, remaining time: 198.48 ms, target pause time: 200.00 ms]
 1178.312: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 136 regions, survivors: 20 regions, predicted young region time: 1924.75 ms]
 1178.312: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 136 regions, survivors: 20 regions, old: 0 regions, predicted pause time: 1926.27 ms, target pause time: 200.00 ms]
 1185.330: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: recent GC overhead higher than threshold after GC, recent GC overhead: 21.83 %, threshold: 10.00 %, uncommitted: 0 bytes, calculated expansion amount: 0 bytes (20.00 %)]
 1185.330: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 42580574208 bytes, allocation request: 0 bytes, threshold: 23592960000 bytes (45.00 %), source: end of GC]
 1185.330: [G1Ergonomics (Mixed GCs) do not start mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 1 regions, reclaimable: 3381416 bytes (0.01 %), threshold: 5.00 %]
, 7.0181903 secs]
   [Parallel Time: 6991.8 ms, GC Workers: 10]
      [GC Worker Start (ms): Min: 1178312.6, Avg: 1178312.8, Max: 1178312.9, Diff: 0.2]
      [Ext Root Scanning (ms): Min: 1.1, Avg: 1.5, Max: 2.3, Diff: 1.2, Sum: 15.0]
      [Update RS (ms): Min: 0.0, Avg: 0.3, Max: 1.3, Diff: 1.3, Sum: 3.4]
         [Processed Buffers: Min: 0, Avg: 2.1, Max: 5, Diff: 5, Sum: 21]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum: 1.7]
      [Object Copy (ms): Min: 6964.1, Avg: 6973.0, Max: 6989.5, Diff: 25.3, Sum: 69730.4]
      [Termination (ms): Min: 0.0, Avg: 16.4, Max: 25.3, Diff: 25.3, Sum: 164.4]
         [Termination Attempts: Min: 1, Avg: 3.2, Max: 13, Diff: 12, Sum: 32]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
      [GC Worker Total (ms): Min: 6991.5, Avg: 6991.6, Max: 6991.7, Diff: 0.2, Sum: 69915.5]
      [GC Worker End (ms): Min: 1185304.3, Avg: 1185304.3, Max: 1185304.3, Diff: 0.0]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.3 ms]
   [Other: 26.0 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 25.3 ms]
      [Ref Enq: 0.1 ms]
 [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.2 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.2 ms]
   [Eden: 2176.0M(2176.0M)->0.0B(2176.0M) Survivors: 320.0M->320.0M Heap: 40.6G(48.8G)->40.0G(48.8G)]
 [Times: user=0.55 sys=46.58, real=7.02 secs]

Read here about it: Copying (Stop the World Event) - These are the stop the world pauses to evacuate or copy live objects to new unused regions. This can be done with young generation regions which are logged as [GC pause (young)]. Or both young and old generation regions which are logged as [GC Pause (mixed)].

Why is GCG1 still doing a "stop the world" for this?

Because G1 is not a pauseless collector, it is just a low-pause collector.

Also I thought that the default value for -XX:MaxGCPauseMillis is 200 milliseconds, why is this value violated by a factor of 29 or even 50 (see below)?

It is, but it's just a goal, not a guarantee. Many things can cause it to fail to meet that goal. You got a fairly large heap, this makes things more difficult, i.e. failures are easier to provoke.

Anyway, the GC tuning journey starts with enabling verbose GC logging via

-Xloggc:<path to gc log file>
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails

and then running the resulting log through GCViewer to get a general overview and then going back to reading individual log entries (there are many answers/blog posts on this topic) to figure out what might be causing the worst behavior. Depending on the cause various remedies can be tried.

Some general understanding of how tracing garbage collectors work in general and G1 will be necessary to avoid cargo-culting.

My application has many allocations which could be easily called "humongous allocations".

If that actually is the cause then current VMs have some experimental options to reclaim them sooner.

 [Object Copy (ms): Min: 6964.1, Avg: 6973.0, Max: 6989.5, Diff: 25.3, Sum: 69730.4]
 [Times: user=0.55 sys=46.58, real=7.02 secs]

This means it's spends most of the time in the kernel when doing something that should mostly consist of memory accesses and not system calls. So swap activity or transparent huge pages are likely suspects.