We have a Java application that has a JNI layer that is multi-threaded (pthread) and will call back to the Java level upon messages received from the underlying network.
We notice that every time it crashes, it is caused by a gc. We can even simulate such a crash by manually trigger a gc by calling jmap -histo <pid>
while the JNI layer is receiving messages from the network.
Given the information that we have read about the behaviours in JVM during GC in this post, https://stackoverflow.com/a/39401467/4523221, we still couldn't figure out why such crash is related to gc since JNI function calls are blocked during gc.
If anyone can shed light on this, it will be great. Thanks in advance.
The following is a stack trace that we have collected after a crash in our application.
Program terminated with signal 6, Aborted.
#0 0x0000003cdce325e5 in raise () from /lib64/libc.so.6
#1 0x0000003cdce33dc5 in abort () from /lib64/libc.so.6
#2 0x00007fdafe2516b5 in os::abort(bool) () from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#3 0x00007fdafe3efbf3 in VMError::report_and_die() ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#4 0x00007fdafde2f3e2 in report_vm_error(char const*, int, char const*, char const*) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#5 0x00007fdafe24c1ff in os::PlatformEvent::park() ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#6 0x00007fdafe20c538 in Monitor::ILock(Thread*) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#7 0x00007fdafe20c73f in Monitor::lock_without_safepoint_check() ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#8 0x00007fdafe2e7a1f in SafepointSynchronize::block(JavaThread*) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#9 0x00007fdafe39bcdd in JavaThread::check_safepoint_and_suspend_for_native_trans(JavaThread*) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#10 0x00007fdafe0123d8 in jni_NewByteArray ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#11 0x00007fdaa447b7d1 in JNIEnv_::NewByteArray (this=0x7fdaf800c9f8, len=7)
at /usr/java/jdk1.8.0_65/include/jni.h:1643
---omitted---
#19 0x0000003cdd20b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#20 0x00007fdafe24c133 in os::PlatformEvent::park() ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#21 0x00007fdafe20ce27 in Monitor::IWait(Thread*, long) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#22 0x00007fdafe20d5f0 in Monitor::wait(bool, long, bool) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
---Type <return> to continue, or q <return> to quit---
#23 0x00007fdafe39ed51 in Threads::destroy_vm() ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#24 0x00007fdafdfff931 in jni_DestroyJavaVM ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#25 0x00007fdafe91a63d in JavaMain () from /usr/java/jdk1.8.0_65/bin/../lib/amd64/jli/libjli.so
#26 0x0000003cdd207aa1 in start_thread () from /lib64/libpthread.so.0
#27 0x0000003cdcee8aad in clone () from /lib64/libc.so.6
The way we obtained JNIEnv* e.g.
JNIEnv *env = 0;
jint result = jvm->GetEnv((void **) &env, JNI_VERSION_1_8);
if (result != JNI_OK) {
result = jvm->AttachCurrentThread((void **) &env, NULL);
After spending days investigating this JNI issue, we have finally found out the reason and I would like to share our experience here so that hopefully it will help others.
First of all, the reason why we needed to use JNI in the first place was because we needed to make use of a 3rd party network library that was a Linux native lib, and unfortunately that was the cause of our problem.
The library provided us a callback handle that we implemented to receive incoming network messages from it, and this callback, we later found out, was simply a signal handler. So, it means that this signal handler would get called whenever a signal popped up, even during gc.
Since C threads keep running during safepoints in JVM, it would have been fine if those C threads weren't attached to the JVM, otherwise disasters would certainly strike.
Here is kind of what we thought had happened. (everything below happened in the JNI layer)
The gdb stacktrace that we were seeing was basically what happened when a gc thread that was actually in a middle of doing some work on the heap and then got a call from our application to do some application work and then a few JNI API calls... BOOM
Solution:
p.s. maybe some of the details weren't exactly accurate, so any JVM expert advice is welcomed. I will try to correct them as advised.
Thanks
Update.1 (@apangin): We have another gdb stacktrace here. Just wondering if the GangWorker at #18 was a parallel GC thread.