Java's 'tnameserv' takes 3+ minutes to

2019-02-15 00:12发布

问题:

I'm trying to help a dev of an app I'd like to use trouble shoot a problem utilizing Corba Server on Linux. I narrowed down the problem to tnameserv taking over 3 minutes to become ready after invocation.

What exactly is tnameserv trying to do in those 3 minutes and is there anyway I can speed it up? The app failed because it tried to do 5 connection attempts with 1-second between retries; which apparently doesn't give tnameserv nearly enough time to become ready. I am using Java 6u17 on Slackware 13.0

In case it matters. The actual invocation of tnameserv is the following:

tnameserv -ORBInitialPort 23423

Upon running that command in a shell, it appears to hang until around the 3minute mark when I finally see it display "Ready".

UPDATE

I did an strace -f tnameserv -ORBInitialPort 23423 and I am seeing an eff ton of calls to gettimeofday(), clock_gettime() and futex() of which the latter is always returning '-1 ETIMEDOUT (Connection timed out). I have a feeling this is related to my problem, but I have no idea how or why.

Here is a but a small fraction of what I am seeing from strace. Can somebody replicate and/or make sense of this?

[pid 30950] futex(0x8128e14, FUTEX_WAIT_PRIVATE, 1, {0, 49903084}) = -1 ETIMEDOUT (Connection timed out)
[pid 30950] futex(0x8098a28, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 30950] clock_gettime(CLOCK_MONOTONIC, {329619, 995857482}) = 0
[pid 30950] gettimeofday({1260930158, 92108}, NULL) = 0
[pid 30950] clock_gettime(CLOCK_MONOTONIC, {329619, 995996617}) = 0
[pid 30950] clock_gettime(CLOCK_MONOTONIC, {329619, 996088536}) = 0
[pid 30950] gettimeofday({1260930158, 92328}, NULL) = 0
[pid 30950] clock_gettime(CLOCK_REALTIME, {1260930158, 92424295}) = 0
[pid 30950] futex(0x8128e14, FUTEX_WAIT_PRIVATE, 1, {0, 49903705}) = -1 ETIMEDOUT (Connection timed out)
[pid 30950] futex(0x8098a28, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 30950] clock_gettime(CLOCK_MONOTONIC, {329620, 46761098}) = 0
[pid 30950] gettimeofday({1260930158, 143084}, NULL) = 0
[pid 30950] clock_gettime(CLOCK_MONOTONIC, {329620, 46913924}) = 0
[pid 30950] clock_gettime(CLOCK_MONOTONIC, {329620, 47006961}) = 0
[pid 30950] gettimeofday({1260930158, 143303}, NULL) = 0
[pid 30950] clock_gettime(CLOCK_REALTIME, {1260930158, 143398317}) = 0
[pid 30950] futex(0x8128e14, FUTEX_WAIT_PRIVATE, 1, {0, 49904683}) = -1 ETIMEDOUT (Connection timed out)
[pid 30950] futex(0x8098a28, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 30950] clock_gettime(CLOCK_MONOTONIC, {329620, 97818379}) = 0
[pid 30950] gettimeofday({1260930158, 194127}, NULL) = 0
[pid 30950] clock_gettime(CLOCK_MONOTONIC, {329620, 97957235}) = 0
[pid 30950] clock_gettime(CLOCK_MONOTONIC, {329620, 98049154}) = 0
[pid 30950] gettimeofday({1260930158, 194346}, NULL) = 0
[pid 30950] clock_gettime(CLOCK_REALTIME, {1260930158, 194441349}) = 0
[pid 30950] futex(0x8128e14, FUTEX_WAIT_PRIVATE, 1, {0, 49904651}) = -1 ETIMEDOUT (Connection timed out)
[pid 30950] futex(0x8098a28, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 30950] clock_gettime(CLOCK_MONOTONIC, {329620, 148806370}) = 0
[pid 30950] gettimeofday({1260930158, 245055}, NULL) = 0
[pid 30950] clock_gettime(CLOCK_MONOTONIC, {329620, 148947182}) = 0
[pid 30950] clock_gettime(CLOCK_MONOTONIC, {329620, 148981547}) = 0
[pid 30950] gettimeofday({1260930158, 245280}, NULL) = 0
[pid 30950] clock_gettime(CLOCK_REALTIME, {1260930158, 245374859}) = 0
[pid 30950] futex(0x8128e14, FUTEX_WAIT_PRIVATE, 1, {0, 49905141}) = -1 ETIMEDOUT (Connection timed out)

回答1:

I found this using the google (LL Zamenhof's birthday today btw). I would turn on some wireshark tracing and see if you can see anything there. Though I would expect to see stuff in the strace if there was network activity. Too bad it's not open source.



回答2:

Turns out the problem was a firewall issue. Wireshark didn't show anything useful because the firewall was dropping a certain packet. Although I looked at my firewall logs quite a few times to make sure this wasn't the case, turns out I wasn't looking in the right place. I overlooked the fact that this 'tnameserv' was IPv6 aware (as it was binding to :::23423) and a cursory glance of my firewall script showed that I was logging IPv6 related packets to a different location than my IPv4 packets. This was not an oversight but had to be done because ip6tables does not currently support the -j ULOG target.

Long story short, allowing loopback for IPv6 fixed the issue and 'tnameserv' returns "Ready" almost instantly.



回答3:

Check your DNS-settings. You may time out waiting for an answer several times.