First, some background. We have a worker that expands/resolves a bunch of short URLs:

http://t.co/example -> http://example.com

So we just follow redirects, that's it. We don't read any data from the connection; as soon as we get a 200, we return the final URL and close the InputStream.
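Schematically, the resolver does something like the following (a simplified, illustrative sketch, not our exact production code; only the resolve() name comes from the stack trace below):

import java.net.HttpURLConnection;
import java.net.URL;

public class URLProcessor {
    // Follow redirects manually until we hit a non-redirect status.
    public String resolve(String shortUrl) throws Exception {
        String current = shortUrl;
        while (true) {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(current).openConnection();
            conn.setInstanceFollowRedirects(false); // we handle Location ourselves
            int code = conn.getResponseCode();
            if (code / 100 == 3) {
                // Redirect: remember the target (assumed absolute here) and
                // close this response before following it.
                current = conn.getHeaderField("Location");
                conn.getInputStream().close();
            } else {
                // 200: we never read the body, we just close the stream --
                // this is the close() call that hangs. (Error statuses are
                // omitted for brevity.)
                conn.getInputStream().close();
                return current;
            }
        }
    }
}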
Now, the problem itself. On a production server, one of the resolver threads hangs inside the InputStream.close() call:
"ProcessShortUrlTask" prio=10 tid=0x00007f8810119000 nid=0x402b runnable [0x00007f882b044000]
java.lang.Thread.State: RUNNABLE
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.skip(BufferedInputStream.java:352)
- locked <0x0000000561293aa0> (a java.io.BufferedInputStream)
at sun.net.www.MeteredStream.skip(MeteredStream.java:134)
- locked <0x0000000561293a70> (a sun.net.www.http.KeepAliveStream)
at sun.net.www.http.KeepAliveStream.close(KeepAliveStream.java:76)
at java.io.FilterInputStream.close(FilterInputStream.java:155)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.close(HttpURLConnection.java:2735)
at ru.twitter.times.http.URLProcessor.resolve(URLProcessor.java:131)
at ru.twitter.times.http.URLProcessor.resolve(URLProcessor.java:55)
at ...
After a bit of research, I understand that skip() is called to clean up the stream before returning the connection to the connection pool (if keep-alive is on?). Still, I don't understand how to avoid this situation. Moreover, I'm not sure whether the problem is bad design in our code or a bug in the JDK.
So, the questions are:
- Is it possible to avoid hanging on close()? For example, by guaranteeing some reasonable timeout.
- Is it possible to avoid reading data from the connection at all? Remember, I just want the final URL. Actually, I think I don't want skip() to be called at all (see the sketch below for the kind of thing I mean).
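For illustration, this untested sketch is the kind of workaround I have in mind: as far as I can tell, disconnect() tears down the underlying socket instead of returning it to the keep-alive cache, so nothing tries to skip() the unread body on the way out (at the price of losing connection reuse):

import java.net.HttpURLConnection;
import java.net.URL;

class NoDrainClose {
    // Untested sketch: avoid the draining close() entirely.
    static int fetchStatus(String shortUrl) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(shortUrl).openConnection();
        try {
            conn.setInstanceFollowRedirects(false);
            return conn.getResponseCode(); // reads status line + headers only
        } finally {
            conn.disconnect(); // closes the socket; no skip() of leftover data
        }
    }
}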
Update: KeepAliveStream, line 79, close() method:
// Skip past the data that's left in the Inputstream because
// some sort of error may have occurred.
// Do this ONLY if the skip won't block. The stream may have
// been closed at the beginning of a big file and we don't want
// to hang around for nothing. So if we can't skip without blocking
// we just close the socket and, therefore, terminate the keepAlive
// NOTE: Don't close super class
try {
    if (expected > count) {
        long nskip = (long) (expected - count);
        if (nskip <= available()) {
            long n = 0;
            while (n < nskip) {
                nskip = nskip - n;
                n = skip(nskip);
            }
            ...
More and more it seems to me that there is a bug in the JDK itself. Unfortunately, it's very hard to reproduce ...
I guess this skip() on close() is intended for keep-alive support. See http://docs.oracle.com/javase/6/docs/technotes/guides/net/http-keepalive.html.

So keep-alive can be effectively disabled with http.KeepAlive.remainingData=0 or http.keepAlive=false. But this can negatively affect performance if you are always talking to the same host (http://t.co). As @artbristol suggested, using HEAD instead of GET seems to be the preferable solution here.
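A minimal sketch of the HEAD variant (illustrative only; a HEAD response carries no body, so there is nothing for close() to skip()):

import java.net.HttpURLConnection;
import java.net.URL;

class HeadResolver {
    // Returns the redirect target for one hop, or the URL itself otherwise.
    static String nextHop(String url) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("HEAD");          // no response body to drain
        conn.setInstanceFollowRedirects(false);
        int code = conn.getResponseCode();
        String location = conn.getHeaderField("Location"); // null unless redirect
        return (code / 100 == 3 && location != null) ? location : url;
    }
}

(The system properties above can be set with, e.g., -Dhttp.keepAlive=false on the command line.)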
The implementation of KeepAliveStream that you have linked violates the contract under which available() and skip() are guaranteed to be non-blocking, and thus may indeed block.

The contract of available() guarantees a single non-blocking skip() ("A single read or skip of this many bytes will not block, but may read or skip fewer bytes", per the InputStream.available() Javadoc), whereas the implementation calls skip() multiple times per a single call to available() (see the while loop in the snippet quoted above).

This doesn't prove that your application blocks because KeepAliveStream uses InputStream incorrectly; some implementations of InputStream may provide stronger non-blocking guarantees. But I think it is a very likely suspect.

EDIT: After a bit more research, this is a very recently fixed bug in the JDK: https://bugs.openjdk.java.net/browse/JDK-8004863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel. The bug report describes an infinite loop, but a blocking skip() could also be the result. The fix seems to address both issues (there is only a single skip() per available()).
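For reference, the fixed pattern described there looks roughly like this (my paraphrase of the idea behind the fix, not the actual JDK patch):

// Cover every skip() with a fresh available() check, so each individual
// skip() stays within the non-blocking guarantee, and never loop on a
// skip() that makes no progress.
long remaining = expected - count;
while (remaining > 0) {
    long window = Math.min(remaining, available());
    if (window <= 0) {
        break;   // can't skip without blocking: close the socket instead
    }
    long n = skip(window);
    if (n <= 0) {
        break;   // defensive: avoid spinning forever on skip() == 0
    }
    remaining -= n;
}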
I was facing a similar issue when I was trying to make a "HEAD" request. To fix it, I removed the "HEAD" method, because I just wanted to ping the URL.