Apache HTTPClient Streaming HTTP POST Request?

2019-06-18 05:24发布

I'm trying to build a "full-duplex" HTTP streaming request using Apache HTTPClient.

In my first attempt, I tried using the following request code:

URL url=new URL(/* code goes here */);

HttpPost request=new HttpPost(url.toString());

request.addHeader("Connection", "close");

PipedOutputStream requestOutput=new PipedOutputStream();
PipedInputStream requestInput=new PipedInputStream(requestOutput, DEFAULT_PIPE_SIZE);
ContentType requestContentType=getContentType();
InputStreamEntity requestEntity=new InputStreamEntity(requestInput, -1, requestContentType);
request.setEntity(requestEntity);

HttpEntity responseEntity=null;
HttpResponse response=getHttpClient().execute(request); // <-- Hanging here
try {
    if(response.getStatusLine().getStatusCode() != 200)
        throw new IOException("Unexpected status code: "+response.getStatusLine().getStatusCode());

    responseEntity = response.getEntity();
}
finally {
    if(responseEntity == null)
        request.abort();
}

InputStream responseInput=responseEntity.getContent();
ContentType responseContentType;
if(responseEntity.getContentType() != null)
    responseContentType = ContentType.parse(responseEntity.getContentType().getValue());
else
    responseContentType = DEFAULT_CONTENT_TYPE;

Reader responseStream=decode(responseInput, responseContentType);
Writer requestStream=encode(requestOutput, getContentType());

The request hangs at the line indicated above. It seems that the code is trying to send the entire request before it gets the response. In retrospect, this makes sense. However, it's not what I was hoping for. :)

Instead, I was hoping to send the request headers with Transfer-Encoding: chunked, receive a response header of HTTP/1.1 200 OK with a Transfer-Encoding: chunked header of its own, and then I'd have a full-duplex streaming HTTP connection to work with.

Happily, my HTTPClient has another NIO-based asynchronous client with good usage examples (like this one). My questions are:

  1. Is my interpretation of the synchronous HTTPClient behavior correct? Or is there something I can do to continue using the (simpler) synchronous HTTPClient in the manner I described?
  2. Does the NIO-based client wait to send the whole request before seeking a response? Or will I be able to send the request incrementally and receive the response incrementally at the same time?

If HTTPClient will not support this modality, is there another HTTP client library that will? Or should I be planning to write a (minimal) HTTP client to support this modality?

2条回答
混吃等死
2楼-- · 2019-06-18 05:54

A separate thread must be writing to the OutputStream for your code to work.

  • The code above provides the HTTPClient with a PipedInputStream.
  • PipedInputStream makes bytes available as they are written to the corresponding OutputStream.
  • The code above does not write to the OutputStream (which must be done by a separate thread.
  • Therefore the code is hanging exactly where your comment is.
  • Under the hood, the Apache client says "inputStream.read()" which in the case of piped streams requires that outputStream.write(bytes) was called previously (by a separate thread).
  • Since you aren't pumping bytes into the associated OutputStream from a separate thread the InputStream just sits and waits for the OutputStream to be written to by "some other thread."

From the JavaDocs:

A piped input stream should be connected to a piped output stream; the piped input stream then provides whatever data bytes are written to the piped output stream.

Typically, data is read from a PipedInputStream object by one thread and data is written to the corresponding PipedOutputStream by some other thread.

Attempting to use both objects from a single thread is not recommended, as it may deadlock the thread.

The piped input stream contains a buffer, decoupling read operations from write operations, within limits. A pipe is said to be "broken" if a thread that was providing data bytes to the connected piped output stream is no longer alive.

Note: Seems to me, since piped streams and concurrency were not mentioned in your problem statement, that it's not necessary. Try wrapping a ByteArrayInputStream() with the Entity object instead first for a sanity check... that should help you narrow down the issue.

Update

Incidentally, I wrote an inversion of Apache's HTTP Client API [PipedApacheClientOutputStream] which provides an OutputStream interface for HTTP POST using Apache Commons HTTP Client 4.3.4. This may be close to what you are looking for...

Calling-code looks like this:

// Calling-code manages thread-pool
ExecutorService es = Executors.newCachedThreadPool(
  new ThreadFactoryBuilder()
  .setNameFormat("apache-client-executor-thread-%d")
  .build());


// Build configuration
PipedApacheClientOutputStreamConfig config = new      
  PipedApacheClientOutputStreamConfig();
config.setUrl("http://localhost:3000");
config.setPipeBufferSizeBytes(1024);
config.setThreadPool(es);
config.setHttpClient(HttpClientBuilder.create().build());

// Instantiate OutputStream
PipedApacheClientOutputStream os = new     
PipedApacheClientOutputStream(config);

// Write to OutputStream
os.write(...);

try {
  os.close();
} catch (IOException e) {
  logger.error(e.getLocalizedMessage(), e);
}

// Do stuff with HTTP response
...

// Close the HTTP response
os.getResponse().close();

// Finally, shut down thread pool
// This must occur after retrieving response (after is) if interested   
// in POST result
es.shutdown();

Note - In practice the same client, executor service, and config will likely be reused throughout the life of the application, so the outer prep and close code in the above example will likely live in bootstrap/init and finalization code rather than directly inline with the OutputStream instantiation.

查看更多
小情绪 Triste *
3楼-- · 2019-06-18 06:05

Here is my view on skim reading the code:

  1. I cannot completely agree with the fact that a non-200 response means failure. All 2XX responses are mostly valid. Check wiki for more details

  2. For any TCP request, I would recommend to receive the entire response to confirm that it is valid. I say this because, a partial response may mostly be treated as bad response as most of the client implementations cannot make use of it. (Imagine a case where server is responding with 2MB of data and it goes down during this time)

查看更多
登录 后发表回答