I have to implement an HTTP client in Java, and for my needs it seems that the most efficient approach is HTTP pipelining (as per RFC 2616).
As an aside, I want to pipeline POSTs. (Also, I am not talking about multiplexing; I am talking about pipelining, i.e. sending many requests over one connection before receiving any response, in effect batching HTTP requests.)
I could not find a third-party library that explicitly states it supports pipelining, but I could use e.g. Apache HttpCore to build such a client, or, if I have to, build it myself.
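To be concrete, here is a rough raw-socket sketch of what I mean by pipelining (this is not Apache HttpCore; example.com and the paths are placeholders, and the response handling is deliberately naive):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class PipelineSketch {
    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket("example.com", 80)) {   // hypothetical host
            OutputStream out = socket.getOutputStream();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));

            // Write both requests before reading anything: this is the pipelining part.
            out.write(("GET /a HTTP/1.1\r\nHost: example.com\r\n\r\n")
                    .getBytes(StandardCharsets.US_ASCII));
            out.write(("GET /b HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
                    .getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Responses arrive in the same order as the requests. A real client must
            // parse Content-Length / chunked encoding to find the boundaries; here we
            // simply dump everything until the server closes the connection
            // (forced by "Connection: close" on the last request).
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```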
The problem is whether this is a good idea. I have not found any authoritative reference showing that HTTP pipelining is more than a theoretical model and is actually implemented properly by HTTP servers. Additionally, all browsers that support pipelining have the feature off by default.
So, should I try to implement such a client, or will I run into a lot of trouble because of server (or proxy) implementations? Is there any reference that gives guidelines on this?
If it is a bad idea, what would be the alternative programming model for efficiency? Separate TCP connections?
I've implemented a pipelined HTTP client. The basic concept sounds easy, but error handling is very hard. The performance gain was so insignificant that we gave up on the concept a long time ago.
In my opinion, it doesn't make sense for the normal use case. It only has some benefit when the requests are logically connected. For example, you have a three-request transaction and you can send them all in a batch. But normally, if they can be pipelined, you can combine them into one request anyway.
Here are just some of the hurdles I remember (a rough sketch of the bookkeeping follows the list):
HTTP keep-alive does not guarantee a persistent connection. If you have three requests pipelined on the connection and the server drops the connection after the first response, you are supposed to retry the remaining two requests.
When you have multiple connections, load balancing is also tricky: if there is no idle connection, you can either pipeline onto a busy connection or create a new one.
Timeouts are also tricky. When one request times out, you have to discard everything queued after it, because responses must come back in order.
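To make those last two points concrete, here is a rough sketch of the bookkeeping a pipelined client ends up needing. All the types (Connection, Request, Response) are hypothetical placeholders, not from any library; the point is only the FIFO logic:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Responses arrive in request order, so a FIFO of in-flight requests is enough
// to match them up -- and to know what to retry or discard on failure.
class PipelineBookkeeping {
    private final Deque<Request> inFlight = new ArrayDeque<>();

    void send(Connection conn, Request req) {
        conn.write(req);            // queue the bytes without waiting for a response
        inFlight.addLast(req);
    }

    void onResponse(Response resp) {
        Request answered = inFlight.pollFirst();   // the oldest request is the one answered
        answered.complete(resp);
    }

    // Keep-alive is not guaranteed: if the server closes the connection early,
    // every request still in the FIFO is unanswered and must be re-sent
    // (only safe if those requests are idempotent).
    void onConnectionClosed(Connection freshConn) {
        Deque<Request> unanswered = new ArrayDeque<>(inFlight);
        inFlight.clear();
        for (Request req : unanswered) {
            send(freshConn, req);
        }
    }

    // If one request times out, everything queued behind it has to be failed or
    // re-sent elsewhere, because responses can only be read in order.
    void onTimeout() {
        while (!inFlight.isEmpty()) {
            inFlight.pollFirst().fail("pipeline aborted after timeout");
        }
    }

    // Hypothetical collaborators, only here to make the sketch self-contained.
    interface Connection { void write(Request req); }
    interface Request { void complete(Response resp); void fail(String reason); }
    interface Response { }
}
```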
Pipelining makes almost no difference to HTTP servers; they usually process requests on a connection serially anyway: read a request, write the response, then read the next request...
But a client would very likely improve throughput with concurrent requests. Websites usually run on multiple machines with multiple CPUs, so why voluntarily force your requests into a single line? Today it's more about horizontal scalability (concurrent requests). Of course, it's best to benchmark it.
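If you take that route, a minimal sketch with the standard java.net.http.HttpClient (Java 11+) could look like the following; the URLs are placeholders. This client also negotiates HTTP/2 where the server supports it, which gives you real multiplexing on a single connection:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class ConcurrentRequests {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();

        List<String> urls = List.of(
                "http://example.com/a",      // hypothetical URLs
                "http://example.com/b",
                "http://example.com/c");

        // Fire all requests at once; the client handles connection management.
        List<CompletableFuture<HttpResponse<String>>> futures = urls.stream()
                .map(u -> HttpRequest.newBuilder(URI.create(u)).build())
                .map(r -> client.sendAsync(r, HttpResponse.BodyHandlers.ofString()))
                .collect(Collectors.toList());

        // Block for each result and print the status codes.
        futures.forEach(f -> System.out.println(f.join().statusCode()));
    }
}
```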
POST should not be pipelined: RFC 2616 §8.1.2.2 says clients SHOULD NOT pipeline requests using non-idempotent methods.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html