I'm working on an application that requires one type of message to hit a database and the other type of message to hit an external XML API.
I have to process A LOT... one of the big challenges is getting the HttpWebRequest class to perform well. I initially started by just using the standard synchronous methods and running the whole thing on the thread pool. This was not good.
So after a bit of reading I saw that the recommended way to do this was to use the Begin/End methods to delegate the work to IO completion ports, thus freeing up the thread pool and yielding better performance. This doesn't seem to be the case... the performance is marginally better, but I certainly can't see the IO completion ports being used that much compared to the thread pool's worker threads.
I have a thread that spins round and reports the available worker threads and completion port threads in the thread pool. The completion port count in use is always very low (the max I've seen is 9) and I'm always using about 120 worker threads (sometimes more). I use the Begin/End pattern for all the methods in HttpWebRequest:
Begin/EndGetRequestStream
Begin/EndWrite (Stream)
Begin/EndGetResponse
Begin/EndRead (Stream)
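For reference, in-use counts like the ones above can be derived by subtracting `ThreadPool.GetAvailableThreads` from `ThreadPool.GetMaxThreads`. The following is only a minimal sketch of that calculation; the `PoolUsage` class and `Snapshot` method are hypothetical names for illustration, not the actual monitoring code.

```csharp
using System;
using System.Threading;

// Hypothetical helper: reports how many thread pool threads are currently in use.
public static class PoolUsage
{
    public static void Snapshot(out int workersInUse, out int completionPortInUse)
    {
        int maxWorkers, maxCompletionPort;
        int freeWorkers, freeCompletionPort;

        // Upper limits configured for the pool.
        ThreadPool.GetMaxThreads(out maxWorkers, out maxCompletionPort);
        // Threads that could still be started or handed work right now.
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeCompletionPort);

        workersInUse = maxWorkers - freeWorkers;
        completionPortInUse = maxCompletionPort - freeCompletionPort;
    }
}
```

Calling `PoolUsage.Snapshot` about once a second from a dedicated thread or timer and logging both values gives the kind of readings quoted in this question.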
Am I doing it right? Am I missing something? I can (sometimes) have up to 2048 HTTP connections open simultaneously (from netstat output), so why would the completion port numbers be so low?
If anyone could give some serious advice on how to manage worker threads, completion ports and HttpWebRequest well, it would be hugely appreciated!
EDIT: is .NET a reasonable tool for this? Can I get a high volume of HTTP connections working with .NET and the System.Net stack? It's been suggested that I use something like WinHttp (or some other C++ library) and P/Invoke it from .NET, but this isn't something I especially want to do!
The way I understand it, you don't tie up an I/O completion port thread for the whole time that an asynchronous request is outstanding. It's only "busy" when data has been returned and is being processed in the corresponding callback. Hopefully you don't have very much work to do in the callback, which would explain why you don't have many completion port threads in use at any one time.
Are you actually getting poor performance though? Is your cause for concern merely the low numbers? Are you getting the throughput you'd expect?
One problem you may have is that the HTTP connection pool for any one host is relatively small. If you have hundreds of requests to the same machine, then by default only 2 connections per host will actually be used at a time, to avoid DoS-attacking the host in question (and to get the benefits of keep-alive). You can increase this limit programmatically or in app.config. Of course, this may not be an issue in your case, either because you've already fixed the problem or because all your requests are to different hosts. (If netstat is showing 2048 connections then that doesn't sound bad.)
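As a rough illustration of the programmatic route, assuming the .NET Framework `System.Net` stack: the limit is exposed as `ServicePointManager.DefaultConnectionLimit` and needs to be set before the first request to a host creates its `ServicePoint`. The `ConnectionLimits` class and the value passed in are hypothetical; pick a number the remote host can actually tolerate.

```csharp
using System;
using System.Net;

// Hypothetical helper around the per-host connection limit.
public static class ConnectionLimits
{
    // Applies to ServicePoint objects created after this call.
    public static void Raise(int perHostLimit)
    {
        ServicePointManager.DefaultConnectionLimit = perHostLimit;
    }

    // Reads back the limit actually in effect for a given host.
    public static int LimitFor(Uri host)
    {
        return ServicePointManager.FindServicePoint(host).ConnectionLimit;
    }
}
```

The app.config equivalent is a `connectionManagement` element under `system.net`, with an `add` entry carrying `address` and `maxconnection` attributes.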
Maybe your EndRead callbacks should only write the result to a thread-safe queue, which you then read from a small number of worker threads that are under your control. And/or use the fact that the IAsyncResult returned by each Begin call exposes a wait handle (AsyncWaitHandle) that is signalled when the operation is done, and write your own logic to wait on all the outstanding requests from a single thread (or a small number of threads).
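A minimal sketch of the queue idea, assuming .NET 4 or later for `BlockingCollection<T>`. The `ResponseProcessor` class, its members, and the choice of `byte[]` as the payload are all hypothetical; the point is only that the callback does nothing but enqueue, while a fixed number of threads you own do the real work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical hand-off between I/O callbacks and a fixed set of worker threads.
public sealed class ResponseProcessor : IDisposable
{
    private readonly BlockingCollection<byte[]> _queue = new BlockingCollection<byte[]>();
    private readonly Thread[] _workers;
    private readonly Action<byte[]> _handler;

    public ResponseProcessor(int workerCount, Action<byte[]> handler)
    {
        _handler = handler;
        _workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = new Thread(Drain) { IsBackground = true };
            _workers[i].Start();
        }
    }

    // Intended to be called from the EndRead callback: the queue is unbounded,
    // so this returns quickly and the callback thread is released.
    public void Enqueue(byte[] body)
    {
        _queue.Add(body);
    }

    // Each worker blocks here until an item arrives or adding is completed.
    // Note: an exception thrown by the handler ends this worker; real code
    // would catch and log inside the loop.
    private void Drain()
    {
        foreach (byte[] body in _queue.GetConsumingEnumerable())
        {
            _handler(body);
        }
    }

    // Stops accepting items, lets the workers finish what is queued, then cleans up.
    public void Dispose()
    {
        _queue.CompleteAdding();
        foreach (Thread t in _workers)
        {
            t.Join();
        }
        _queue.Dispose();
    }
}
```

With this shape, the number of threads doing CPU-heavy processing is set by `workerCount` rather than by however many callbacks happen to fire at once.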
Having only 9 completion port threads actually means you're probably using them correctly and efficiently. I'm going to assume that the machine you're running on has either 8 cores or 4 hyperthreaded cores which means that the OS will try to keep up to 8 active (not sleeping/blocking/waiting) completion port threads at any time.
If one of the running threads becomes inactive (sleep/block/wait) and there are additional work items to process, then an additional thread will be created to keep the active count at 8. If you see 9 threads, that means that you are introducing virtually no blocking in the methods on your completion port threads and actually doing CPU work with them.
If you have 8 threads actively doing CPU bound work on 8 cores, then adding more threads will only slow things down (context switching between threads will be the wasted time).
What you should be looking into is why you have 120 other threads and what they are doing.