I'm looking for some advice on how to optimise the following process:
App reads a CSV file.
For each line in the file, an XML message is created.
Each XML message is posted to a URL via an HttpWebRequest.
This process was designed to handle low volumes of messages (up to about 200 at a time); unsurprisingly, things have changed and it is now expected to handle up to about 3,000 at a time.
The code used to post the message is here:
Public Function PostXml(ByVal XML As String) As HttpStatusCode
    Try
        Dim Bytes As Byte() = Me.Encoding.GetBytes(XML)
        Dim HTTPRequest As HttpWebRequest = DirectCast(WebRequest.Create(Me.PostURL), HttpWebRequest)
        With HTTPRequest
            .Method = "POST"
            .ContentLength = Bytes.Length
            .ContentType = "text/xml"
            .Credentials = New NetworkCredential(_Settings.NTSPostUsernameCurrent, _Settings.NTSPostPasswordCurrent)
        End With
        ' Write the XML payload; the Using block disposes the stream.
        Using RequestStream As Stream = HTTPRequest.GetRequestStream()
            RequestStream.Write(Bytes, 0, Bytes.Length)
        End Using
        ' Dispose the response so the underlying connection is released for reuse.
        Using Response As HttpWebResponse = DirectCast(HTTPRequest.GetResponse(), HttpWebResponse)
            Return Response.StatusCode
        End Using
    Catch ex As WebException
        ' Check the response status code rather than parsing the
        ' (locale-dependent) exception message text.
        Dim ErrorResponse As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
        If ErrorResponse IsNot Nothing AndAlso ErrorResponse.StatusCode = HttpStatusCode.InternalServerError Then
            Return HttpStatusCode.InternalServerError
        Else
            Throw
        End If
    End Try
End Function
Can this be optimised in terms of caching the connection used? At the moment there is a noticeable delay at the line:
Using Response As HttpWebResponse
while the connection is made.
Is there a way of caching this so the same connection is used for all 3000 messages rather than a new connection being created for each message?
Any advice gratefully received.
Update: Thanks for the responses. To clarify, I am currently restricted to sending multiple individual messages by restrictions elsewhere in the system. There is a noticeable delay in responding to the request at the other end (the receiver), but this is outside my control. I am trying to ensure that the process of sending is as efficient as possible (external factors notwithstanding).
.NET already has connection caching... if you weren't disposing of the response, you'd see that pretty quickly :) (Just to clarify, you're doing the right thing here. A bug I've seen quite often is not having a Using statement, which causes a problem precisely because of connection caching.)
I suspect it's not a case of making the connection, but making the request - in other words, the time is spent in areas outside your control.
I suggest you use Wireshark or Fiddler to work out where the time's actually going - might it not just be the web service itself? (Or whatever you're talking to.)
Another option is to use multiple threads to speed this up - but at that point, don't forget to increase the number of connections per host (in the connectionManagement section of app.config).
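Something along these lines in app.config (the wildcard address and the maxconnection value here are just examples):

<configuration>
  <system.net>
    <connectionManagement>
      <add address="*" maxconnection="10" />
    </connectionManagement>
  </system.net>
</configuration>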
In your case, two things are in the picture: the size of the entity body that you are posting, and the authentication method that you are using.
Also, the .NET HttpWebRequest does not send the request headers and POST body in one shot. It first sends the request headers (it adds an Expect: 100-continue header to the outgoing request). If the server is ready to accept data, it should reply with a "100 Continue" response; otherwise it should send a final response, which in this case would probably be a "401 Unauthorized". If the server does not send the "100 Continue" within 350 ms, the client will go ahead and send the data anyway.
So, in order to optimize further, we need to know:
1) What is the authentication protocol?
2) What is the average size of the XML body that you are posting?
3) Is the server doing any heavy processing of the XML? That would most probably explain why you are seeing the delay in GetResponse() rather than in GetRequestStream().
Some things to try:
1) Set Expect100Continue = False on the ServicePoint.
2) If your system is doing NTLM auth, you could try the following: create a GET WebRequest with credentials to the destination server and set a ConnectionGroupName on it, then reuse the same connection group name for subsequent requests. Since NTLM is mainly a connection-level auth mechanism, the first GET request will prime the connection for you, and subsequent requests will reuse that connection. (Both are sketched below.)
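A rough sketch of both tweaks; the URL, credentials, and group name are placeholders, and UnsafeAuthenticatedConnectionSharing is an extra setting that is often needed before NTLM connections will actually be reused:

' 1) Disable the Expect: 100-continue handshake for this endpoint.
Dim Target As New Uri("http://example.com/post")    ' placeholder URL
ServicePointManager.FindServicePoint(Target).Expect100Continue = False

' 2) Prime an NTLM-authenticated connection with a cheap GET and name the
'    connection group so later POSTs can reuse the authenticated connection.
Dim Primer As HttpWebRequest = DirectCast(WebRequest.Create(Target), HttpWebRequest)
Primer.Credentials = New NetworkCredential("user", "pass")    ' placeholder credentials
Primer.ConnectionGroupName = "NTSPostGroup"                   ' any stable name
Primer.UnsafeAuthenticatedConnectionSharing = True            ' often required for NTLM reuse
Using Response As WebResponse = Primer.GetResponse()
End Using
' Subsequent requests should set the same ConnectionGroupName (and
' UnsafeAuthenticatedConnectionSharing) before posting.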
One more thing: are you running multiple simultaneous threads of the above code? In that case you may be hitting the connection limit on the client - by default there is a maximum of two simultaneous connections per HTTP/1.1 server. You could try bumping the limit (ServicePointManager.DefaultConnectionLimit = 1000).
Finally, as others have suggested, you may need to get a network sniff using Wireshark/Netmon in order to see where the delay lies.
The problem here is the object creation statements: it's very expensive to create an object and then destroy it if you are doing it thousands of times....
If you only expect about 3k ~ 5k at a time, I'd take out the Using statements and let the garbage collector do its work.
But if you expect more than 5k at a time, then this definitely wouldn't be an option...
I've been working on this on and off for a few weeks now and have achieved a significant performance increase using two methods:
- I am now posting multiple messages per XML file.
- I have set up three threads: two post messages simultaneously, while the third archives the messages to the file system (sketched below).
Using this approach I have managed to achieve a performance increase of more than 100% (sometimes as much as 200%).
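For anyone who hits the same problem, here is a minimal sketch of that threading layout, assuming .NET 4's BlockingCollection is available. PostXml is the function from the question; CsvPath, BuildXml and ArchiveMessage are placeholders for my actual code:

Imports System.Collections.Concurrent
Imports System.IO
Imports System.Threading

' (Lives in the same class as PostXml.)
Sub SendAll(ByVal CsvPath As String)
    ' Queues feeding the two poster threads and the single archiver thread.
    Dim PostQueue As New BlockingCollection(Of String)()
    Dim ArchiveQueue As New BlockingCollection(Of String)()

    Dim PostWork As ThreadStart = Sub()
                                      For Each Xml In PostQueue.GetConsumingEnumerable()
                                          PostXml(Xml)          ' the function from the question
                                          ArchiveQueue.Add(Xml) ' hand off for archiving
                                      Next
                                  End Sub

    Dim ArchiveWork As ThreadStart = Sub()
                                         For Each Xml In ArchiveQueue.GetConsumingEnumerable()
                                             ArchiveMessage(Xml) ' placeholder: write to the file system
                                         Next
                                     End Sub

    Dim Poster1 As New Thread(PostWork)
    Dim Poster2 As New Thread(PostWork)
    Dim Archiver As New Thread(ArchiveWork)
    Poster1.Start() : Poster2.Start() : Archiver.Start()

    ' Producer: one XML message per CSV line.
    For Each Row In File.ReadLines(CsvPath)
        PostQueue.Add(BuildXml(Row))    ' BuildXml is a placeholder
    Next

    PostQueue.CompleteAdding()          ' let the posters drain and exit
    Poster1.Join() : Poster2.Join()
    ArchiveQueue.CompleteAdding()       ' then let the archiver finish
    Archiver.Join()
End Sub

Note the two posters only work because the default of two connections per host allows it; if you add more poster threads, bump ServicePointManager.DefaultConnectionLimit as suggested above.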