I am in the design phase of writing a new Windows Service application that accepts TCP/IP connections for long running connections (i.e. this is not like HTTP where there are many short connections, but rather a client connects and stays connected for hours or days or even weeks).
I'm looking for ideas for the best way to design the network architecture. I'm going to need to start at least one thread for the service. I am considering using the async API (BeginReceive, etc.) since I don't know how many clients I will have connected at any given time (possibly hundreds). I definitely do not want to start a thread for each connection.
Data will primarily flow out to the clients from my server, but there will be some commands sent from the clients on occasion. This is primarily a monitoring application in which my server sends status data periodically to the clients.
Any suggestions on the best way to make this as scalable as possible? Basic workflow? Thanks.
EDIT: To be clear, I'm looking for .NET-based solutions (C# if possible, but any .NET language will work).
BOUNTY NOTE: To be awarded the bounty, I expect more than a simple answer. I would need a working example of a solution, either as a pointer to something I could download or a short example in-line. And it must be .NET and Windows based (any .NET language is acceptable).
EDIT: I want to thank everyone that gave good answers. Unfortunately, I could only accept one, and I chose to accept the more well known Begin/End method. Esac's solution may well be better, but it's still new enough that I don't know for sure how it will work out.
I have upvoted all the answers I thought were good; I wish I could do more for you guys. Thanks again.
I am wondering about one thing. You say you definitely do not want to start a thread for each connection.
Why is that? Windows could handle hundreds of threads in an application since at least Windows 2000. I've done it, it's really easy to work with if the threads don't need to be synchronized. Especially given that you're doing a lot of I/O (so you're not CPU-bound, and a lot of threads would be blocked on either disk or network communication), I don't understand this restriction.
Have you tested the multi-threaded way and found it lacking in something? Do you intend to also have a database connection for each thread? (That would kill the database server, so it's a bad idea, but it's easily solved with a 3-tier design.) Are you worried that you'll have thousands of clients instead of hundreds, and then you'll really have problems? (Though I'd try a thousand threads or even ten thousand if I had 32+ GB of RAM. Again, given that you're not CPU-bound, thread switch time should be absolutely irrelevant.)
Here is the code - to see how this looks running, go to http://mdpopescu.blogspot.com/2009/05/multi-threaded-server.html and click on the picture.
Server class:
Server main program:
Client class:
Client main program:
I've got such a server running in some of my solutions. Here is a very detailed explanation of the different ways to do it in .NET: Get Closer to the Wire with High-Performance Sockets in .NET
Lately I've been looking for ways to improve our code and will be looking into this: "Socket Performance Enhancements in Version 3.5" that was included specifically "for use by applications that use asynchronous network I/O to achieve the highest performance".
"The main feature of these enhancements is the avoidance of the repeated allocation and synchronization of objects during high-volume asynchronous socket I/O. The Begin/End design pattern currently implemented by the Socket class for asynchronous socket I/O requires a System.IAsyncResult object be allocated for each asynchronous socket operation."
You can keep reading if you follow the link. I personally will be testing their sample code tomorrow to benchmark it against what I've got.
Edit: Here you can find working code for both client and server using the new 3.5 SocketAsyncEventArgs, so you can test it within a couple of minutes and go through the code. It is a simple approach but is the basis for starting a much larger implementation. Also, this article from almost two years ago in MSDN Magazine was an interesting read.
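To make the quoted point about avoiding per-operation allocations concrete, here is a minimal sketch of a receive loop that reuses a single SocketAsyncEventArgs and a single buffer for the lifetime of a connection. It is not the linked sample; the class name `ReusableReceiver`, the 4096-byte buffer size, and the two callbacks are assumptions made for illustration.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper: one reusable receive context per connection.
public sealed class ReusableReceiver
{
    private readonly Socket _socket;
    private readonly SocketAsyncEventArgs _args = new SocketAsyncEventArgs();
    private readonly Action<byte[], int> _onData;   // invoked with (buffer, byteCount)
    private readonly Action _onClosed;

    public ReusableReceiver(Socket socket, Action<byte[], int> onData, Action onClosed)
    {
        _socket = socket;
        _onData = onData;
        _onClosed = onClosed;
        // Buffer and event args are allocated once and reused for every receive.
        _args.SetBuffer(new byte[4096], 0, 4096);
        _args.Completed += (sender, e) => { if (Process(e)) Pump(); };
    }

    public void Start() { Pump(); }

    // ReceiveAsync returns false when the operation completed synchronously;
    // in that case Completed is not raised, so the result is handled inline.
    private void Pump()
    {
        while (true)
        {
            if (_socket.ReceiveAsync(_args)) return;   // pending: Completed will fire later
            if (!Process(_args)) return;
        }
    }

    // Returns true if the connection is still open and another receive should be posted.
    private bool Process(SocketAsyncEventArgs e)
    {
        if (e.SocketError != SocketError.Success || e.BytesTransferred == 0)
        {
            _socket.Close();   // a pooled design would return _args to its pool here
            _onClosed();
            return false;
        }
        _onData(e.Buffer, e.BytesTransferred);
        return true;
    }
}
```

A server would create one `ReusableReceiver` per accepted socket and call `Start()`. The buffer handed to `onData` is overwritten by the next receive, so the callback should copy anything it needs to keep.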
You can find a nice overview of techniques at the C10k problem page.
You can use Push Framework, an open source framework for high-performance server development. It is built on IOCP and is suitable for push scenarios and message broadcast.
http://www.pushframework.com
You already got most of the answer via the code samples above. Using asynchronous I/O is absolutely the way to go here. Async I/O is how Win32 is designed internally to scale. The best possible performance is achieved using completion ports: bind your sockets to a completion port and have a pool of threads waiting on it for completions. The common wisdom is to have 2-4 threads per CPU (core) waiting for completions. I highly recommend going over these three articles by Rick Vicik from the Windows Performance team:
The said articles cover mostly the native Windows API, but they are a must-read for anyone trying to get a grasp of scalability and performance. They do have some briefs on the managed side of things too.
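On the managed side, the usual way to get this behaviour is the Begin/End socket API: on Windows, the runtime completes those operations on its I/O completion threads, so no thread is tied to a connection. The following is a minimal sketch under that assumption, not a production server. The class name `StatusServer`, the loopback binding, the backlog of 100, and the 1024-byte command buffer are all placeholders for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;

// Hypothetical sketch: accepts clients asynchronously and pushes data to all of them.
public sealed class StatusServer
{
    private readonly Socket _listener =
        new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
    private readonly List<Socket> _clients = new List<Socket>();
    private readonly object _gate = new object();

    public int Port { get { return ((IPEndPoint)_listener.LocalEndPoint).Port; } }
    public int ClientCount { get { lock (_gate) { return _clients.Count; } } }

    public void Start(int port)
    {
        // Loopback keeps the sketch local; a real service would likely bind IPAddress.Any.
        _listener.Bind(new IPEndPoint(IPAddress.Loopback, port));
        _listener.Listen(100);
        _listener.BeginAccept(OnAccept, null);
    }

    public void Stop()
    {
        _listener.Close();
        Socket[] snapshot;
        lock (_gate) { snapshot = _clients.ToArray(); _clients.Clear(); }
        foreach (Socket c in snapshot) c.Close();
    }

    private void OnAccept(IAsyncResult ar)
    {
        try
        {
            Socket client = _listener.EndAccept(ar);
            lock (_gate) { _clients.Add(client); }
            PostReceive(client, new byte[1024]);
            _listener.BeginAccept(OnAccept, null);   // keep accepting
        }
        catch (ObjectDisposedException) { }          // listener closed by Stop()
        catch (SocketException) { }                  // sketch: stop accepting on error
    }

    // Keeps one receive posted per client for the occasional incoming command.
    private void PostReceive(Socket client, byte[] buffer)
    {
        try
        {
            client.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None, ar =>
            {
                int n;
                try { n = client.EndReceive(ar); }
                catch (SocketException) { n = 0; }
                catch (ObjectDisposedException) { n = 0; }
                if (n == 0) { Drop(client); return; }   // disconnected
                // A real server would parse the command in buffer[0..n) here.
                PostReceive(client, buffer);
            }, null);
        }
        catch (SocketException) { Drop(client); }
        catch (ObjectDisposedException) { Drop(client); }
    }

    // Sends the same payload to every connected client without blocking the caller.
    public void Broadcast(byte[] data)
    {
        Socket[] snapshot;
        lock (_gate) { snapshot = _clients.ToArray(); }
        foreach (Socket client in snapshot)
        {
            Socket c = client;
            try
            {
                c.BeginSend(data, 0, data.Length, SocketFlags.None, ar =>
                {
                    try { c.EndSend(ar); }
                    catch (SocketException) { Drop(c); }
                    catch (ObjectDisposedException) { Drop(c); }
                }, null);
            }
            catch (SocketException) { Drop(c); }
            catch (ObjectDisposedException) { Drop(c); }
        }
    }

    private void Drop(Socket client)
    {
        lock (_gate) { _clients.Remove(client); }
        client.Close();
    }
}
```

A timer in the service could call `Broadcast` periodically with the current status payload. Message framing and back-pressure for slow clients are deliberately left out.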
The second thing you'll need to do is go over the Improving .NET Application Performance and Scalability book, which is available online. You will find pertinent and valid advice around the use of threads, asynchronous calls and locks in Chapter 5. But the real gems are in Chapter 17, where you'll find such goodies as practical guidance on tuning your thread pool. My apps had some serious problems until I adjusted the maxIoThreads/maxWorkerThreads as per the recommendations in this chapter.
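The maxIoThreads/maxWorkerThreads names above read like configuration-file settings. For a standalone Windows Service, the closest programmatic knobs I am aware of are `ThreadPool.SetMaxThreads` and `ThreadPool.SetMinThreads`. The sketch below only shows how to inspect and change the limits; the helper name `ThreadPoolTuning` is hypothetical, and the right numbers depend on your workload and on the book's guidance, not on this example.

```csharp
using System;
using System.Threading;

// Hypothetical helper for inspecting and adjusting thread pool limits at service start-up.
public static class ThreadPoolTuning
{
    // Returns false (and changes nothing) if the pool rejects the values, for example
    // a maximum below the processor count or below the current minimum.
    public static bool SetMaximums(int maxWorkerThreads, int maxIoThreads)
    {
        return ThreadPool.SetMaxThreads(maxWorkerThreads, maxIoThreads);
    }

    // Formats the current limits so they can be logged before and after tuning.
    public static string Describe()
    {
        int minWorker, minIo, maxWorker, maxIo;
        ThreadPool.GetMinThreads(out minWorker, out minIo);
        ThreadPool.GetMaxThreads(out maxWorker, out maxIo);
        return string.Format("worker {0}-{1}, I/O {2}-{3}", minWorker, maxWorker, minIo, maxIo);
    }
}
```

Logging `ThreadPoolTuning.Describe()` at start-up is a cheap way to confirm which limits the service is actually running with.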
You say that you want to do a pure TCP server, so my next point is spurious. However, if you find yourself cornered and use the WebRequest class and its derivatives, be warned that there is a dragon guarding that door: the ServicePointManager. This is a configuration class that has one purpose in life: to ruin your performance. Make sure you free your server from the artificially imposed ServicePoint.ConnectionLimit or your application will never scale (I'll let you discover for yourself what the default value is...). You may also reconsider the default policy of sending an Expect: 100-continue header in the HTTP requests (the Expect100Continue setting).
Now, about the core managed socket API: things are fairly easy on the Send side, but they are significantly more complex on the Receive side. In order to achieve high throughput and scale you must ensure that the socket is not flow controlled because you do not have a buffer posted for receive. Ideally, for high performance you should post 3-4 buffers ahead and post new buffers as soon as you get one back (before you process the one you got back), so you ensure that the socket always has somewhere to deposit the data coming from the network. You'll see shortly why you probably won't be able to achieve this.
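If you do end up making outbound WebRequest calls, the two settings mentioned above can be changed once at start-up. This is a minimal sketch; the helper name `OutboundHttpDefaults` is hypothetical and the limit you pass in is your own choice, not a recommendation. It assumes the classic ServicePointManager-based HTTP stack.

```csharp
using System.Net;

// Hypothetical start-up helper for services that issue WebRequest-based HTTP calls.
public static class OutboundHttpDefaults
{
    public static void Apply(int connectionLimit)
    {
        // Default per-endpoint limit picked up by ServicePoint objects created afterwards.
        ServicePointManager.DefaultConnectionLimit = connectionLimit;
        // Stop sending the "Expect: 100-continue" header on requests with a body.
        ServicePointManager.Expect100Continue = false;
    }
}
```

Call `OutboundHttpDefaults.Apply(...)` before the first request is created, since ServicePoint objects that already exist keep the limit they were created with.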
After you're done playing with the BeginRead/BeginWrite API and start the serious work, you'll realize that you need security on your traffic, i.e. NTLM/Kerberos authentication and traffic encryption, or at least traffic tampering protection. The way you do this is to use the built-in System.Net.Security.NegotiateStream (or SslStream if you need to go across disparate domains). This means that instead of relying on straight socket asynchronous operations you will rely on the AuthenticatedStream asynchronous operations. As soon as you obtain a socket (either from connect on the client or from accept on the server) you create a stream on the socket and submit it for authentication by calling either BeginAuthenticateAsClient or BeginAuthenticateAsServer. After the authentication completes (at least you're safe from the native InitializeSecurityContext/AcceptSecurityContext madness...) you will do your authorization by checking the RemoteIdentity property of your authenticated stream and doing whatever ACL verification your product must support. After that you will send messages using BeginWrite and receive them with BeginRead.
This is the problem I was talking about before: you won't be able to post multiple receive buffers, because the AuthenticatedStream classes don't support this. The BeginRead operation internally manages all the I/O until you have received an entire frame, otherwise it could not handle the message authentication (decrypt the frame and validate the signature on the frame). In my experience, though, the job done by the AuthenticatedStream classes is fairly good and you shouldn't have any problem with it, i.e. you should be able to saturate a gigabit network with only 4-5% CPU. The AuthenticatedStream classes will also impose on you the protocol-specific frame size limitations (16k for SSL, 12k for Kerberos).
This should get you started on the right track. I'm not going to post code here; there is a perfectly good example on MSDN. I've done many projects like this and was able to scale to about 1000 connected users without problems. Above that you'll need to modify registry keys to allow the kernel more socket handles. Also make sure you deploy on a server OS, that is, W2K3, not XP or Vista (i.e. a client OS); it makes a big difference.
BTW, if you have database operations or file I/O on the server, make sure you also use the async flavor for them, or you'll drain the thread pool in no time. For SQL Server connections make sure you add 'Asynchronous Processing=true' to the connection string.
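For the stream-based authentication flow described earlier in this answer (wrap the accepted socket, authenticate asynchronously, then authorize via RemoteIdentity), the server side might look roughly like the sketch below. It is not the MSDN example; the class name `SecureChannel` and the `authorize`, `onReady`, and `onRejected` callbacks are hypothetical, and it relies on the default NegotiateStream credentials and protection level.

```csharp
using System;
using System.IO;
using System.Net.Security;
using System.Net.Sockets;
using System.Security.Authentication;
using System.Security.Principal;

// Hypothetical helper: authenticates an accepted socket, then hands back the secured stream.
public static class SecureChannel
{
    public static void BeginSecure(
        Socket accepted,
        Func<IIdentity, bool> authorize,      // your ACL check on the remote identity
        Action<NegotiateStream> onReady,      // called once the client is authenticated and authorized
        Action onRejected)                    // called if authentication or authorization fails
    {
        // The NetworkStream owns the socket, so closing the stream closes the connection.
        var stream = new NegotiateStream(new NetworkStream(accepted, true), false);
        stream.BeginAuthenticateAsServer(ar =>
        {
            try
            {
                stream.EndAuthenticateAsServer(ar);
                if (stream.IsAuthenticated && authorize(stream.RemoteIdentity))
                {
                    onReady(stream);          // from here on use stream.BeginRead / BeginWrite
                    return;
                }
            }
            catch (AuthenticationException) { }   // handshake failed
            catch (IOException) { }               // client went away mid-handshake
            stream.Close();
            onRejected();
        }, null);
    }
}
```

After `onReady` fires, all traffic should go through the returned stream rather than the raw socket, which is why only one read can be outstanding at a time.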
You are not going to get the highest level of scalability if you go purely with .NET. GC pauses can hamper the latency.
Overlapped I/O is generally considered to be Windows' fastest API for network communication. I don't know if this is the same as your async API. Do not use select, as each call needs to check every socket that is open instead of having callbacks on active sockets.