I am in the design phase of writing a new Windows Service application that accepts long-running TCP/IP connections (i.e. this is not like HTTP, where there are many short connections; rather, a client connects and stays connected for hours, days, or even weeks).
I'm looking for ideas for the best way to design the network architecture. I'm going to need to start at least one thread for the service. I am considering using the async API (BeginReceive, etc.) since I don't know how many clients I will have connected at any given time (possibly hundreds). I definitely do not want to start a thread for each connection.
Data will primarily flow out to the clients from my server, but there will be some commands sent from the clients on occasion. This is primarily a monitoring application in which my server sends status data periodically to the clients.
Any suggestions on the best way to make this as scalable as possible? Basic workflow? Thanks.
EDIT: To be clear, I'm looking for .NET-based solutions (C# if possible, but any .NET language will work).
BOUNTY NOTE: To be awarded the bounty, I expect more than a simple answer. I would need a working example of a solution, either as a pointer to something I could download or a short example in-line. And it must be .NET and Windows based (any .NET language is acceptable).
EDIT: I want to thank everyone that gave good answers. Unfortunately, I could only accept one, and I chose to accept the more well known Begin/End method. Esac's solution may well be better, but it's still new enough that I don't know for sure how it will work out.
I have upvoted all the answers I thought were good, I wish I could do more for you guys. Thanks again.
There are many ways of doing network operations in C#. All of them use different mechanisms under the hood, and thus suffer major performance issues under high concurrency. The Begin* operations are one of these, and they are often mistaken for the fastest way of doing networking.
To solve these issues, the *Async set of methods was introduced. From MSDN: http://msdn.microsoft.com/en-us/library/system.net.sockets.socketasynceventargs.aspx
The SocketAsyncEventArgs class is part of a set of enhancements to the System.Net.Sockets.Socket class that provide an alternative asynchronous pattern that can be used by specialized high-performance socket applications. This class was specifically designed for network server applications that require high performance. An application can use the enhanced asynchronous pattern exclusively or only in targeted hot areas (for example, when receiving large amounts of data).

The main feature of these enhancements is the avoidance of the repeated allocation and synchronization of objects during high-volume asynchronous socket I/O. The Begin/End design pattern currently implemented by the System.Net.Sockets.Socket class requires a System.IAsyncResult object be allocated for each asynchronous socket operation.
Under the covers, the *Async API uses IO completion ports, which are the fastest way of performing networking operations; see http://msdn.microsoft.com/en-us/magazine/cc302334.aspx
And just to help you out, I am including the source code for a telnet server I wrote using the *Async API. I am only including the relevant portions. Also to note, instead of processing the data inline, I opt to push it onto a lock-free (wait-free) queue that is processed on a separate thread. Note that I am not including the corresponding Pool class, which is just a simple pool that will create a new object if it is empty, nor the Buffer class, which is just a self-expanding buffer that is not really needed unless you are receiving an indeterminate amount of data. If you would like any more information, feel free to send me a PM.
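(The telnet-server source itself is not reproduced in this excerpt. As a rough illustration of the pattern described above, here is a minimal sketch of a *Async receive loop that hands incoming data to a queue drained on a separate thread. The class and member names are invented, and a ConcurrentQueue stands in for the lock-free queue mentioned in the answer.)

```csharp
// Minimal sketch (not the original telnet-server code): a SocketAsyncEventArgs
// receive loop that pushes incoming data onto a queue processed by another thread.
// AsyncServer, ReceivedChunk and ProcessLoop are illustrative names only.
using System;
using System.Collections.Concurrent;
using System.Net.Sockets;
using System.Threading;

public sealed class ReceivedChunk
{
    public Socket Socket;
    public byte[] Data;
}

public sealed class AsyncServer
{
    private readonly ConcurrentQueue<ReceivedChunk> _queue = new ConcurrentQueue<ReceivedChunk>();

    public void StartReceiving(Socket clientSocket)
    {
        var args = new SocketAsyncEventArgs();
        args.SetBuffer(new byte[8192], 0, 8192);   // per-connection receive buffer
        args.UserToken = clientSocket;
        args.Completed += OnReceiveCompleted;

        // ReceiveAsync returns false when it completed synchronously,
        // in which case the Completed event will not fire.
        if (!clientSocket.ReceiveAsync(args))
            OnReceiveCompleted(clientSocket, args);
    }

    private void OnReceiveCompleted(object sender, SocketAsyncEventArgs args)
    {
        var socket = (Socket)args.UserToken;
        if (args.SocketError != SocketError.Success || args.BytesTransferred == 0)
        {
            socket.Close();
            return;
        }

        // Copy the received bytes and hand them to the processing thread.
        var data = new byte[args.BytesTransferred];
        Buffer.BlockCopy(args.Buffer, args.Offset, data, 0, args.BytesTransferred);
        _queue.Enqueue(new ReceivedChunk { Socket = socket, Data = data });

        // Re-arm the receive.
        if (!socket.ReceiveAsync(args))
            OnReceiveCompleted(socket, args);
    }

    // Drains the queue on a dedicated thread; the processData delegate is a placeholder
    // for whatever application-level handling you need.
    public void ProcessLoop(Action<ReceivedChunk> processData)
    {
        while (true)
        {
            if (_queue.TryDequeue(out var chunk)) processData(chunk);
            else Thread.Sleep(1);
        }
    }
}
```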
I would recommend reading these books on ACE to get ideas about patterns that allow you to create an efficient server.
Although ACE is implemented in C++, the books cover a lot of useful patterns that can be used in any programming language.
There used to be a really good discussion of scalable TCP/IP using .NET written by Chris Mullins of Coversant; unfortunately, it appears his blog has disappeared from its prior location, so I will try to piece together his advice from memory (some useful comments of his appear in this thread: C++ vs. C#: Developing a highly scalable IOCP server).
First and foremost, note that both the Begin/End and the Async methods on the Socket class make use of IO Completion Ports (IOCP) to provide scalability. This makes a much bigger difference (when used correctly; see below) to scalability than which of the two methods you actually pick to implement your solution.

Chris Mullins' posts were based on using Begin/End, which is the one I personally have experience with. Note that Chris put together a solution based on this that scaled up to 10,000s of concurrent client connections on a 32-bit machine with 2 GB of memory, and well into 100,000s on a 64-bit platform with sufficient memory. From my own experience with this technique (although nowhere near this kind of load) I have no reason to doubt these indicative figures.

IOCP versus thread-per-connection or 'select' primitives
The reason you want to use a mechanism that uses IOCP under the hood is that it uses a very low-level Windows thread pool that does not wake up any threads until there is actual data on the IO channel that you are trying to read from (note that IOCP can be used for file IO as well). The benefit of this is that Windows does not have to switch to a thread only to find that there is no data yet anyway, so this reduces the number of context switches your server will have to make to the bare minimum required.
Context switching is what will definitely kill the 'thread-per-connection' mechanism, although it is a viable solution if you are only dealing with a few dozen connections. This mechanism is, however, by no stretch of the imagination 'scalable'.
Important considerations when using IOCP
Memory
First and foremost, it is critical to understand that IOCP can easily result in memory issues under .NET if your implementation is too naive. Every IOCP BeginReceive call will result in "pinning" of the buffer you are reading into. For a good explanation of why this is a problem, see: Yun Jin's Weblog: OutOfMemoryException and Pinning.

Luckily this problem can be avoided, but it requires a bit of a trade-off. The suggested solution is to allocate a big byte[] buffer at application start-up (or close thereto) of at least 90 KB or so (as of .NET 2; the required size may be larger in later versions). The reason to do this is that large memory allocations automatically end up in a non-compacting memory segment (the Large Object Heap) that is effectively automatically pinned. By allocating one large buffer at start-up you make sure that this block of unmovable memory is at a relatively 'low address', where it will not get in the way and cause fragmentation.

You can then use offsets to segment this one big buffer into separate areas for each connection that needs to read some data. This is where a trade-off comes into play; since this buffer needs to be pre-allocated, you will have to decide how much buffer space you need per connection and what upper limit you want to set on the number of connections you want to scale to (or you can implement an abstraction that can allocate additional pinned buffers once you need them).

The simplest solution would be to assign every connection a single byte at a unique offset within this buffer. Then you can make a BeginReceive call for a single byte to be read, and perform the rest of the reading as a result of the callback you get.
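(As a sketch of the idea just described, assuming a configurable segment size rather than exactly one byte, here is one way to carve a single large buffer into per-connection segments. SegmentPool and its members are invented names, not from Chris Mullins' code.)

```csharp
// Minimal sketch: pre-allocate one large buffer (which lands on the Large Object Heap,
// so only one effectively-pinned block ever exists) and hand out fixed-size segments
// of it to connections. Names are illustrative only.
using System;
using System.Collections.Concurrent;

public sealed class SegmentPool
{
    private readonly byte[] _buffer;       // single large allocation; > 85,000 bytes goes to the LOH
    private readonly int _segmentSize;
    private readonly ConcurrentStack<int> _freeOffsets = new ConcurrentStack<int>();

    public SegmentPool(int maxConnections, int segmentSize)
    {
        _segmentSize = segmentSize;
        _buffer = new byte[maxConnections * segmentSize];
        for (int i = maxConnections - 1; i >= 0; i--)
            _freeOffsets.Push(i * segmentSize);
    }

    // Rent a segment for a connection; returns false when the configured connection limit is reached.
    public bool TryRent(out ArraySegment<byte> segment)
    {
        if (_freeOffsets.TryPop(out int offset))
        {
            segment = new ArraySegment<byte>(_buffer, offset, _segmentSize);
            return true;
        }
        segment = default(ArraySegment<byte>);
        return false;
    }

    public void Return(ArraySegment<byte> segment) => _freeOffsets.Push(segment.Offset);
}
```

A connection would then read into its own slice of the shared buffer, e.g. socket.BeginReceive(segment.Array, segment.Offset, 1, SocketFlags.None, callback, state) for the single-byte variant described above.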
Processing

When you get the callback from the Begin call you made, it is very important to realise that the code in the callback will execute on the low-level IOCP thread. It is absolutely essential that you avoid lengthy operations in this callback. Using these threads for complex processing will kill your scalability just as effectively as using 'thread-per-connection'.

The suggested solution is to use the callback only to queue up a work item to process the incoming data, which will be executed on some other thread. Avoid any potentially blocking operations inside the callback so that the IOCP thread can return to its pool as quickly as possible. In .NET 4.0 I'd suggest the easiest solution is to spawn a Task, giving it a reference to the client socket and a copy of the first byte that was already read by the BeginReceive call. This task is then responsible for reading all the data from the socket that represents the request you are processing, executing it, and then making a new BeginReceive call to queue the socket for IOCP once more. Pre-.NET 4.0, you can use the ThreadPool, or create your own threaded work-queue implementation.
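(Here is a hedged sketch of that callback pattern; it is my own illustration, not Chris Mullins' code. ConnectionState, ProcessRequest and CloseConnection are hypothetical names.)

```csharp
// Sketch of the pattern described above: the IOCP callback only ends the receive and
// hands the real work to a Task, which later re-queues the socket with BeginReceive.
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

public sealed class ConnectionState
{
    public Socket Socket;
    public ArraySegment<byte> Buffer;   // the single pinned byte reserved for this connection
}

public class Server
{
    private void ReceiveCallback(IAsyncResult ar)
    {
        var state = (ConnectionState)ar.AsyncState;
        int bytesRead;
        try { bytesRead = state.Socket.EndReceive(ar); }
        catch (SocketException) { CloseConnection(state); return; }

        if (bytesRead == 0) { CloseConnection(state); return; }   // remote side closed the socket

        byte firstByte = state.Buffer.Array[state.Buffer.Offset];

        // Hand the real work to a worker thread so the IOCP thread returns to its pool immediately.
        Task.Factory.StartNew(() =>
        {
            ProcessRequest(state, firstByte);   // read the rest of the request from the socket and execute it

            // Queue the socket for IOCP once more with another single-byte read.
            state.Socket.BeginReceive(state.Buffer.Array, state.Buffer.Offset, 1,
                                      SocketFlags.None, ReceiveCallback, state);
        });
    }

    // Placeholders for the application-specific pieces.
    private void ProcessRequest(ConnectionState state, byte firstByte) { /* parse and handle the request */ }
    private void CloseConnection(ConnectionState state) { state.Socket.Close(); }
}
```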
Summary

Basically, I'd suggest using Kevin's sample code for this solution, with the following added warnings:

- Make sure the buffer you pass to BeginReceive is already 'pinned'
- Make sure the callback you pass to BeginReceive does nothing more than queue up a task to handle the actual processing of the incoming data

When you do that, I have no doubt you could replicate Chris' results in scaling up to potentially hundreds of thousands of simultaneous clients (given the right hardware and an efficient implementation of your own processing code, of course ;)
I would use SEDA or a lightweight threading library (Erlang, or on newer Linux see NPTL scalability on the server side). Async coding is very cumbersome if your communication isn't :)
To people copy-pasting the accepted answer: you can rewrite the acceptCallback method, removing all calls to _serverSocket.BeginAccept(new AsyncCallback(acceptCallback), _serverSocket); and putting a single call in a finally{} clause, this way:
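(The rewritten method itself is not included in this excerpt; the following is a hedged sketch of the suggestion, reusing the xConnection, _serverSocket, _sockets, _bufferSize and ReceiveCallback names referenced in the accepted answer.)

```csharp
// Sketch of the commenter's suggestion (not the original code): the next BeginAccept
// is issued exactly once, from a finally block, instead of once per branch.
private void acceptCallback(IAsyncResult result)
{
    xConnection conn = null;
    try
    {
        conn = new xConnection();
        conn.socket = _serverSocket.EndAccept(result);
        conn.buffer = new byte[_bufferSize];
        lock (_sockets) { _sockets.Add(conn); }
        conn.socket.BeginReceive(conn.buffer, 0, conn.buffer.Length,
                                 SocketFlags.None, new AsyncCallback(ReceiveCallback), conn);
    }
    catch (SocketException)
    {
        if (conn != null && conn.socket != null) conn.socket.Close();
        lock (_sockets) { _sockets.Remove(conn); }
    }
    catch (Exception)
    {
        if (conn != null && conn.socket != null) conn.socket.Close();
        lock (_sockets) { _sockets.Remove(conn); }
    }
    finally
    {
        // Queue the next accept exactly once, regardless of what happened above.
        _serverSocket.BeginAccept(new AsyncCallback(acceptCallback), _serverSocket);
    }
}
```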
You could even remove the first catch, since its content is the same; but this is template code, and you should use typed exceptions to better handle the errors and understand what caused them, so just implement those catches with some useful code.
I've written something similar to this in the past. My research years ago showed that writing your own socket implementation was the best bet, using asynchronous sockets. This meant that clients not really doing anything actually required relatively few resources. Anything that does occur is handled by the .NET thread pool.
I wrote it as a class that manages all connections for the servers.
I simply used a list to hold all the client connections, but if you need faster lookups for larger lists, you can write it however you want.
Also, you need the socket that is actually listening for incoming connections.
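(The field listing itself is not included in this excerpt; a sketch of what it might have looked like, with an assumed buffer size:)

```csharp
// Sketch of the fields described above (names follow the rest of the answer).
private List<xConnection> _sockets;                  // all connected clients
private System.Net.Sockets.Socket _serverSocket;     // the socket listening for incoming connections
private int _bufferSize = 1024;                      // per-connection receive buffer size (assumed value)
```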
The start method actually starts the server socket and begins listening for any incoming connections.
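(The original start method is not reproduced here; this is a hedged sketch of such a method, with the exception handling reduced to a simple catch that returns false, as discussed next. It assumes the fields and usings shown elsewhere in this answer.)

```csharp
// Sketch of a start method along the lines described: bind, listen, and queue the first accept.
public bool Start(IPAddress address, int port, int backlog)
{
    try
    {
        _sockets = new List<xConnection>();
        _serverSocket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        _serverSocket.Bind(new IPEndPoint(address, port));
        _serverSocket.Listen(backlog);
        // Begin accepting connections; acceptCallback runs on the .NET thread pool.
        _serverSocket.BeginAccept(new AsyncCallback(acceptCallback), _serverSocket);
        return true;
    }
    catch (Exception)
    {
        // Simplified error handling: report failure to the caller.
        return false;
    }
}
```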
I'd just like to note that the exception handling code looks bad, but the reason for it is that I had exception suppression code in there so that any exceptions would be suppressed and return false if a config option was set; I wanted to remove it for brevity's sake.

The _serverSocket.BeginAccept(new AsyncCallback(acceptCallback), _serverSocket) above essentially sets our server socket to call the acceptCallback method whenever a user connects. This method runs from the .NET thread pool, which automatically handles creating additional worker threads if you have many blocking operations. This should optimally handle any load on the server.
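(The acceptCallback listing is omitted from this excerpt; the following is a hedged sketch consistent with the description below. Note the repeated BeginAccept calls that the earlier comment suggests moving into a single finally block.)

```csharp
// Sketch of an accept callback matching the description that follows:
// accept the connection, start receiving from it, then queue the next accept.
private void acceptCallback(IAsyncResult result)
{
    xConnection conn = new xConnection();
    try
    {
        // Finish accepting the incoming connection.
        conn.socket = ((Socket)result.AsyncState).EndAccept(result);
        conn.buffer = new byte[_bufferSize];
        lock (_sockets) { _sockets.Add(conn); }

        // Queue the receive of data from the client.
        conn.socket.BeginReceive(conn.buffer, 0, conn.buffer.Length,
                                 SocketFlags.None, new AsyncCallback(ReceiveCallback), conn);

        // Queue the accept of the next incoming connection.
        _serverSocket.BeginAccept(new AsyncCallback(acceptCallback), _serverSocket);
    }
    catch (SocketException)
    {
        if (conn.socket != null) conn.socket.Close();
        _serverSocket.BeginAccept(new AsyncCallback(acceptCallback), _serverSocket);
    }
    catch (Exception)
    {
        if (conn.socket != null) conn.socket.Close();
        _serverSocket.BeginAccept(new AsyncCallback(acceptCallback), _serverSocket);
    }
}
```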
The above code essentially just finishes accepting the connection that comes in, queues a BeginReceive, which is a callback that will run when the client sends data, and then queues the next acceptCallback, which will accept the next client connection that comes in.

The BeginReceive method call is what tells the socket what to do when it receives data from the client. For BeginReceive, you need to give it a byte array, which is where it will copy the data when the client sends data. The ReceiveCallback method will get called, which is how we handle receiving data.

EDIT: In this pattern I forgot to mention that in this area of code:
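(The original snippet is not included in this excerpt; the following is a hedged reconstruction of a receive callback matching the description below, with the relevant "do whatever you want" area marked.)

```csharp
// Sketch of the receive callback being described; the marked section is the
// "whatever you want" area the EDIT above refers to.
private void ReceiveCallback(IAsyncResult result)
{
    xConnection conn = (xConnection)result.AsyncState;
    try
    {
        // Finish the read that BeginReceive started; the received bytes are now in conn.buffer.
        int bytesRead = conn.socket.EndReceive(result);
        if (bytesRead > 0)
        {
            // *** do whatever you want with the first bytesRead bytes of conn.buffer here ***
            // e.g. reassemble packets into messages and queue them as jobs (see below)

            // Queue the next receive so this callback runs again when more data arrives.
            conn.socket.BeginReceive(conn.buffer, 0, conn.buffer.Length,
                                     SocketFlags.None, new AsyncCallback(ReceiveCallback), conn);
        }
        else
        {
            // Zero bytes read means the client disconnected.
            conn.socket.Close();
            lock (_sockets) { _sockets.Remove(conn); }
        }
    }
    catch (SocketException)
    {
        conn.socket.Close();
        lock (_sockets) { _sockets.Remove(conn); }
    }
}
```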
What I would generally do in the 'whatever you want' code is reassemble packets into messages and then create them as jobs on the thread pool. This way the BeginReceive of the next block from the client isn't delayed while whatever message-processing code is running.
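(A small sketch of that idea; OnCompleteMessage and HandleMessage are hypothetical names for the reassembly and processing hooks.)

```csharp
// Once a complete message has been reassembled from the received bytes, hand it to the
// thread pool so the next BeginReceive is not delayed by message processing.
private void OnCompleteMessage(xConnection conn, byte[] message)
{
    ThreadPool.QueueUserWorkItem(state =>
    {
        // Application-specific handling of one complete message.
        HandleMessage(conn, (byte[])state);
    }, message);
}
```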
The receive callback finishes reading from the data socket by calling EndReceive. This fills the buffer provided in the BeginReceive call. Once you do whatever you want where I left the comment, we call the next BeginReceive method, which will run the callback again if the client sends any more data.

Now here's the really tricky part: when the client sends data, your receive callback might only be called with part of the message. Reassembly can become very, very complicated. I used my own method and created a sort of proprietary protocol to do this. I left it out, but if you request it, I can add it in. This handler was actually the most complicated piece of code I had ever written.
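(The original send method is not reproduced in this excerpt; here is a hedged sketch of a synchronous send helper matching the description that follows. The signature is an assumption.)

```csharp
// Sketch of a send helper: a plain synchronous Send on the connection's socket.
public bool Send(byte[] message, xConnection conn)
{
    if (conn != null && conn.socket.Connected)
    {
        lock (conn.socket)
        {
            // Synchronous send; fine here given small message sizes.
            conn.socket.Send(message, message.Length, SocketFlags.None);
        }
        return true;
    }
    return false;
}
```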
The above send method actually uses a synchronous Send call; for me, that was fine due to the message sizes and the multithreaded nature of my application. If you want to send to every client, you simply need to loop through the _sockets List.

The xConnection class you see referenced above is basically a simple wrapper for a socket to include the byte buffer, and in my implementation some extras.
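(The class itself is not included in this excerpt; a minimal sketch of the wrapper as described, without the "extras":)

```csharp
// Sketch of the wrapper class described above.
public class xConnection
{
    public System.Net.Sockets.Socket socket;   // the client's socket
    public byte[] buffer;                      // receive buffer handed to BeginReceive
}
```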
Also for reference, here are the usings I include, since I always get annoyed when they aren't included.
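(The original list is not included in this excerpt; these are the directives the fragments above would plausibly need.)

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;
using System.Threading;
```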
BeginAccept
called at any one time. There used to be a very annoying .net bug around this, which was years ago so I don't recall the details.Also, in the
ReceiveCallback
code, we process anything received from the socket before we queue the next receive. This means that for a single socket, we're only actually ever inReceiveCallback
once at any point in time, and we don't need to use thread synchronization. However, if you reorder this to call the next receive immediately after pulling the data, which might be a little faster, you will need to make sure you properly synchronize the threads.Also, I hacked out alot of my code, but left the essence of what's happening in place. This should be a good start for you're design. Leave a comment if you have any more questions around this.