I downloaded the WinDDK and am using the ndisprot 5x sample to broadcast raw Ethernet packets from my user-mode app, with the destination MAC set to all 0xff's. On large, repetitive data sets it is too slow.
What currently works great is loopback: with both the destination and source MACs set to my own, I get the speed I need, but the packet never leaves my network card.
Maybe I am missing some NDIS driver options, or maybe this sample MS driver waits for each broadcast to complete? All I want is for the packet to be broadcast to the network. I don't really care about the delivery status and want to get rid of the packet as fast as possible.
Would a system having only 2 points help here? I am not sure what is causing a lag.
It's not possible to eliminate the send-completion path in kernel mode. The reason is that the network card is busy reading bytes from memory, until it finally issues a send-completion. If you didn't wait for send-completion before re-using the packet, then the network card wouldn't have had an opportunity to read the full packet. You'd end up sending corrupted data.
But you're right that there is a big inefficiency when using the stock NDISPROT sample to send huge quantities of data. The problem is that NDISPROT's user-mode sample application writes data to kernel mode synchronously. That means that your thread begins a write (send packet), then blocks until the write (send packet) completes. (The sample is inefficient because the point of the NDISPROT sample is to illustrate how to interoperate with NDIS in kernel mode, not to illustrate complicated techniques for user-kernel communication.)
You can vastly speed this up by using one of several techniques to issue multiple pieces of data simultaneously:
Use multithreading. Do the same thing you're doing now, except do it on multiple threads in parallel. This is pretty easy to set up, but it doesn't scale very well (to scale up to 10x traffic, you need 10x threads, and then you start to get hurt on caching issues). Plus, if your dataset must be sent in order, you need a bunch of complicated synchronization to make sure the threads issue requests in order.
Use asynchronous calls with WriteFile and OVERLAPPED data structures. This requires you to do some retooling on the usermode app. (Fortunately you don't need to touch the kernel driver, since that already supports this). With OVERLAPPED writes, you can issue multiple simultaneous writes from a single thread, then get notified when any (or all) of them completes. If you're sufficiently careful with the overlapped design, you should be able to fill a 100Mbps network link easily.
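As a rough check on what "filling a 100 Mbps link" demands, here is a small C sketch of the arithmetic. It assumes full-size frames (1514 bytes of header plus payload) and counts the standard on-wire overhead of 4 bytes FCS, 8 bytes preamble/SFD, and 12 bytes inter-frame gap. The function names are made up for illustration, and the per-write round-trip time is something you would have to measure on your own system.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes one full-size frame occupies on the wire:
 * 1514 (header + payload) + 4 FCS + 8 preamble/SFD + 12 inter-frame gap. */
enum { WIRE_BYTES = 1514 + 4 + 8 + 12 };

/* Time one full-size frame occupies the link, in nanoseconds. */
uint64_t frame_time_ns(uint64_t link_bps)
{
    assert(link_bps > 0);
    return (uint64_t)WIRE_BYTES * 8u * 1000000000u / link_bps;
}

/* How many writes must be outstanding to keep the link busy, given an
 * assumed (hypothetical, measure your own) round trip per write. */
uint64_t writes_in_flight_needed(uint64_t link_bps, uint64_t round_trip_ns)
{
    uint64_t t = frame_time_ns(link_bps);
    return (round_trip_ns + t - 1) / t;   /* ceiling division */
}
```

At 100 Mbps a full-size frame takes 123040 ns on the wire, so a single synchronous writer only keeps up if every WriteFile round trip finishes within about 123 µs. If a round trip took 500 µs (an invented figure, purely for illustration), `writes_in_flight_needed(100000000, 500000)` says you would want about 5 writes outstanding at all times.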
To be more explicit, this is what you currently have today:
Your app            NDISPROT driver          Network card           The network
---------------------------------------------------------------------------------
WriteFile
    .     \-------> NdisProtWrite
    .                            \-------> NdisSendPackets
    .                                             |
    .                                   (copy packet payload
    .                                    from system RAM to
    .                                    network card's buffer)
    .                                             |
    .                                             |---------------> Start sending
    .              NdisProtSendComplete <---------|                       .
WriteFile <----/                                  |                       .
 returns                                          |<--------------- Finish sending
As you can see, your usermode app is stuck in WriteFile the entire time that the network card copies the packet payload from RAM to the NIC hardware. Instead, if you use asynchronous writes to kernelmode, you'll wind up with this:
Your app            NDISPROT driver          Network card           The network
---------------------------------------------------------------------------------
WriteFile
    .     \-------> NdisProtWrite
    .            |               \-------> NdisSendPackets
WriteFile <------/                                |
 returns                                (copy packet payload
                                         from system RAM to
                                         network card's buffer)
                                                  |
                                                  |---------------> Start sending
                   NdisProtSendComplete <---------|                       .
Async write <--/                                  |                       .
 completes                                        |<--------------- Finish sending
In this setup, WriteFile returns more quickly, so you have a chance to queue up another packet (or 10) while the NIC is still reading the first packet. You can use any of the usual OVERLAPPED techniques to determine when a write (send packet) has completed; only at that point is it safe to reuse that write's data buffer.
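One way to structure the user-mode side is a fixed window of outstanding writes, sketched below in C. This is only an illustration, not the sample's code: `SendBackend`, `send_pipelined`, `Win32Ctx`, `DryRun`, and `WINDOW` are names I made up. The Windows part assumes the device handle was opened with FILE_FLAG_OVERLAPPED and is already bound to an adapter, and that each OVERLAPPED has its own event from CreateEvent. The windowing loop is kept separate from the I/O calls so it can be tried without a driver, using the counting backend at the bottom.

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW 4  /* hypothetical: how many writes to keep in flight */

/* Hypothetical hooks that separate the windowing logic from the I/O calls. */
typedef struct {
    int (*submit)(void *ctx, int slot, const void *frame, size_t len);
    int (*wait)(void *ctx, int slot);  /* blocks until slot's write completes */
    void *ctx;
} SendBackend;

/* Sends `count` frames of `frame_len` bytes each, keeping at most WINDOW
 * writes outstanding. A slot is resubmitted only after its previous write
 * has completed. `frames` must stay untouched until all writes are done.
 * Returns 0 on success, -1 on the first failure (writes may still be pending). */
int send_pipelined(const SendBackend *be, const unsigned char *frames,
                   size_t frame_len, size_t count)
{
    size_t i, pending;
    assert(be && be->submit && be->wait);
    for (i = 0; i < count; i++) {
        int slot = (int)(i % WINDOW);
        if (i >= WINDOW && be->wait(be->ctx, slot) != 0)
            return -1;
        if (be->submit(be->ctx, slot, frames + i * frame_len, frame_len) != 0)
            return -1;
    }
    /* Drain whatever is still in flight, oldest first. */
    pending = count < WINDOW ? count : WINDOW;
    for (i = count - pending; i < count; i++)
        if (be->wait(be->ctx, (int)(i % WINDOW)) != 0)
            return -1;
    return 0;
}

#ifdef _WIN32
#include <windows.h>

typedef struct {
    HANDLE dev;             /* assumed opened with FILE_FLAG_OVERLAPPED */
    OVERLAPPED ov[WINDOW];  /* caller sets each hEvent via CreateEvent() */
} Win32Ctx;

static int win32_submit(void *ctx, int slot, const void *frame, size_t len)
{
    Win32Ctx *c = (Win32Ctx *)ctx;
    HANDLE ev = c->ov[slot].hEvent;
    ZeroMemory(&c->ov[slot], sizeof(OVERLAPPED));
    c->ov[slot].hEvent = ev;
    if (WriteFile(c->dev, frame, (DWORD)len, NULL, &c->ov[slot]))
        return 0;  /* completed immediately */
    return GetLastError() == ERROR_IO_PENDING ? 0 : -1;
}

static int win32_wait(void *ctx, int slot)
{
    Win32Ctx *c = (Win32Ctx *)ctx;
    DWORD sent;
    return GetOverlappedResult(c->dev, &c->ov[slot], &sent, TRUE) ? 0 : -1;
}
#endif

/* Dry-run backend: no I/O, it only counts, for trying the loop off-Windows. */
typedef struct { int in_flight, max_in_flight, submitted; } DryRun;

static int dry_submit(void *ctx, int slot, const void *frame, size_t len)
{
    DryRun *d = (DryRun *)ctx;
    (void)slot; (void)frame; (void)len;
    d->submitted++;
    if (++d->in_flight > d->max_in_flight)
        d->max_in_flight = d->in_flight;
    return 0;
}

static int dry_wait(void *ctx, int slot)
{
    DryRun *d = (DryRun *)ctx;
    (void)slot;
    d->in_flight--;
    return 0;
}
```

On Windows you would fill in a `Win32Ctx` and pass `{ win32_submit, win32_wait, &ctx }` as the backend. Waiting on the oldest slot first keeps things simple and preserves submission order. If you would rather react to whichever write finishes first, WaitForMultipleObjects on the events or an I/O completion port are the usual alternatives.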
To get started with asynchronous I/O, start with this documentation. (Oops, looks like their diagrams are rotated 90° from my awesome ASCII-art...).