The company I work for runs a few hundred very dynamic web sites. It has decided to build a search engine, and I was tasked with writing the scraper. Some of the sites run on old hardware and cannot take much punishment, while others can handle a massive number of simultaneous users.
I need to be able to say: use 5 parallel requests for site A, 2 for site B, and 1 for site C.
I know I can use threads, mutexes, semaphores, etc. to accomplish this, but it will be quite complicated. Are any of the higher-level frameworks, like TPL, async/await, or TPL Dataflow, powerful enough to do this in a simpler manner?
TPL Dataflow and async-await are indeed powerful and simple enough to do just what you need.
I recommend you use HttpClient with Task.WhenAll, with SemaphoreSlim for simple throttling.
Alternatively, you could use TPL Dataflow and set MaxDegreeOfParallelism for the throttling.
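To make the SemaphoreSlim suggestion concrete, here is a minimal sketch, not a definitive implementation. The names ThrottledScraper, RunThrottledAsync and DownloadSiteAsync are hypothetical and made up for illustration; the only library calls used are HttpClient.GetStringAsync, SemaphoreSlim.WaitAsync/Release and Task.WhenAll. It assumes you already have the list of URLs for each site and that the whole list fits in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class; the names are illustrative only.
public static class ThrottledScraper
{
    // One shared HttpClient for all requests.
    private static readonly HttpClient Client = new HttpClient();

    // Runs `work` for every item, with at most `maxParallel` calls in flight.
    public static async Task<TResult[]> RunThrottledAsync<TItem, TResult>(
        IEnumerable<TItem> items, int maxParallel, Func<TItem, Task<TResult>> work)
    {
        using (var gate = new SemaphoreSlim(maxParallel))
        {
            // ToList() starts every task now; each one waits at the gate
            // until a slot is free, so only `maxParallel` do real work.
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();
                try
                {
                    return await work(item);
                }
                finally
                {
                    gate.Release(); // free the slot even if `work` throws
                }
            }).ToList();

            // Task.WhenAll finishes only after all tasks have completed,
            // so the semaphore is not disposed while still in use.
            return await Task.WhenAll(tasks);
        }
    }

    // Downloads all pages of one site, limited to that site's parallelism.
    public static Task<string[]> DownloadSiteAsync(
        IEnumerable<string> urls, int maxParallel)
    {
        return RunThrottledAsync<string, string>(
            urls, maxParallel, url => Client.GetStringAsync(url));
    }
}
```

For the limits in the question, you would call DownloadSiteAsync once per site with its own limit (5 for site A, 2 for site B, 1 for site C) and await the three resulting tasks together with Task.WhenAll. Each call owns a separate semaphore, so a slow site does not hold back the others.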