I am trying to parse data from several websites continuously. I would like this action to be performed individually for each site, in a loop and asynchronously, until the program is closed. I am not sure how to structure this kind of logic.
Right now I am following this pattern:
public async Task ParseAll(List<Site> SiteList)
{
    // Start one parse per site, then wait for all of them to finish.
    List<Task> TaskList = new List<Task>();
    foreach (Site s in SiteList)
    {
        TaskList.Add(s.ParseData());
    }
    await Task.WhenAll(TaskList);
}
The issue is that if I wrap this method in a loop, the sites that finish first have to wait until the whole list is finished before the method can run again. Ideally, I would like to put each site back at the bottom of the TaskList when it finishes its ParseData method, but I am not sure if that's possible, or if that's the best way.
If you want to visit the site again as soon as it is complete, you probably want to use Task.WhenAny and integrate your outer loop with your inner loop, something like this (assuming the ParseData function will return the Site it is parsing for):
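A minimal sketch of that WhenAny approach, not a definitive implementation. The Site class here is a hypothetical stand-in for the question's type, ParseData is assumed to return the Site it parsed, and the CancellationToken parameter and the WhenAnyParser name are additions for illustration so the loop can be stopped:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's Site type.
public class Site
{
    public int ParseCount { get; private set; }

    // Assumption: ParseData returns the Site it just parsed.
    public async Task<Site> ParseData()
    {
        await Task.Delay(10); // placeholder for the real download and parse
        ParseCount++;
        return this;
    }
}

public static class WhenAnyParser
{
    public static async Task ParseAll(IEnumerable<Site> siteList, CancellationToken token)
    {
        // Start one parse per site.
        List<Task<Site>> running = siteList.Select(s => s.ParseData()).ToList();

        while (running.Count > 0 && !token.IsCancellationRequested)
        {
            // Wait for whichever parse finishes first.
            Task<Site> finished = await Task.WhenAny(running);
            running.Remove(finished);

            // Awaiting the finished task yields its Site (or rethrows its exception).
            Site site = await finished;

            // Put the same site straight back on the list.
            running.Add(site.ParseData());
        }

        // Let the parses that are still in flight finish before returning.
        await Task.WhenAll(running);
    }
}
```

Each site is restarted as soon as its own parse completes, so a fast site never waits for a slow one. Note that an exception from any ParseData call ends the whole loop in this sketch; wrap the inner await in a try/catch if one failing site should not stop the others.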
Have you tried PLINQ?
PLINQ lets you execute LINQ queries in parallel.
In your case it would look like:
SiteList.AsParallel().ForAll(s => s.ParseData());
It's easy enough to create a method to loop continuously and parse a single site over and over again. Once you have that method, you can call it once on each site in the list:
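A minimal sketch of that idea, under some assumptions: the Site class is a hypothetical stand-in for the question's type, ParseData is assumed to return a Task, and the PerSiteLoop and ParseForever names and the CancellationToken parameter are made up here for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's Site type.
public class Site
{
    public int ParseCount { get; private set; }

    public async Task ParseData()
    {
        await Task.Delay(10); // placeholder for the real download and parse
        ParseCount++;
    }
}

public static class PerSiteLoop
{
    // Parses one site over and over until cancellation is requested.
    public static async Task ParseForever(Site site, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            await site.ParseData();
        }
    }

    // Starts one independent loop per site; the returned task
    // completes once every loop has stopped.
    public static Task ParseAll(IEnumerable<Site> siteList, CancellationToken token)
    {
        return Task.WhenAll(siteList.Select(s => ParseForever(s, token)));
    }
}
```

Because every site runs its own loop, no bookkeeping of finished tasks is needed; the trade-off is that all sites are parsed concurrently with no upper limit.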
Looks like you need to maintain a queue of sites to be processed. Below is my take on this, using SemaphoreSlim. This way you can also limit the number of concurrent tasks to be less than the actual number of sites, or add new sites on-the-fly. A CancellationToken is used to stop the processing from outside. The use of async void is justified here IMO, as QueueSiteAsync keeps track of the tasks it starts.
You may also want to separate download and parsing into separate pipelines, check this for more details.
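A rough sketch of how a SemaphoreSlim can cap the number of concurrent parses. It is a simplified variant under stated assumptions: the Site class, the SiteProcessor class, and its QueueSite and WhenStopped members are hypothetical names, and instead of an async void method it stores each per-site loop as a Task so the caller can wait for shutdown:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's Site type.
public class Site
{
    public int ParseCount { get; private set; }

    public async Task ParseData()
    {
        await Task.Delay(10); // placeholder for the real download and parse
        ParseCount++;
    }
}

public class SiteProcessor
{
    private readonly SemaphoreSlim _slots;
    private readonly CancellationToken _token;
    private readonly List<Task> _loops = new List<Task>();
    private readonly object _gate = new object();

    public SiteProcessor(int maxConcurrent, CancellationToken token)
    {
        _slots = new SemaphoreSlim(maxConcurrent);
        _token = token;
    }

    // Can be called at any time, so sites can be added on the fly.
    public void QueueSite(Site site)
    {
        lock (_gate) { _loops.Add(RunAsync(site)); }
    }

    // Completes when every per-site loop started so far has stopped.
    public Task WhenStopped()
    {
        lock (_gate) { return Task.WhenAll(_loops.ToArray()); }
    }

    private async Task RunAsync(Site site)
    {
        try
        {
            while (true)
            {
                // Wait for a free slot; throws once the token is cancelled.
                await _slots.WaitAsync(_token);
                try { await site.ParseData(); }
                finally { _slots.Release(); }
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the expected way to stop the loop.
        }
    }
}
```

With maxConcurrent lower than the number of sites, the sites take turns: a site that has just finished releases its slot and waits again behind the others. Strict ordering of waiters is not something this sketch relies on.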