I want to download the page content for a list of URLs (10,000 URLs).
- Is HttpClient the fastest and cleanest way (instead of HttpWebRequest or WebClient)?
- If I want it to be fast, is the TPL the best way?
I'm looking for something like this, but really fast and clean (10,000 requests):
public List<string> GetContentListOfUrlList(List<Uri> uriList, int maxSimultaneousRequest)
{
    // request the URLs in the fastest way
}
I hope it's better like this ;)
EDIT 2: According to noseratio's other post, is this the best solution?
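using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public async Task<List<string>> DownloadAsync(List<Uri> urls, int maxDownloads)
{
    // Thread-safe collection for the downloaded pages (completion order, not input order).
    var concurrentQueue = new ConcurrentQueue<string>();
    using (var semaphore = new SemaphoreSlim(maxDownloads))
    using (var httpClient = new HttpClient())
    {
        var tasks = urls.Select(async (url) =>
        {
            // Wait for a free slot so that at most maxDownloads requests run at once.
            await semaphore.WaitAsync();
            try
            {
                var data = await httpClient.GetStringAsync(url);
                concurrentQueue.Enqueue(data);
            }
            finally
            {
                // Always free the slot, even if the request failed.
                semaphore.Release();
            }
        });
        await Task.WhenAll(tasks.ToArray());
    }
    return concurrentQueue.ToList();
}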
Questions:
- ConfigureAwait? Should I use
var data = await httpClient.GetStringAsync(url).ConfigureAwait(false);
or
var data = await httpClient.GetStringAsync(url);
- ServicePointManager.DefaultConnectionLimit? Should I change this property as well?
There is a ParallelOptions.MaxDegreeOfParallelism property, which specifies the maximum number of concurrent operations.
Reference: MaxDegreeOfParallelism
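As a rough illustration of how that property is used, here is a minimal sketch that passes a ParallelOptions instance to Parallel.ForEach. The BoundedParallel class, its Run method, and the work delegate are hypothetical names invented for this example; they are not part of the question's code. The delegate is synchronous, so a real download would have to block inside it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedParallel
{
    // Hypothetical helper: applies `work` to every item while running
    // at most `maxParallel` invocations at the same time.
    public static List<TResult> Run<TItem, TResult>(
        IEnumerable<TItem> items, int maxParallel, Func<TItem, TResult> work)
    {
        var results = new ConcurrentBag<TResult>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // Parallel.ForEach blocks the caller until every item has been processed.
        Parallel.ForEach(items, options, item => results.Add(work(item)));

        // The order of the results is not guaranteed to match the input order.
        return results.ToList();
    }
}
```

A call such as BoundedParallel.Run(uriList, maxSimultaneousRequest, uri => uri.ToString().Length) would then process the list with the given limit. Note that Parallel.ForEach does not wait for an async lambda to finish, so for awaited HttpClient calls the SemaphoreSlim approach from the question is probably the closer fit.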