Multiple parallel execution of WebClient as Task (TPL)

Posted 2019-10-17 10:28

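I am testing parallel execution of IWebDriver vs. WebClient (whether there is a performance difference, and how big it is).

Before I could get that far, I ran into a problem with a simple parallel call using WebClient.

It seems the call is never executed. I even put a breakpoint in AgilityPacDocExtraction, on the specific line WebClient.DownloadString(URL),

but the program exits debugging instead of letting me Step Into the call and see the resulting string.

The plan is to have a single method for all the actions that need to be taken, with a "mode" selector for each action, and then use a simple foreach that iterates over all available Enum values (the modes).

Main execution: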

   static void Main(string[] args)
   {
        EnumForEach<Action>(Execute);
        Task.WaitAll();
   }
   public static void EnumForEach<Mode>(Action<Mode> Exec)
   {

            foreach (Mode mode in Enum.GetValues(typeof(Mode)))
            {
                Mode Curr = mode;

                Task.Factory.StartNew(() => Exec(Curr) );
            }

   }

Mode/action selection:

    enum Action
    {
        Act1, Act2
    }

Actual execution:

    static BrowsresFactory.IeEngine IeNgn = new BrowsresFactory.IeEngine();
    static string 
        FlNm = Environment.CurrentDirectory,
        URL = "",
        TmpHtm ="";


   static void Execute(Action Exc)
   {


        switch (Exc)
        {
            case Action.Act1:
                break;

            case Action.Act2:
                URL  = "UrlofUrChoise here...";
                FlNm += "\\TempHtm.htm";
                TmpHtm = IeNgn.AgilityPacDocExtraction(URL).GetElementbyId("Dv_Main").InnerHtml;
                File.WriteAllText(FlNm, TmpHtm);
                break;

        }
     }

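The class that holds the WebClient is below. The IWebDriver (Selenium) part is not included here, so that it does not take up more space in this post; it is also not relevant for now.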

class BrowsresFactory
{
    public class IeEngine
    {
        private WebClient WC = new WebClient();
        private string tmpExtractedPageValue = "";
        private HtmlAgilityPack.HtmlDocument retAglPacHtmDoc = new HtmlAgilityPack.HtmlDocument();

        public HtmlAgilityPack.HtmlDocument AgilityPacDocExtraction(string URL)
        {
            WC.Encoding = Encoding.GetEncoding("UTF-8");
            tmpExtractedPageValue = WC.DownloadString(URL); //<--- tried to break here
            retAglPacHtmDoc.LoadHtml(tmpExtractedPageValue);
            return retAglPacHtmDoc;
        }
    }
}

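The problem is that I cannot see any content in the file that is supposed to be updated with the value extracted by the WebClient, and in debug mode I cannot step into the line marked with the comment in the code above. What exactly am I doing wrong?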

Answer 1:

The function Download(url, htmlDictionary) is not defined in the code posted in the other answer (Answer 2); one possible version is:

private static void Download(string url, ConcurrentDictionary<string, string> htmlDictionary)
{
    using (var webClient = new SmartWebClient())
    {
        htmlDictionary.TryAdd(url, webClient.DownloadString(url));
    }
}
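
One detail of this helper is worth keeping in mind: `ConcurrentDictionary.TryAdd` does not overwrite an existing key, so if the same URL is passed in twice, only the first downloaded page is kept and the second call returns false. The following is a minimal sketch of that behavior. `TryAddDemo` and `FakeDownload` are hypothetical names, and `FakeDownload` is only a stand-in for `webClient.DownloadString(url)` so that the example runs without network access.

```csharp
using System;
using System.Collections.Concurrent;

public static class TryAddDemo
{
    // Hypothetical stand-in for webClient.DownloadString(url); no network access.
    public static string FakeDownload(string url, int attempt)
    {
        return "<html>" + url + " #" + attempt + "</html>";
    }

    public static ConcurrentDictionary<string, string> Run()
    {
        var htmlDictionary = new ConcurrentDictionary<string, string>();

        // First insert for this key succeeds.
        bool first = htmlDictionary.TryAdd("http://bing.com", FakeDownload("http://bing.com", 1));

        // Second insert with the same key is rejected; the stored value is unchanged.
        bool second = htmlDictionary.TryAdd("http://bing.com", FakeDownload("http://bing.com", 2));

        Console.WriteLine(first + " " + second);
        return htmlDictionary;
    }

    public static void Main()
    {
        Run();
    }
}
```

If later results should replace earlier ones, assigning through the indexer (`htmlDictionary[url] = value`) is one alternative.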

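...That code appears to be a copy from another Stack Overflow post. For reference, see: Getting strings containing HTML document source using Task Parallel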



Answer 2:

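I managed to solve the problem by using WebClient, which I think requires fewer resources than WebDriver; if that is true, it also means it takes less time.

This is the code: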

public void StartEngins()
{
    const string URL_Dollar = "URL_Dollar";
    const string URL_UpdateUsersTimeOut = "URL_UpdateUsersTimeOut";


    var urlList = new Dictionary<string, string>();
    urlList.Add(URL_Dollar, "http://bing.com");
    urlList.Add(URL_UpdateUsersTimeOut, "http://localhost:..../.......aspx");


    var htmlDictionary = new ConcurrentDictionary<string, string>();
    Parallel.ForEach(
                    urlList.Values,
                    new ParallelOptions { MaxDegreeOfParallelism = 20 },
                    url => Download(url, htmlDictionary)
                    );
    foreach (var pair in htmlDictionary)
    {
        ///Process(pair);
        MessageBox.Show(pair.Value);
    }
}

public class SmartWebClient : WebClient
{
    private readonly int maxConcurentConnectionCount;

    public SmartWebClient(int maxConcurentConnectionCount = 20)
    {

        this.maxConcurentConnectionCount = maxConcurentConnectionCount;
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        var httpWebRequest = (HttpWebRequest)base.GetWebRequest(address);
        if (httpWebRequest == null)
        {
            return null;
        }

        if (maxConcurentConnectionCount != 0)
        {
            httpWebRequest.ServicePoint.ConnectionLimit = maxConcurentConnectionCount;
        }

        return httpWebRequest;
    }

}
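
Relating this back to the Task-based code in the question: `Task.WaitAll()` called with no arguments has no tasks to wait on and returns immediately, so `Main` can return, and the process can exit, before the tasks started with `Task.Factory.StartNew` have run. That would be consistent with the debugger exiting before the breakpoint is reached. Below is a minimal sketch of one way to keep the original approach, assuming the started tasks are collected and passed to `Task.WaitAll`. The names `EnumRunner`, `Mode` and `Executed` are illustrative and not from the original post, and the action only records which mode ran instead of downloading anything.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Illustrative enum; the question calls its enum "Action".
public enum Mode { Act1, Act2 }

public static class EnumRunner
{
    // Starts one task per enum value and returns the tasks so the caller can wait on them.
    public static Task[] EnumForEach<TMode>(Action<TMode> exec) where TMode : struct
    {
        var tasks = new List<Task>();
        foreach (TMode mode in Enum.GetValues(typeof(TMode)))
        {
            TMode current = mode; // per-iteration copy captured by the lambda
            tasks.Add(Task.Run(() => exec(current)));
        }
        return tasks.ToArray();
    }
}

public static class Program
{
    // Thread-safe record of which modes were executed (stand-in for the real work).
    public static readonly ConcurrentBag<Mode> Executed = new ConcurrentBag<Mode>();

    public static void Main()
    {
        Task[] tasks = EnumRunner.EnumForEach<Mode>(m => Executed.Add(m));

        // Blocks until every started task has finished.
        Task.WaitAll(tasks);

        Console.WriteLine(Executed.Count);
    }
}
```

With the tasks passed in explicitly, `Main` does not return until every action has completed, so a breakpoint inside the action should be reachable. If more actions are added later, the static fields and the single `WebClient` instance in the question would be shared across tasks, so keeping per-call local variables and one `WebClient` per call (as the `Download` helper in Answer 1 does) is likely the safer choice.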


Source: multiple parallel execution of WebClient as Task (TPL)