Parallelism with Task: some tasks worked, some did not

Posted 2019-09-19 14:21

Question:

I have a website and wrote an HttpModule to convert all links. Everything was fine until I started using parallelism to convert the URLs.

This is my test console application:

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    class Program
    {
        static void Main(string[] args)
        {
            new Job().Do();
        }
    }

    public class Job
    {
        public void Do()
        {
            string content = @"
            new link1 href=""www.yahoo1.com"" end
            new link2 href=""www.yahoo2.com"" end
            new link3 href=""www.yahoo3.com"" end
            new link4 href=""www.yahoo4.com"" end
            new link5 href=""www.yahoo5.com"" end
            new link6 href=""www.yahoo6.com"" end
            ";

            string newcontent = Transformlink(content);

            Console.WriteLine(content);
            Console.WriteLine();
            Console.WriteLine(newcontent);
            Console.ReadLine();
        }

        private string Transformlink(string content)
        {
            List<UrlIndex> AllUrls = GetUrls(content);
            List<Task> TaskPool = new List<Task>();
            foreach (UrlIndex Item in AllUrls)
                TaskPool.Add(Task.Factory.StartNew(() => TransformUrl(Item)));
            Task.WaitAll(TaskPool.ToArray());

            return ReplaceUrlWithTransformUrl(content, AllUrls);
        }

        private string ReplaceUrlWithTransformUrl(string content, List<UrlIndex> AllUrls)
        {
            for (int i = AllUrls.Count - 1; i >= 0; i--)
            {
                UrlIndex CurrentItem = AllUrls[i];
                content = content.Substring(0, CurrentItem.StartIndex) + CurrentItem.TransformedUrl + content.Substring(CurrentItem.EndIndex);
            }
            return content;
        }

        private void TransformUrl(UrlIndex urlindex)
        {
            urlindex.TransformedUrl = string.Format("Google{0}.com", new Random().Next(100, 999).ToString());
        }

        private List<UrlIndex> GetUrls(string content)
        {
            // Find the start and end index of each URL; TransformedUrl initially equals Url.
            List<UrlIndex> AllUrls = new List<UrlIndex>();
            int startindex = 0;
            int endIndex = 0;
            int previousindex = 0;
            while (startindex != -1)
            {
                startindex = content.IndexOf("href=\"", previousindex);
                if (startindex == -1)
                    break;
                startindex += 6;
                previousindex = startindex;
                endIndex = content.IndexOf("\"", previousindex);
                if (endIndex == -1)
                    break;
                previousindex = endIndex;
                string url = content.Substring(startindex, endIndex - startindex);
                AllUrls.Add(new UrlIndex() { StartIndex = startindex, EndIndex = endIndex, Url = url, TransformedUrl = url });
            }

            return AllUrls;
        }
    }


    public class UrlIndex
    {
        public int StartIndex { get; set; }
        public int EndIndex { get; set; }
        public string Url { get; set; }
        public string TransformedUrl { get; set; }
    }

The result should be:

new link1 href=""www.Google859.com"" end
new link2 href=""www.Google245.com"" end
new link3 href=""www.Google749.com"" end
new link4 href=""www.Google345.com"" end
new link5 href=""www.Google894.com"" end
new link6 href=""www.Google243.com"" end

That's exactly what I want.

But the actual result is:

new link1 href=""www.yahoo1.com"" end
new link2 href=""www.yahoo2.com"" end
new link3 href=""www.yahoo3.com"" end
new link4 href=""www.yahoo4.com"" end
new link5 href=""www.yahoo5.com"" end
new link6 href=""www.Google125.com"" end

As you can see, only the last link was transformed. And in some cases:

new link1 href=""www.yahoo1.com"" end
new link2 href=""www.yahoo2.com"" end
new link3 href=""www.Google285.com"" end
new link4 href=""www.yahoo4.com"" end
new link5 href=""www.yahoo5.com"" end
new link6 href=""www.Google125.com"" end

The console project targets .NET 4.

Is this my fault? Why did not all of the tasks work? Is the line Task.WaitAll(TaskPool.ToArray()); not enough? Any suggestions?

Answer 1:

Looks like a closure problem. Change your Transformlink method like this:

    private string Transformlink(string content)
    {
        List<UrlIndex> AllUrls = GetUrls(content);
        List<Task> TaskPool = new List<Task>();
        foreach (UrlIndex Item in AllUrls)
        {
            var localItem = Item; // fresh variable per iteration, so each lambda captures its own item
            TaskPool.Add(Task.Factory.StartNew(() => TransformUrl(localItem)));
        }
        Task.WaitAll(TaskPool.ToArray());

        return ReplaceUrlWithTransformUrl(content, AllUrls);
    }

Edit / Explanation:

This is caused by the way tasks are scheduled, which you have no real way to control. When a task happens to be scheduled "fast enough" to run before the loop moves on to its next iteration, the Item variable it passes to TransformUrl is still the one you intended.

But when the loop finishes before a task is executed, things go wrong. Note that the lambda captures the Item variable itself, not the value it held at that moment, and with the C# 4 compiler that single variable is reassigned on each iteration. So after the loop has finished, every task that has not started yet will transform the same UrlIndex instance, the last one. That is what happens here. By creating a local variable inside the loop body, you store a reference to the actual object that you wanted to use.
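The capture behavior can be reproduced without any threads. The following is a minimal sketch, not the asker's code: ClosureDemo and its method names are made up for illustration. It uses a for loop because the for loop variable is shared across iterations in every C# version, whereas foreach was changed in C# 5 to give each iteration a fresh variable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not part of the original question or answer.
public static class ClosureDemo
{
    // Every lambda captures the single loop variable i, not its value at creation time.
    public static List<int> CaptureLoopVariable()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
            funcs.Add(() => i);
        // The lambdas only run here, after the loop has ended and i is 3.
        return funcs.Select(f => f()).ToList();
    }

    // Each iteration declares a new variable, so each lambda captures its own copy.
    public static List<int> CaptureLocalCopy()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int local = i;
            funcs.Add(() => local);
        }
        return funcs.Select(f => f()).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", CaptureLoopVariable())); // prints 3,3,3
        Console.WriteLine(string.Join(",", CaptureLocalCopy()));    // prints 0,1,2
    }
}
```

With tasks, the lambdas run at unpredictable times rather than after the loop, which is why the number of correctly transformed links varies from run to run.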

So using the local variable is the right way to do this. When the original code does transform a link correctly, it is only because of favorable timing (I would call it luck :) ).
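Another option, offered here only as a sketch and not as part of the original answer, is Parallel.ForEach from System.Threading.Tasks (available in .NET 4). It hands each element to the delegate as a parameter, so no loop variable is captured, and it returns only after every element has been processed. The UrlIndex class below is trimmed to two properties, and TransformUrl is a hypothetical stand-in for the real conversion.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Trimmed copy of the question's UrlIndex, kept minimal for this sketch.
public class UrlIndex
{
    public string Url { get; set; }
    public string TransformedUrl { get; set; }
}

// Hypothetical demo class; names are illustrative only.
public static class ParallelTransformDemo
{
    // Stand-in for the real per-URL conversion (assumption: it only touches its own item).
    public static void TransformUrl(UrlIndex item)
    {
        item.TransformedUrl = item.Url.Replace("yahoo", "Google");
    }

    public static List<UrlIndex> TransformAll(List<UrlIndex> allUrls)
    {
        // Each element arrives as the lambda's parameter, so nothing is shared between iterations.
        // Parallel.ForEach blocks until all elements have been processed.
        Parallel.ForEach(allUrls, item => TransformUrl(item));
        return allUrls;
    }

    public static void Main()
    {
        var urls = new List<UrlIndex>();
        for (int i = 1; i <= 6; i++)
            urls.Add(new UrlIndex { Url = "www.yahoo" + i + ".com" });

        foreach (UrlIndex u in TransformAll(urls))
            Console.WriteLine(u.Url + " -> " + u.TransformedUrl);
    }
}
```

Whether this fits depends on the workload: Parallel.ForEach suits CPU-bound work over a known collection, while explicit tasks with a local copy, as in the answer, give more control over each unit of work.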