I want to retrieve a list of unique customer IDs from a simple XML file (see below) using the Task Parallel Library (TPL).
I use XPathNavigator to iterate through the XML and retrieve the customer IDs, and I'm using an iterator with Parallel.ForEach(..) for task parallelism.
For some reason I get duplicated customer IDs. It almost seems as if the iterator keeps track of previous reads/iterations, whereas I'm expecting a new iterator each time through the loop.
I have tried a number of approaches with no luck. If someone can point me in the right direction, it would be greatly appreciated.
(The full code I tried is below.)
Some simple XML:
private static string Xml()
{
    return "<persons>" +
           "<person><id>1</id></person>" +
           "<person><id>2</id></person>" +
           "<person><id>3</id></person>" +
           "<person><id>4</id></person>" +
           "<person><id>5</id></person>" +
           "</persons>";
}
static void Main(string[] args)
{
    var navigator = XmlHelper.CreateNavigator(Xml());
    string xpath = "/persons/person";
    var exp = navigator.Compile(xpath);
    var iterator = navigator.Select(exp);

    //Parallel Task scenario returns duplicated customer IDs
    Parallel.ForEach(Iterate(iterator), (a) =>
    {
        // Relative path: evaluated from the current <person> node
        string xpathId = "id";
        var value = XmlHelper.SelectString(a.Current, xpathId);
        Console.WriteLine("person id: " + value);
    });
    /*
     * Sample output can be: (notice the duplicated values!)
     * person id: 2
     * person id: 2
     * person id: 4
     * person id: 4
     * person id: 3
     * person id: 1
     *
     */

    //Sequential scenario displays unique values:
    //while (iterator.MoveNext())
    //{
    //    string xpathId = "id";
    //    var value = XmlHelper.SelectString(iterator.Current, xpathId);
    //    Console.WriteLine("person id: " + value);
    //}

    Console.ReadLine();
}

private static IEnumerable<XPathNodeIterator>
    Iterate(XPathNodeIterator iterator)
{
    while (iterator.MoveNext())
    {
        yield return iterator;
    }
}

public static class XmlHelper
{
    public static string SelectString(XPathNavigator navigator, string xpath)
    {
        return SelectString(navigator, xpath, null);
    }

    public static string SelectString
        (XPathNavigator navigator, string xpath, string defaultVal)
    {
        XPathExpression exp = navigator.Compile(xpath);
        XPathNodeIterator it = navigator.Select(exp);
        // Fall back to defaultVal when the expression matches nothing
        return it.MoveNext() ? it.Current.Value : defaultVal;
    }

    public static XPathNavigator CreateNavigator(string input)
    {
        XPathDocument doc;
        using (var reader = new StringReader(input))
        {
            doc = new XPathDocument(reader);
        }
        return doc.CreateNavigator();
    }
}
Note: I have also tried the approach taken by this article, still with no luck. Any help is greatly appreciated.
From MSDN:
https://msdn.microsoft.com/en-us/library/system.xml.xpath.xpathnavigator(v=vs.110).aspx
So your iterator is not thread-safe when used like this.
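If sharing the navigator across threads is the problem, one way to sidestep it entirely is to do all XPathNavigator/XPathNodeIterator work on a single thread and hand only the extracted strings to Parallel.ForEach. This is a swapped-in technique, not what either answer proposes, and the names SequentialReadParallelWork, ReadIds and ProcessInParallel are hypothetical; treat it as a sketch.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using System.Xml.XPath;

// Sketch: read the XML sequentially, then parallelize over plain strings,
// which are immutable and therefore safe to share between threads.
public static class SequentialReadParallelWork
{
    // Reads every /persons/person/id value on the calling thread only.
    public static List<string> ReadIds(string xml)
    {
        XPathDocument doc;
        using (var reader = new StringReader(xml))
        {
            doc = new XPathDocument(reader);
        }
        XPathNodeIterator it = doc.CreateNavigator().Select("/persons/person/id");
        var ids = new List<string>();
        while (it.MoveNext())
        {
            ids.Add(it.Current.Value); // copy the value out before moving on
        }
        return ids;
    }

    // Processes the already-extracted ids in parallel; no navigator is shared.
    public static List<string> ProcessInParallel(string xml)
    {
        var results = new ConcurrentBag<string>();
        Parallel.ForEach(ReadIds(xml), id =>
        {
            results.Add("person id: " + id);
        });
        // Sorted only so the result is deterministic for comparison.
        return results.OrderBy(s => s).ToList();
    }
}
```

The trade-off is that the XML read itself stays sequential, so this only pays off when the per-item work, not the reading, is the expensive part.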
Thanks @Nitram and @Paddy!
Both answers pointed me in the right direction. I think @Nitram's answer was more accurate, as he explained the cause of the problem I had in the first place.
It seems that, when running in parallel, the code was still causing some duplicates. This is not obvious with smaller collections, but as the number of items grows, values tend to repeat in a multi-threaded environment.
I believe this is why @Paddy mentioned that the iterator is not thread-safe.
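To check the "larger collections" observation, a small harness like the following can generate many `<person>` elements and count repeated values. It is a sketch: BuildXml, CountDuplicates and ReadInParallel are hypothetical names, the iterate parameter lets you plug in whichever conversion function you are testing, and it assumes that separate navigators over one read-only XPathDocument can be read from different threads (which is what the final solution here relies on).

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Xml.XPath;

public static class DuplicateStressCheck
{
    // Builds <persons> containing ids 1..count (synthetic test data).
    public static string BuildXml(int count)
    {
        var sb = new StringBuilder("<persons>");
        for (int i = 1; i <= count; i++)
        {
            sb.Append("<person><id>").Append(i).Append("</id></person>");
        }
        return sb.Append("</persons>").ToString();
    }

    // Number of distinct values that occur more than once.
    public static int CountDuplicates(IEnumerable<string> values)
    {
        return values.GroupBy(v => v).Count(g => g.Count() > 1);
    }

    // Runs Parallel.ForEach over whatever the supplied iterate function yields
    // and returns every id value that was read.
    public static List<string> ReadInParallel(
        string xml, Func<XPathNodeIterator, IEnumerable<XPathNavigator>> iterate)
    {
        XPathDocument doc;
        using (var reader = new StringReader(xml))
        {
            doc = new XPathDocument(reader);
        }
        XPathNodeIterator people = doc.CreateNavigator().Select("/persons/person");
        var seen = new ConcurrentBag<string>();
        Parallel.ForEach(iterate(people), person =>
        {
            // "id" is relative to the current <person> node
            XPathNavigator id = person.SelectSingleNode("id");
            seen.Add(id == null ? "" : id.Value);
        });
        return seen.ToList();
    }
}
```

Passing a function that clones each `iterator.Current` should report zero duplicates even for tens of thousands of items; a count above zero points at shared state.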
@Nitram mentioned:
Based on this, I changed the iterator to return a list of XPathNavigator objects instead.
This solved the problem I had, and it works well with the number of items I expect to parallelize.
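A minimal sketch of such a "list of XPathNavigator" conversion, assuming each node is copied with `Clone()` so that no list item refers back to the live iterator. The names NavigatorSnapshot, ToNavigatorList and FromXml are hypothetical; this is not the asker's actual code.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class NavigatorSnapshot
{
    // Copies each selected node into its own XPathNavigator, so later
    // MoveNext() calls on the iterator cannot affect the stored items.
    public static List<XPathNavigator> ToNavigatorList(XPathNodeIterator iterator)
    {
        var list = new List<XPathNavigator>();
        while (iterator.MoveNext())
        {
            list.Add(iterator.Current.Clone());
        }
        return list;
    }

    // Convenience wrapper: parse the XML and snapshot the nodes matched by xpath.
    public static List<XPathNavigator> FromXml(string xml, string xpath)
    {
        XPathDocument doc;
        using (var reader = new StringReader(xml))
        {
            doc = new XPathDocument(reader);
        }
        return ToNavigatorList(doc.CreateNavigator().Select(xpath));
    }
}
```

Because the list is fully built before Parallel.ForEach starts, the worker threads never touch the original iterator at all.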
The root of your problem is this function:
If you think about this function, you come to the conclusion that there is something very wrong with it.
What this function actually does is give you an enumerable that yields the same iterator reference n times, where n is the number of elements in the iterator passed in as the parameter. This messes everything up.
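The claim that the function yields the same reference n times is easy to verify. This sketch copies the question's Iterate unchanged into a hypothetical SameReferenceDemo class and checks, with `ReferenceEquals`, that every yielded item is one and the same XPathNodeIterator.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.XPath;

public static class SameReferenceDemo
{
    // Same shape as the question's Iterate: it yields the iterator itself.
    public static IEnumerable<XPathNodeIterator> Iterate(XPathNodeIterator iterator)
    {
        while (iterator.MoveNext())
        {
            yield return iterator;
        }
    }

    private static List<XPathNodeIterator> Collect(string xml)
    {
        XPathDocument doc;
        using (var reader = new StringReader(xml))
        {
            doc = new XPathDocument(reader);
        }
        XPathNodeIterator it = doc.CreateNavigator().Select("/persons/person");
        return Iterate(it).ToList();
    }

    // n: how many items the enumerable produced.
    public static int CountYielded(string xml)
    {
        return Collect(xml).Count;
    }

    // True when every produced item is the very same object.
    public static bool AllItemsAreSameObject(string xml)
    {
        List<XPathNodeIterator> items = Collect(xml);
        return items.Count > 0 && items.All(x => object.ReferenceEquals(x, items[0]));
    }
}
```

Since every item is the same object, `a.Current` inside the Parallel.ForEach body reads wherever the shared iterator happens to be positioned at that moment, which is how the same id can be printed twice.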
Parallel.ForEach can easily handle enumerables, but what your function does is hand it one iterator multiple times. I think what you tried to do is to "convert" your iterator into an IEnumerable. But you need an IEnumerable that gives you the values of the iterator, not the iterator over and over again. So, all in all, your function should look like this:
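One way to write such a function is sketched below; the class name FixedIterate and the helper CollectIdsInParallel are hypothetical. It yields `iterator.Current.Clone()` rather than the iterator itself. The `Clone()` call is an assumption based on the asker's follow-up that yielding the shared navigator could still produce duplicates with larger inputs; as far as I know, Parallel.ForEach serializes its calls into the source enumerator, so the clone is taken while the iterator is still on the right node.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using System.Xml.XPath;

public static class FixedIterate
{
    // Yields the node the iterator is positioned on, not the iterator itself.
    // Clone() gives each item an independent navigator.
    public static IEnumerable<XPathNavigator> Iterate(XPathNodeIterator iterator)
    {
        while (iterator.MoveNext())
        {
            yield return iterator.Current.Clone();
        }
    }

    // Reads every person id in parallel using the fixed Iterate.
    public static List<string> CollectIdsInParallel(string xml)
    {
        XPathDocument doc;
        using (var reader = new StringReader(xml))
        {
            doc = new XPathDocument(reader);
        }
        XPathNodeIterator people = doc.CreateNavigator().Select("/persons/person");
        var ids = new ConcurrentBag<string>();
        Parallel.ForEach(Iterate(people), person =>
        {
            // Relative path: evaluated from this <person> node.
            XPathNavigator id = person.SelectSingleNode("id");
            if (id != null)
            {
                ids.Add(id.Value);
            }
        });
        // Sorted numerically only to make the result easy to compare.
        return ids.OrderBy(s => int.Parse(s)).ToList();
    }
}
```

The sort at the end is purely for comparison; Parallel.ForEach itself gives no ordering guarantee.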
This way, your enumerable actually contains the values of your iterator and returns them. With this function, you will get all entries in your loop.