I am comparing two XML documents and need to print the differences. How can I achieve this using LINQ?
I know I could use Microsoft's XML Diff and Patch tool, but I would prefer to use LINQ. If you have any other ideas, I am happy to implement them.
//First Xml
<Books>
<book>
<id="20504" image="C01" name="C# in Depth">
</book>
<book>
<id="20505" image="C02" name="ASP.NET">
</book>
<book>
<id="20506" image="C03" name="LINQ in Action ">
</book>
<book>
<id="20507" image="C04" name="Architecting Applications">
</book>
</Books>
//Second Xml
<Books>
<book>
<id="20504" image="C011" name="C# in Depth">
</book>
<book>
<id="20505" image="C02" name="ASP.NET 2.0">
</book>
<book>
<id="20506" image="C03" name="LINQ in Action ">
</book>
<book>
<id="20508" image="C04" name="Architecting Applications">
</book>
</Books>
I want to compare these two XML documents and print a result like this:
IssueId  Issue Type          IssueInFirst  IssueInSecond
1        image is different  C01           C011
2        name is different   ASP.NET       ASP.NET 2.0
3        id is different     20507         20508
Here is the solution:
// Sanitised XML. This snippet needs: using System; using System.Linq; using System.Xml.Linq;
string s1 = @"<Books>
    <book id='20504' image='C01' name='C# in Depth'/>
    <book id='20505' image='C02' name='ASP.NET'/>
    <book id='20506' image='C03' name='LINQ in Action '/>
    <book id='20507' image='C04' name='Architecting Applications'/>
</Books>";
string s2 = @"<Books>
    <book id='20504' image='C011' name='C# in Depth'/>
    <book id='20505' image='C02' name='ASP.NET 2.0'/>
    <book id='20506' image='C03' name='LINQ in Action '/>
    <book id='20508' image='C04' name='Architecting Applications'/>
</Books>";
XDocument xml1 = XDocument.Parse(s1);
XDocument xml2 = XDocument.Parse(s2);
// Cross join (Cartesian product): pair every book in xml1 with every book in xml2
var result1 = from xmlBooks1 in xml1.Descendants("book")
              from xmlBooks2 in xml2.Descendants("book")
              select new {
                  book1 = new {
                      id = xmlBooks1.Attribute("id").Value,
                      image = xmlBooks1.Attribute("image").Value,
                      name = xmlBooks1.Attribute("name").Value
                  },
                  book2 = new {
                      id = xmlBooks2.Attribute("id").Value,
                      image = xmlBooks2.Attribute("image").Value,
                      name = xmlBooks2.Attribute("name").Value
                  }
              };
// Keep every pair that has at least one attribute the same, but not all three
var result2 = from i in result1
              where (i.book1.id == i.book2.id
                     || i.book1.image == i.book2.image
                     || i.book1.name == i.book2.name) &&
                    !(i.book1.id == i.book2.id
                      && i.book1.image == i.book2.image
                      && i.book1.name == i.book2.name)
              select i;
foreach (var pair in result2)
{
    // Each pair is a partial match; anonymous types print all of their fields
    Console.WriteLine("{0} <-> {1}", pair.book1, pair.book2);
}
Both LINQ statements could probably be merged, but I'll leave that as an exercise for you.
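For readers who want to try that exercise, here is one possible merged query. It is a sketch, not the original author's code: the `MergedBookDiff` class, the `Differences` method, and the message format are names invented for this illustration, and the attribute list is assumed to be fixed at `id`, `image`, and `name`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class MergedBookDiff
{
    // Hypothetical helper: cross-joins the books of both documents and, for
    // every pair that matches on some but not all attributes, returns one
    // message per differing attribute.
    public static List<string> Differences(XDocument xml1, XDocument xml2)
    {
        // Assumption: these three attributes are the only ones compared.
        var names = new[] { "id", "image", "name" };

        var query =
            from b1 in xml1.Descendants("book")
            from b2 in xml2.Descendants("book")
            // Collect the attributes whose values differ for this pair.
            // The cast to string yields null if an attribute is missing.
            let diffs = (from n in names
                         let v1 = (string)b1.Attribute(n)
                         let v2 = (string)b2.Attribute(n)
                         where v1 != v2
                         select string.Format("{0} is different: {1} vs {2}", n, v1, v2))
                        .ToList()
            // At least one attribute differs, but not all of them.
            where diffs.Count > 0 && diffs.Count < names.Length
            from d in diffs
            select d;

        return query.ToList();
    }
}
```

Calling `MergedBookDiff.Differences(xml1, xml2)` with the two documents above should give one message each for the `image`, `name`, and `id` mismatches. Like the two-step version, it relies on a cross join, so it does work proportional to the product of the two book counts.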
The operation you want here is a Zip to pair up corresponding elements in your two sequences of books. That operator is being added in .NET 4.0, but we can fake it by using Select to grab the books' indices and joining on that:
var res = from b1 in xml1.Descendants("book")
                         .Select((b, i) => new { b, i })
          join b2 in xml2.Descendants("book")
                         .Select((b, i) => new { b, i })
          on b1.i equals b2.i
We'll then use a second join to compare the values of attributes by name. Note that this is an inner join; if you did want to include attributes missing from one or the other you would have to do quite a bit more work.
          select new
          {
              Row = b1.i,
              Diff = from a1 in b1.b.Attributes()
                     join a2 in b2.b.Attributes()
                     on a1.Name equals a2.Name
                     where a1.Value != a2.Value
                     select new
                     {
                         Name = a1.Name,
                         Value1 = a1.Value,
                         Value2 = a2.Value
                     }
          };
The result will be a nested collection:
foreach (var b in res)
{
    Console.WriteLine("Row {0}: ", b.Row);
    foreach (var d in b.Diff)
        Console.WriteLine(d);
}
Or to get multiple rows per book:
var report = from r in res
             from d in r.Diff
             select new { r.Row, Diff = d };
foreach (var d in report)
    Console.WriteLine(d);
Which reports the following:
{ Row = 0, Diff = { Name = image, Value1 = C01, Value2 = C011 } }
{ Row = 1, Diff = { Name = name, Value1 = ASP.NET, Value2 = ASP.NET 2.0 } }
{ Row = 3, Diff = { Name = id, Value1 = 20507, Value2 = 20508 } }
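On .NET 4.0 or later, the index join can be replaced by `Enumerable.Zip`. The sketch below also addresses the inner-join caveat by comparing over the union of attribute names, so an attribute present on only one side is reported rather than dropped. The `ZippedBookDiff` class, the `Report` method, the `(missing)` placeholder, and the column widths are all assumptions made for this illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ZippedBookDiff
{
    // Hypothetical helper: pairs books by position and returns one formatted
    // row per differing attribute, numbered from 1.
    public static List<string> Report(XDocument xml1, XDocument xml2)
    {
        var diffs = xml1.Descendants("book")
            // Pair the first book with the first book, the second with the second, ...
            .Zip(xml2.Descendants("book"), (b1, b2) => new { b1, b2 })
            .SelectMany(p =>
                // Union of attribute names from both sides, so that an
                // attribute missing from one book still produces a row.
                p.b1.Attributes().Select(a => a.Name)
                    .Union(p.b2.Attributes().Select(a => a.Name))
                    .Select(n => new
                    {
                        Name = n.LocalName,
                        // The cast to string yields null for a missing attribute.
                        First = (string)p.b1.Attribute(n),
                        Second = (string)p.b2.Attribute(n)
                    })
                    .Where(d => d.First != d.Second));

        return diffs
            .Select((d, i) => string.Format(
                "{0,-8} {1,-20} {2,-14} {3}",
                i + 1,
                d.Name + " is different",
                d.First ?? "(missing)",
                d.Second ?? "(missing)"))
            .ToList();
    }
}
```

Note that `Zip` stops at the end of the shorter sequence, so any extra books in the longer document are silently ignored; detecting those would need a separate count check.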
For fun, a general solution to grega g's reading of the problem. To illustrate my objection to this approach, I've introduced a "correct" entry for 'PowerShell in Action'.
string s1 = @"<Books>
    <book id='20504' image='C01' name='C# in Depth'/>
    <book id='20505' image='C02' name='ASP.NET'/>
    <book id='20506' image='C03' name='LINQ in Action '/>
    <book id='20507' image='C04' name='Architecting Applications'/>
    <book id='20508' image='C05' name='PowerShell in Action'/>
</Books>";
string s2 = @"<Books>
    <book id='20504' image='C011' name='C# in Depth'/>
    <book id='20505' image='C02' name='ASP.NET 2.0'/>
    <book id='20506' image='C03' name='LINQ in Action '/>
    <book id='20508' image='C04' name='Architecting Applications'/>
    <book id='20508' image='C05' name='PowerShell in Action'/>
</Books>";
XDocument xml1 = XDocument.Parse(s1);
XDocument xml2 = XDocument.Parse(s2);
// Cross join every book in xml1 with every book in xml2
var res = from b1 in xml1.Descendants("book")
          from b2 in xml2.Descendants("book")
          // Pair up the attributes of the two books by name
          let issues = from a1 in b1.Attributes()
                       join a2 in b2.Attributes()
                       on a1.Name equals a2.Name
                       select new
                       {
                           Name = a1.Name,
                           Value1 = a1.Value,
                           Value2 = a2.Value
                       }
          // Only consider pairs of books that agree on at least one attribute
          where issues.Any(i => i.Value1 == i.Value2)
          from issue in issues
          where issue.Value1 != issue.Value2
          select issue;
Which reports the following:
{ Name = image, Value1 = C01, Value2 = C011 }
{ Name = name, Value1 = ASP.NET, Value2 = ASP.NET 2.0 }
{ Name = id, Value1 = 20507, Value2 = 20508 }
{ Name = image, Value1 = C05, Value2 = C04 }
{ Name = name, Value1 = PowerShell in Action, Value2 = Architecting Applications }
Note that the last two entries are the "conflict" between the 20508 typo and the otherwise correct 20508 entry.