How to extract XML data using XPath with Selenium

Posted 2020-06-28 03:00

Question:

I'm using Selenium WebDriver (version 2.31.2.0, .NET) and I'm trying to extract an element from the XML returned by `driver.PageSource`.

My question: how can I get the list of items using the XPath below? I am able to run it in Firefox using XPath add-ons, but the same expression does not work in Selenium WebDriver.

Any help?

Here is my code in Selenium WebDriver:

var driver = new FirefoxDriver();
driver.Navigate().GoToUrl("http://website_name/languages.xml");
string _page_source = driver.PageSource;
ReadOnlyCollection<IWebElement> webElements = _page_source.FindElementsByXPath("//response//results//items/vList");

My XML looks like this:

<response xmlns="http://schemas.datacontract.org/2004/07/myproj.cnn.com"
          xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
    <meta>

    </meta>
    <results i:type="vList">
        <name>Language</name>
        <queryValue>language</queryValue>
        <displayOrder>0</displayOrder>
        <items>
            <vList>
                <name>English</name>
                <displayName>English</displayName>
                <displayOrder>0</displayOrder>
                <items />
            </vList>
            <vList>
                <name>Swedish</name>
                <displayName>Swedish</displayName>
                <displayOrder>1</displayOrder>
                <items />
            </vList>
        </items>
    </results>
</response>

Answer 1:

You can use Selenium to browse to the URL and obtain the XML, but work with the XML itself using .NET classes.

The driver.PageSource property is a string, so you should use .NET classes directly to parse the XML it contains. Also, there is no FindElementsByXPath() method on a string object, unless it is an extension method that you have written yourself.

Read the XML from Selenium's driver.PageSource:

var driver = new FirefoxDriver();
driver.Navigate().GoToUrl("http://website_name/languages.xml");
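// XmlReader.Create(string) treats its argument as a URI, so wrap the XML text in a StringReader
XmlReader reader = XmlReader.Create(new System.IO.StringReader(driver.PageSource));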

Or read the XML directly from the URL, without Selenium:

XmlReader reader = XmlReader.Create("http://website_name/languages.xml");

Then use the code below to parse and read the XML. The key point to note is how the namespace information is passed to the XPath query.

// load the XML document
XElement xmlDocumentRoot = XElement.Load(reader);
// add the namespace info: choose a prefix ("a") for the default namespace
XmlNameTable nameTable = reader.NameTable;
XmlNamespaceManager namespaceManager = new XmlNamespaceManager(nameTable);
namespaceManager.AddNamespace("a", "http://schemas.datacontract.org/2004/07/myproj.cnn.com");

// now query the XML; remember to prefix every element that is in the default namespace
var items = xmlDocumentRoot.XPathSelectElements("//a:results/a:items/a:vList", namespaceManager);

Console.WriteLine("vList has {0} items.", items.Count());

foreach (var item in items)
{
    Console.WriteLine("Display name: {0}", item.XPathSelectElement("a:displayName", namespaceManager).Value);
}
// OR get a list of all display names using LINQ
var displayNames = items.Select(x => x.XPathSelectElement("a:displayName", namespaceManager).Value).ToList();

You will need the following using directives for the above to work:

using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;
using OpenQA.Selenium.Firefox; // only needed for the FirefoxDriver variant


Answer 2:

The XML you have posted declares a namespace: xmlns="http://schemas.datacontract.org/2004/07/myproj.cnn.com". Note this line:

<response xmlns="http://schemas.datacontract.org/2004/07/myproj.cnn.com">

Because this namespace has no prefix, it is the default namespace for all elements without a prefix. This means that <response>, <results> and every other unprefixed element belong to this namespace.
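To see the effect of the default namespace directly, here is a minimal .NET sketch (an illustration, not part of the original answer) using a trimmed copy of the question's XML. The class name DefaultNamespaceDemo and the prefix "a" are arbitrary choices. The unprefixed path from the question matches nothing, while the same path with a prefix bound to the namespace URI matches both vList elements.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical demo class: the same XPath with and without a namespace prefix.
public static class DefaultNamespaceDemo
{
    // Trimmed copy of the question's XML; the default namespace is kept.
    public const string Xml =
        "<response xmlns=\"http://schemas.datacontract.org/2004/07/myproj.cnn.com\">" +
        "<results><items>" +
        "<vList><displayName>English</displayName></vList>" +
        "<vList><displayName>Swedish</displayName></vList>" +
        "</items></results></response>";

    // In XPath 1.0 an unprefixed name only matches elements in *no* namespace,
    // so this path finds nothing even though the elements exist.
    public static int CountWithoutPrefix()
    {
        XDocument doc = XDocument.Parse(Xml);
        return doc.XPathSelectElements("//response//results//items/vList").Count();
    }

    // Binding a prefix to the default namespace URI makes the same path match.
    public static int CountWithPrefix()
    {
        XDocument doc = XDocument.Parse(Xml);
        var ns = new XmlNamespaceManager(new NameTable());
        ns.AddNamespace("a", "http://schemas.datacontract.org/2004/07/myproj.cnn.com");
        return doc.XPathSelectElements("//a:response//a:results//a:items/a:vList", ns).Count();
    }
}
```

Calling DefaultNamespaceDemo.CountWithoutPrefix() returns 0, and DefaultNamespaceDemo.CountWithPrefix() returns 2.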

For more on XML namespaces, see: http://www.w3schools.com/xml/xml_namespaces.asp

So in your code you need to declare the namespace before any XPath evaluation will work. I do not know how to set the namespace in Selenium WebDriver, but it should be possible to find out.

Once you have declared the namespace, you need to use its prefix in your XPath. For example, in XSLT you can declare the namespace as follows:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:foo="http://schemas.datacontract.org/2004/07/myproj.cnn.com">

This declares the namespace with the prefix foo. The XPath that retrieves all vList elements would then be:

/foo:response/foo:results/foo:items/foo:vList

To get all displayName elements you can use:

/foo:response/foo:results/foo:items/foo:vList/foo:displayName

If you want the total number of elements instead of the list of elements, wrap count() around the path, as follows:

count(/foo:response/foo:results/foo:items/foo:vList)
count(/foo:response/foo:results/foo:items/foo:vList/foo:displayName)
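The same foo: paths can also be evaluated outside XSLT. The sketch below is an illustration rather than part of the original answer: it assumes LINQ to XML with the System.Xml.XPath extension methods, uses a hypothetical FooPrefixDemo class, and binds foo to the namespace URI with an XmlNamespaceManager. Note that XPathEvaluate returns the result of count() as a boxed double.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper: runs the foo:-prefixed XPath expressions from this answer.
public static class FooPrefixDemo
{
    const string NamespaceUri = "http://schemas.datacontract.org/2004/07/myproj.cnn.com";

    // Same prefix as the XSLT declaration; the prefix name itself is arbitrary.
    static XmlNamespaceManager CreateResolver()
    {
        var ns = new XmlNamespaceManager(new NameTable());
        ns.AddNamespace("foo", NamespaceUri);
        return ns;
    }

    // Text of every displayName under /foo:response/foo:results/foo:items/foo:vList.
    public static List<string> DisplayNames(XDocument doc)
    {
        return doc.XPathSelectElements(
                "/foo:response/foo:results/foo:items/foo:vList/foo:displayName",
                CreateResolver())
            .Select(e => e.Value)
            .ToList();
    }

    // count() yields an XPath number, which XPathEvaluate returns as a double.
    public static double CountVList(XDocument doc)
    {
        return (double)doc.XPathEvaluate(
            "count(/foo:response/foo:results/foo:items/foo:vList)",
            CreateResolver());
    }
}
```

For the sample XML in the question, DisplayNames returns English and Swedish, and CountVList returns 2.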

The XPath you used contains many // steps. Use // only when it is really necessary: it searches all descendants, scanning the whole document, which costs more resources than an explicit path when you already know the structure.