Get all leaf paths from a complex XML with attributes

Posted 2019-09-09 16:31

I have an XML file like this:


<?xml version="1.0" encoding="utf-8"?>
<products nb="2" type="new">
    <product ean="12345677654321">
        <short_desc> Short description of the product1 </short_desc>
        <price currency="USD">19.65</price>
    </product>
    <product ean="12345644654321">
        <long_desc> Long description of the product2 </long_desc>
        <price currency="USD">19.65</price>
    </product>
</products>

I would like an array like this:


I almost get this result with this code:


<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:variable name="vApos">'</xsl:variable>

    <xsl:template match="*[@* or not(*)]">
        <xsl:if test="not(*)">
            <xsl:apply-templates select="ancestor-or-self::*" mode="path"/>
        </xsl:if>
        <xsl:apply-templates select="@*|*"/>
    </xsl:template>

    <xsl:template match="*" mode="path">
        <xsl:value-of select="concat('/',name())"/>
        <xsl:variable name="vnumPrecSiblings" select=
        <xsl:if test="$vnumPrecSiblings">
            <xsl:value-of select="concat('[', $vnumPrecSiblings +1, ']')"/>
        </xsl:if>
    </xsl:template>

    <xsl:template match="@*">
        <xsl:apply-templates select="../ancestor-or-self::*" mode="path"/>
        <xsl:value-of select="concat('/@',name())"/>
    </xsl:template>
</xsl:stylesheet>

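$xslDoc = new \DOMDocument();
$xslDoc->substituteEntities = true;
$xslDoc->load($xslFile); // $xslFile: placeholder for the stylesheet path

$xmlDoc = new \DOMDocument();
$xmlDoc->load($xmlFile); // $xmlFile: placeholder for the XML path

$proc = new \XSLTProcessor();
$proc->importStylesheet($xslDoc);
$rest = $proc->transformToXML($xmlDoc);

// turn every whitespace character into a space, then split into paths
$res = preg_replace("/\\s/", " ", $rest);

$path = explode(" ", $res);

$fields = [];
foreach ($path as $key => $value) {
    // skip empty entries and paths with positional predicates such as [2]
    if (!empty($value) && !preg_match("/\[.*\]/", $value)) {
        $fields[] = $value;
    }
}

return $fields;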

This code gives me:


/products/product/parameters/long_desc and /products/product/parameters/price/vat are missing :(

How can I parse the full XML with XSLT? Or do you have a solution without XSLT?

Reply #2 · 2019-09-09 16:55

Yes, you can do it with some XPath in PHP.

$dom = new DOMDocument();
$dom->loadXML($xml); // $xml is assumed to hold the XML document as a string
$xpath = new DOMXpath($dom);

function getNodeExpression(DOMNode $node, array &$namespaces) {
  $name = $node->localName;
  $namespace = $node->namespaceURI;
  if ($namespace == '') {
    return ($node instanceOf DOMAttr ? '@' : '').$name;
  } elseif (isset($namespaces[$namespace])) {
    $prefix = $namespaces[$namespace];
  } else {
    $xmlns = $prefix = ($node->prefix == '') ? 'ns' : $node->prefix;
    $i = 1;
    while (in_array($xmlns, $namespaces)) {
      $xmlns = $prefix.'-'.$i;
      $i++;
    }
    // keep the unique prefix, not the raw one
    $prefix = $xmlns;
    $namespaces[$namespace] = $prefix;
  }
  return ($node instanceOf DOMAttr ? '@' : '').$prefix.':'.$name;
}

$result = [];
$namespaces = [];
// all leaf elements (no element children) and all attributes
foreach ($xpath->evaluate('//*[count(*) = 0]|//@*') as $node) {
  $path = '';
  // append every ancestor, outermost first
  foreach ($xpath->evaluate('ancestor::*', $node) as $parent) {
    $path .= '/'.getNodeExpression($parent, $namespaces);
  }
  $path .= '/'.getNodeExpression($node, $namespaces);
  $result[$path] = TRUE;
}

var_dump(array_keys($result), $namespaces);

array(10) {
  [0]=>
  string(13) "/products/@nb"
  [1]=>
  string(15) "/products/@type"
  [2]=>
  string(22) "/products/product/@ean"
  [3]=>
  string(21) "/products/product/sku"
  [4]=>
  string(39) "/products/product/parameters/short_desc"
  [5]=>
  string(34) "/products/product/parameters/price"
  [6]=>
  string(44) "/products/product/parameters/price/@currency"
  [7]=>
  string(38) "/products/product/parameters/long_desc"
  [8]=>
  string(48) "/products/product/parameters/long_desc/@xml:lang"
  [9]=>
  string(32) "/products/product/parameters/vat"
}
array(1) {
  ["http://www.w3.org/XML/1998/namespace"]=>
  string(3) "xml"
}

The complex part here is resolving the namespaces and generating prefixes for them. Let's take a detailed look:

Get the local name (tag name without namespace prefix) and the namespace.

$name = $node->localName;
$namespace = $node->namespaceURI;

If the namespace is empty, we do not need a prefix, so return an expression with just the node name.

  if ($namespace == '') {
    return ($node instanceOf DOMAttr ? '@' : '').$name;

Otherwise, check whether the namespace was already used on another node and reuse that prefix.

  } elseif (isset($namespaces[$namespace])) {
    $prefix = $namespaces[$namespace];

If the namespace is unknown, read the prefix used on this node. If the node does not use a prefix, fall back to the string "ns".

  } else {
    $xmlns = $prefix = ($node->prefix == '') ? 'ns' : $node->prefix;

Check that the prefix is not already used for another namespace. If it is, append a number and increase it until we have a unique prefix.

    $i = 1;
    while (in_array($xmlns, $namespaces)) {
      $xmlns = $prefix.'-'.$i;
      $i++;
    }

Store the namespace => prefix definition (using the unique prefix) for the next call.

    $prefix = $xmlns;
    $namespaces[$namespace] = $prefix;
  }

Return an expression including the prefix.

  return ($node instanceOf DOMAttr ? '@' : '').$prefix.':'.$name;
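
The prefix bookkeeping from the steps above can be tried in isolation. The following is only a sketch: uniquePrefix() is a hypothetical helper name and the urn:example:* namespaces are made up for illustration. It shows a known namespace reusing its prefix, a clashing prefix getting a numeric suffix, and a missing prefix falling back to "ns".

```php
<?php
// Hypothetical helper: returns a prefix for $namespace that is unique
// within $namespaces (a namespace URI => prefix map).
function uniquePrefix(string $namespace, string $preferred, array &$namespaces): string {
  if (isset($namespaces[$namespace])) {
    return $namespaces[$namespace];        // known namespace: reuse its prefix
  }
  $prefix = ($preferred === '') ? 'ns' : $preferred;
  $candidate = $prefix;
  $i = 1;
  // in_array() searches the values, i.e. the prefixes already handed out
  while (in_array($candidate, $namespaces, true)) {
    $candidate = $prefix.'-'.$i;
    $i++;
  }
  $namespaces[$namespace] = $candidate;    // remember it for the next call
  return $candidate;
}

$namespaces = [];
$a = uniquePrefix('urn:example:a', 'p', $namespaces);     // first use of "p"
$b = uniquePrefix('urn:example:b', 'p', $namespaces);     // "p" is taken by another namespace
$c = uniquePrefix('urn:example:c', '', $namespaces);      // no prefix on the node
$d = uniquePrefix('urn:example:a', 'other', $namespaces); // namespace already known
```

Here $a and $d are both "p", $b becomes "p-1" and $c becomes "ns".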

The namespace array can then be used to register all needed namespace prefixes on an XPath object.
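
As a rough sketch of that last step, assuming the array maps namespace URI => prefix as built above: each entry is passed to DOMXPath::registerNamespace(), after which the generated paths can be evaluated directly. The sample document and the urn:example:a namespace are made up for illustration.

```php
<?php
// Hypothetical sample: a document that uses a default namespace (no prefix).
$xml = '<root xmlns="urn:example:a"><item id="1">x</item></root>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Assumed result of the path collection: the fallback prefix "ns"
// was generated for the default namespace.
$namespaces = ['urn:example:a' => 'ns'];

// Register every generated prefix so the collected paths resolve.
foreach ($namespaces as $uri => $prefix) {
  $xpath->registerNamespace($prefix, $uri);
}

// A path in the generated form: prefixed elements, unprefixed attribute.
$nodes = $xpath->evaluate('/ns:root/ns:item/@id');
$value = $nodes->item(0)->value;
```

Without the registration, an expression using the "ns" prefix could not be resolved, because the document itself never declares that prefix.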
