How to do a PHP simplexml xpath search for text value in a tab delimited ELEMENT and returning text from that same element at a different offset from where the search text offset?
Let's say I want to find the DATA element containing a Value of '2' and return the LongValue 'Academy'.
The xml document is in the following format
<METADATA Resource="Property" Lookup="Area">
<COLUMNS>->fieldname *(->fieldname)-></COLUMNS>
*(<DATA>->fielddata *(->fielddata)-></DATA>)
</METADATA>
Note: ignore spaces
*() means 1 or more
-> is tab chr(9)
In the example below the COLUMNS element contains three column names (LongValue, ShortValue, Value), which can be in any order.
Each DATA element has 3 corresponding tab delimited text values, for example the first DATA element below contains
LongValue = 'Salado'
ShortValue = 'Sal'
Value = '5'
Here is the XML document
<METADATA Resource="Property" Lookup="Area">
<COLUMNS> LongValue ShortValue Value </COLUMNS>
<DATA> Salado Sal 5 </DATA>
<DATA> Academy Aca 2 </DATA>
<DATA> Rogers Rog 1 </DATA>
<DATA> Bartlett Bar 4 </DATA>
</METADATA>
Note: the COLUMNS and DATA elements contain tab delimited text for 3 columns, where each column starts with a tab followed by text, with one last tab at the end
Here's what I think:
1.) Preferably, find the offset of the column named 'Value' in the COLUMNS element first, because the 'Value' column can be at any position; the text in each DATA element follows the same column order.
2.) Search for a DATA element containing text in the 'Value' column and return the text from the 'LongValue'.
Here's an example of an XPath search that somewhat works but is flawed: it does not take into account the offset of the 'Value' column in the COLUMNS element, so it cannot reliably find the corresponding position of the 'Value' column in the DATA element.
Here's a code snippet:
$xml_text = 'the xml document above';
$xml = simplexml_load_string($xml_text); //load the xml document
$resource = 'Property'; //value for the Resource attribute METADATA.
$lookup = 'Area'; //value for the Lookup attribute in METADATA
$value = '2'; //the needle we are looking for
$find = "\t" . $value . "\t";
/*
adding tabs before and after the $value may be flawed: although each
column starts with a tab followed by text, only the last column has
an extra trailing tab. Not sure this would work properly if the column
was in the middle, or if the element happened to have the same $value
in more than one column. */
/*
Search for a specific METADATA element with matching
Resource and Lookup attributes */
$node = $xml->xpath(
"//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']"
."/DATA[contains(., '{$find}')]"
);
$x = explode("\t", (string) trim($node[0])); //convert the tab delimited
//string to an array
echo print_r($x,true); //this shows what the array would look like,
//without the trim there would be empty
//first and last array elements
Array
(
[0] => Academy
[1] => Aca
[2] => 2
)
$LongValue = $x[0]; //assuming the LongValue is in the first column
echo $LongValue; //this shows the LongValue returned
Academy
Thanks for any help!
Update... After posting, came up with this…
//get index of the 'LongValue' column from the COLUMNS element
$node = $xml->xpath(
"//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']"
."/COLUMNS");
if($node) {
//array of column names
$columns = explode("\t", strtolower((string) trim($node[0])));
$long_value_index = array_search('longvalue', $columns);
} else {
echo 'not found';
exit;
}
Now, with $long_value_index, this could return the LongValue from the proper offset
$LongValue = $x[$long_value_index];
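Putting the two snippets together, a minimal sketch might look like the following. The helper name lookup_column() and its parameter order are made up for illustration, and the sample string uses "\t" escapes so the tab delimiters are visible. It compares the search column for an exact match instead of using contains().

```php
<?php
// Sample document; "\t" makes the tab delimiters explicit.
$xml_text = "<METADATA Resource=\"Property\" Lookup=\"Area\">"
    . "<COLUMNS>\tLongValue\tShortValue\tValue\t</COLUMNS>"
    . "<DATA>\tSalado\tSal\t5\t</DATA>"
    . "<DATA>\tAcademy\tAca\t2\t</DATA>"
    . "<DATA>\tRogers\tRog\t1\t</DATA>"
    . "<DATA>\tBartlett\tBar\t4\t</DATA>"
    . "</METADATA>";

/**
 * Hypothetical helper (not part of SimpleXML): return the text of
 * $returnColumn from the first DATA row whose $searchColumn equals
 * $needle, or null if nothing matches.
 */
function lookup_column(SimpleXMLElement $xml, $resource, $lookup, $searchColumn, $needle, $returnColumn)
{
    $nodes = $xml->xpath("//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']");
    if (!$nodes) {
        return null; // no such METADATA block
    }
    $metadata = $nodes[0];

    // Column names decide the offsets, so their order does not matter.
    $columns     = explode("\t", strtolower(trim((string) $metadata->COLUMNS)));
    $searchIndex = array_search(strtolower($searchColumn), $columns);
    $returnIndex = array_search(strtolower($returnColumn), $columns);
    if ($searchIndex === false || $returnIndex === false) {
        return null; // unknown column name
    }

    foreach ($metadata->DATA as $data) {
        $values = explode("\t", trim((string) $data));
        if (isset($values[$searchIndex], $values[$returnIndex])
            && $values[$searchIndex] === (string) $needle) {
            return $values[$returnIndex];
        }
    }
    return null;
}

$xml = simplexml_load_string($xml_text);
echo lookup_column($xml, 'Property', 'Area', 'Value', '2', 'LongValue'), "\n"; // Academy
```

One caveat of relying on trim(): if the first or last column of a row were empty, trim() would also swallow that column's tab and shift the offsets.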
Any thoughts?
You are already quite far along and have analyzed the data well. The way you describe parsing it also looks sound to me. The only thing that could be improved a little is to take care not to do too much at once.
One way to do that is to divide the problem into smaller ones. I will show how that works by putting code into multiple functions and methods. Let's start with a single function and go step by step, so you can follow the examples as this builds up.
One way to separate problems in PHP is to use functions. For example, write one function that searches the XML document; this makes the code look better and more expressive:
/**
* search metadata element
*
*
* @param SimpleXMLElement $xml
* @param string $resource metadata attribute
* @param string $lookup metadata attribute
* @param string $value search value
*
* @return SimpleXMLElement
*/
function metadata_search(SimpleXMLElement $xml, $resource, $lookup, $value) {
$xpath = "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']"
."/DATA[contains(., '{$value}')]";
list($element)= $xml->xpath($xpath);
return $element;
}
So now you can easily search the document, and the parameters are named and documented. All that is needed is to call the function and take the return value:
$data = metadata_search($xml, 'Property', 'Area', 2);
This might not be the perfect function, but it is already an example. Besides functions you can also create objects. Objects bundle functions with their own context. That's why those functions are then called methods: they belong to the object, like the xpath() method of SimpleXMLElement.
If you look at the function above, the first parameter is the $xml object, and the xpath method is then executed on it. In the end, all this function really does is create and run the xpath query based on the input variables.
If we could bring that function directly into the $xml object, we would no longer need to pass it as the first parameter. That is the next step, and it works by extending SimpleXMLElement. We just add one new method that does the search, and the method is pretty much the same as above. Extending SimpleXMLElement means we create a sub-type of it: it has everything SimpleXMLElement already has, plus the new method you add:
class MetadataElement extends SimpleXMLElement
{
/**
* @param string $resource metadata attribute
* @param string $lookup metadata attribute
* @param string $value search value
*
* @return SimpleXMLElement
*/
public function search($resource, $lookup, $value) {
$xpath = "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']"
."/DATA[contains(., '{$value}')]";
list($element)= $this->xpath($xpath);
return $element;
}
}
To bring this to life, we need to provide the name of this class when loading the XML string. Then the search method can be called directly:
$xml = simplexml_load_string($xmlString, 'MetadataElement');
$data = $xml->search('Property', 'Area', 2);
Voila, the search is now with the SimpleXMLElement!
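As a quick sanity check of the class-name argument, here is a small self-contained sketch. DemoElement and tagName() are names invented for this check only. It shows that not just the root, but also child elements and xpath() results come back as instances of the subclass, which is what the later steps rely on.

```php
<?php
// Hypothetical subclass used only for this check.
class DemoElement extends SimpleXMLElement
{
    public function tagName()
    {
        return $this->getName();
    }
}

$xml = simplexml_load_string('<root><child>text</child></root>', 'DemoElement');
list($child) = $xml->xpath('//child');

var_dump($xml instanceof DemoElement);        // bool(true)
var_dump($xml->child instanceof DemoElement); // bool(true)
var_dump($child instanceof DemoElement);      // bool(true)
echo $child->tagName(), "\n";                 // child
```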
But what to do with this $data? It's just an XML element, and it still contains the tabs.
Even worse, the context is lost: which metadata column does each value belong to? That is your problem. So we need to solve this next, but how?
Honestly, there are many ways to do that. One idea I had was to create a table object out of the XML, based on a metadata element:
list($metadata) = $xml->xpath('//METADATA[1]');
$csv = new CsvTable($metadata);
echo $csv;
Even with nice debug output:
+---------+----------+-----+
|LongValue|ShortValue|Value|
+---------+----------+-----+
|Salado   |Sal       |5    |
+---------+----------+-----+
|Academy  |Aca       |2    |
+---------+----------+-----+
|Rogers   |Rog       |1    |
+---------+----------+-----+
|Bartlett |Bar       |4    |
+---------+----------+-----+
But that is a lot of work if you are not yet fluent with programming objects, so building a whole table model of its own is maybe a bit much.
Therefore I had another idea: why not continue to use the XML object you already have and change the XML in there a bit, so that it is in a better format for your purposes? From:
<METADATA Resource="Property" Lookup="Area">
<COLUMNS> LongValue ShortValue Value </COLUMNS>
<DATA> Salado Sal 5 </DATA>
To:
<METADATA Resource="Property" Lookup="Area" transformed="1">
<COLUMNS> LongValue ShortValue Value </COLUMNS>
<DATA>
<LongValue>Salado</LongValue><ShortValue>Sal</ShortValue><Value>5</Value>
</DATA>
This allows not only searching by a specific column name but also reading the other values in the data element. If the search returns the $data element:
$xml = simplexml_load_string($xmlString, 'MetadataElement');
$data = $xml->search('Property', 'Area', 5);
echo $data->Value; # 5
echo $data->LongValue; # Salado
If we leave an additional attribute on the metadata element, we can convert these elements while we search. If some data is found and the element is not yet converted, it will be converted.
Because we do all of this inside the search method, the code using the search method does not need to change much (if at all; that depends a bit on your detailed needs, which I might not have fully grasped, but I think you get the idea). So let's put this to work. Because we don't want to do this all at once, we create multiple new methods to:
- transform a metadata element
- search inside the original element (this code we have already, we just move it)
Along the way we will also create methods we deem helpful. You will notice that this is partly code you have already written (like in search()); it is just placed inside the $xml object now, where it more naturally belongs.
Then finally these new methods will be put together in the existing search() method.
So first of all, we create a helper method to parse this tabbed line into an array. It's basically your code; the only difference is that you do not need the string cast in front of trim. Because this function is only needed internally, we make it private:
private function asExplodedString() {
return explode("\t", trim($this));
}
Its name makes clear what it does: it gives back the tab-exploded array of itself. Remember, we are inside $xml, so now every XML element has this method. If you do not fully understand this yet, just go on; you can see how it works right below. We only add one more method as a helper:
public function getParent() {
list($parent) = $this->xpath('..') + array(0 => NULL);
return $parent;
}
This function allows us to retrieve the parent element of an element. This is useful because when we find a data element, we want to transform the metadata element, which is its parent. And because this function is of general use, I have chosen to make it public, so it can also be used in outside code. It solves a common problem and is therefore not as specific as the explode method.
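To make the getParent() idiom easier to follow, here is a small standalone sketch. ParentDemoElement is a made-up class name; the method body is the one shown above. The second half uses plain arrays to show what the `+ array(0 => null)` part does: it only supplies a fallback entry when xpath() returned no element at index 0.

```php
<?php
// Hypothetical subclass for illustration; getParent() is taken from the text above.
class ParentDemoElement extends SimpleXMLElement
{
    public function getParent()
    {
        list($parent) = $this->xpath('..') + array(0 => null);
        return $parent;
    }
}

$xml = simplexml_load_string(
    '<METADATA Lookup="Area"><DATA>x</DATA></METADATA>',
    'ParentDemoElement'
);
$data   = $xml->DATA[0];
$parent = $data->getParent();

echo $parent->getName(), "\n";   // METADATA
echo $parent['Lookup'], "\n";    // Area

// Array union keeps the left operand's entry when key 0 already exists ...
$found = array('found') + array(0 => null);
// ... and falls back to the right operand only when key 0 is missing.
$fallback = array() + array(0 => null);

var_dump($found[0]);    // string(5) "found"
var_dump($fallback[0]); // NULL
```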
So now we want to transform a metadata element. It will take a few more lines of code than the two helper methods above, but thanks to those, things will not be complicated.
We just assume that the element this method is called on is the metadata element. We do not add checks here, to keep the code small. As this is again a private function, we do not even need to check: if this method is invoked on the wrong element, the mistake was made inside the class itself, not in outside code. This is also a nice example of why I use private methods here; it is much more specific.
So what we do now with the metadata element is actually quite simple: we fetch the COLUMNS element inside and explode the column names; then we go over each DATA element, explode its data as well, and empty the element, only to add the column-named children to it. Finally we add an attribute to mark the metadata element as transformed:
private function transform() {
$columns = $this->COLUMNS->asExplodedString();
foreach ($this->DATA as $data) {
$values = $data->asExplodedString();
$data[0] = ''; # set the string of the element (make <DATA></DATA> empty)
foreach ($columns as $index => $name) {
$data->addChild($name, $values[$index]);
}
}
$this['transformed'] = 1;
}
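To see the two SimpleXML operations that transform() relies on in isolation, here is a minimal sketch on a single row, using plain SimpleXMLElement and a hard-coded column list (an assumption made to keep the example short). Assigning to index 0 replaces the element's own text, and addChild() appends one named child per column.

```php
<?php
// One METADATA block with a single DATA row; "\t" marks the tab delimiters.
$xml = simplexml_load_string(
    "<METADATA><DATA>\tSalado\tSal\t5\t</DATA></METADATA>"
);

$columns = array('LongValue', 'ShortValue', 'Value'); // assumed column order
$data    = $xml->DATA[0];
$values  = explode("\t", trim((string) $data));

$data[0] = ''; // empty the element's text content
foreach ($columns as $index => $name) {
    $data->addChild($name, $values[$index]);
}

echo $data->LongValue, "\n";  // Salado
echo $data->ShortValue, "\n"; // Sal
echo $data->Value, "\n";      // 5
```

Since the column names become element names, this only works when they are valid XML names, and text values containing `&` would need escaping before being passed to addChild().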
Okay. Now what gives? Let's test this. To do that we modify the existing search function to return the transformed data element - by adding a single line of code:
public function search($resource, $lookup, $value) {
$xpath = "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']"
. "/DATA[contains(., '{$value}')]";
list($element) = $this->xpath($xpath);
$element->getParent()->transform();
###################################
return $element;
}
And then we output it as XML:
$data = $xml->search('Property', 'Area', 2);
echo $data->asXML();
This now gives the following output (beautified, it's on a single line normally):
<DATA>
<LongValue>Academy</LongValue>
<ShortValue>Aca</ShortValue>
<Value>2</Value>
</DATA>
And let's also check that the new attribute is set and all other data-elements of that metadata-table/block are transformed as well:
echo $data->getParent()->asXML();
And the output (beautified) as well:
<METADATA Resource="Property" Lookup="Area" transformed="1">
<COLUMNS> LongValue ShortValue Value </COLUMNS>
<DATA>
<LongValue>Salado</LongValue>
<ShortValue>Sal</ShortValue>
<Value>5</Value>
</DATA>
...
This shows that the code works as intended. This might already solve your issue, e.g. if you always search for a number, the other columns do not contain numbers, and you only need to search once per metadata block. However, likely not, so the search function needs to be changed to perform the correct search and transform internally.
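To illustrate why searching the whole DATA text is not enough, here is a small sketch with made-up data (the "District 2" row is hypothetical and not part of your document). Because contains() looks at the full text of the element, a '2' in any column produces a hit:

```php
<?php
$xml = simplexml_load_string(
    "<METADATA Resource=\"Property\" Lookup=\"Area\">"
    . "<COLUMNS>\tLongValue\tShortValue\tValue\t</COLUMNS>"
    . "<DATA>\tDistrict 2\tD2\t7\t</DATA>"   // hypothetical row: Value is 7
    . "<DATA>\tAcademy\tAca\t2\t</DATA>"
    . "</METADATA>"
);

$matches = $xml->xpath("//METADATA/DATA[contains(., '2')]");

echo count($matches), "\n";                                // 2
echo str_replace("\t", '|', (string) $matches[0]), "\n";   // |District 2|D2|7|
```

The first hit is the row whose Value is 7, which is why the search has to be restricted to the Value column.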
This time we again make use of $this to put a method on the concrete XML element. Two new methods; the first gets a metadata element based on its attributes:
private function getMetadata($resource, $lookup) {
$xpath = "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']";
list($metadata) = $this->xpath($xpath);
return $metadata;
}
And one to search a specific column of a metadata element:
private function searchColumn($column, $value) {
return $this->xpath("DATA[{$column}[contains(., '{$value}')]]");
}
These two methods are then used in the main search method. It will be slightly changed by first looking up the metadata element by its attributes. Then it will be checked if the transformation is needed and then the search by the value column is done:
public function search($resource, $lookup, $value)
{
$metadata = $this->getMetadata($resource, $lookup);
if (!$metadata['transformed']) {
$metadata->transform();
}
list($element) = $metadata->searchColumn('Value', $value);
return $element;
}
And now the new way of searching is finally done. It now searches only in the right column and the transformation will be done on the fly:
$xml = simplexml_load_string($xmlString, 'MetadataElement');
$data = $xml->search('Property', 'Area', 2);
echo $data->LongValue, "\n"; # Academy
Now that looks nice, and it looks as if it is really easy to use! All the complexity went into MetadataElement. And what does it look like at a glance?
/**
* MetadataElement - Example for extending SimpleXMLElement
*
* @link http://stackoverflow.com/q/16281205/367456
*/
class MetadataElement extends SimpleXMLElement
{
/**
* @param string $resource metadata attribute
* @param string $lookup metadata attribute
* @param string $value search value
*
* @return SimpleXMLElement
*/
public function search($resource, $lookup, $value)
{
$metadata = $this->getMetadata($resource, $lookup);
if (!$metadata['transformed']) {
$metadata->transform();
}
list($element) = $metadata->searchColumn('Value', $value);
return $element;
}
private function getMetadata($resource, $lookup) {
$xpath = "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']";
list($metadata) = $this->xpath($xpath);
return $metadata;
}
private function searchColumn($column, $value) {
return $this->xpath("DATA[{$column}[contains(., '{$value}')]]");
}
private function asExplodedString() {
return explode("\t", trim($this));
}
public function getParent() {
list($parent) = $this->xpath('..') + array(0 => NULL);
return $parent;
}
private function transform() {
$columns = $this->COLUMNS->asExplodedString();
foreach ($this->DATA as $data) {
$values = $data->asExplodedString();
$data[0] = ''; # set the string of the element (make <DATA></DATA> empty)
foreach ($columns as $index => $name) {
$data->addChild($name, $values[$index]);
}
}
$this['transformed'] = 1;
}
}
Not too bad either. Many small methods, each with just a few lines of code, which is (relatively) easy to follow!
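One thing you may want to adjust in searchColumn(): contains() is a substring test, so a search for 2 would also hit a Value of 12. If you need exact matches, comparing the column with `=` is one option. Here is a sketch on already-transformed XML; the "Temple" row with Value 12 is made up for the comparison:

```php
<?php
$xml = simplexml_load_string(
    '<METADATA Resource="Property" Lookup="Area" transformed="1">'
    . '<DATA><LongValue>Temple</LongValue><ShortValue>Tem</ShortValue><Value>12</Value></DATA>'
    . '<DATA><LongValue>Academy</LongValue><ShortValue>Aca</ShortValue><Value>2</Value></DATA>'
    . '</METADATA>'
);

$column = 'Value';
$value  = '2';

// Substring test, as in searchColumn() above: matches 12 and 2.
$loose = $xml->xpath("DATA[{$column}[contains(., '{$value}')]]");

// Exact comparison of the column's text: matches only 2.
$exact = $xml->xpath("DATA[{$column} = '{$value}']");

echo count($loose), "\n";          // 2
echo count($exact), "\n";          // 1
echo $exact[0]->LongValue, "\n";   // Academy
```

Either way, a $value containing a single quote would break the XPath string literal, so that case needs separate handling.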
So I hope this gives some inspiration; I know this was quite a bit of text to read. Have fun!