I have the following XML file, the file is rather large and i haven't been able to get simplexml to open and read the file so i'm trying XMLReader with no success in php
<?xml version="1.0" encoding="ISO-8859-1"?>
<products>
<last_updated>2009-11-30 13:52:40</last_updated>
<product>
<element_1>foo</element_1>
<element_2>foo</element_2>
<element_3>foo</element_3>
<element_4>foo</element_4>
</product>
<product>
<element_1>bar</element_1>
<element_2>bar</element_2>
<element_3>bar</element_3>
<element_4>bar</element_4>
</product>
</products>
I've unfortunately not found a good tutorial on this for PHP and would love to see how I can get each element content to store in a database.
It all depends on how big the unit of work, but I guess you're trying to treat each
<product/>
nodes in succession.For that, the simplest way would be to use XMLReader to get to each node, then use SimpleXML to access them. This way, you keep the memory usage low because you're treating one node at a time and you still leverage SimpleXML's ease of use. For instance:
Quick overview of pros and cons of different approaches:
XMLReader only
Pros: fast, uses little memory
Cons: excessively hard to write and debug, requires lots of userland code to do anything useful. Userland code is slow and prone to error. Plus, it leaves you with more lines of code to maintain
XMLReader + SimpleXML
Pros: doesn't use much memory (only the memory needed to process one node) and SimpleXML is, as the name implies, really easy to use.
Cons: creating a SimpleXMLElement object for each node is not very fast. You really have to benchmark it to understand whether it's a problem for you. Even a modest machine would be able to process a thousand nodes per second, though.
XMLReader + DOM
Pros: uses about as much memory as SimpleXML, and XMLReader::expand() is faster than creating a new SimpleXMLElement. I wish it was possible to use
simplexml_import_dom()
but it doesn't seem to work in that caseCons: DOM is annoying to work with. It's halfway between XMLReader and SimpleXML. Not as complicated and awkward as XMLReader, but light years away from working with SimpleXML.
My advice: write a prototype with SimpleXML, see if it works for you. If performance is paramount, try DOM. Stay as far away from XMLReader as possible. Remember that the more code you write, the higher the possibility of you introducing bugs or introducing performance regressions.
Most of my XML parsing life is spent extracting nuggets of useful information out of truckloads of XML (Amazon MWS). As such, my answer assumes you want only specific information and you know where it is located.
I find the easiest way to use XMLReader is to know which tags I want the information out of and use them. If you know the structure of the XML and it has lots of unique tags, I find that using the first case is the easy. Cases 2 and 3 are just to show you how it can be done for more complex tags. This is extremely fast; I have a discussion of speed over on What is the fastest XML parser in PHP?
The most important thing to remember when doing tag-based parsing like this is to use
if ($myXML->nodeType == XMLReader::ELEMENT) {...
- which checks to be sure we're only dealing with opening nodes and not whitespace or closing nodes or whatever.For xml formatted with attributes...
data.xml:
php code:
XMLReader is well documented on PHP site. This is a XML Pull Parser, which means it's used to iterate through nodes (or DOM Nodes) of given XML document. For example, you could go through the entire document you gave like this:
It is then up to you to decide how to deal with the node returned by XMLReader::expand().
This topic is quit long ago, but i just found it. Thanks God.
My problem is i have to read ONIX file (book data), and store it to our database. I use simplexml_load before, and although it used a lot of memory but still ok for relatively small file (up to 300MB). Beyond that size is a disaster for me.
After reading, especially Francis Lewis interpretation, i use combination of xmlreader and simplexml. The result is exceptional, memory usage is small and insert it to database fast enough for me.
Here is my code: