I have looked around at the libxml2 code samples and I am confused on how to piece them all together.
What are the steps needed when using libxml2 to just parse or extract data from an XML file?
I would like to get hold of, and possibly store information for, certain attributes. How is this done?
I found these two resources helpful when I was learning to use libxml2 to build a rss feed parser.
Tutorial with SAX interface
Tutorial using the DOM Tree (code example for getting an attribute value included)
I believe you first need to create a Parse tree. Maybe this article can help, look through the section which says How to Parse a Tree with Libxml2.
libxml2 provides various examples showing basic usage.
http://xmlsoft.org/examples/index.html
For your stated goals, tree1.c would probably be most relevant.
http://xmlsoft.org/examples/tree1.c
Once you have an xmlNode struct for an element, the "properties" member is a linked list of attributes. Each xmlAttr object has a "name" and "children" object (which are the name/value for that attribute, respectively), and a "next" member which points to the next attribute (or null for the last one).
http://xmlsoft.org/html/libxml-tree.html#xmlNode
http://xmlsoft.org/html/libxml-tree.html#xmlAttr
Here, I mentioned complete process to extract XML/HTML data from file on windows platform.
Also download its dependency iconv.dll and zlib1.dll from the same page
Extract all .zip files into the same directory. For Ex: D:\demo\
Copy iconv.dll, zlib1.dll and libxml2.dll into c:\windows\system32 deirectory
Make libxml_test.cpp file and copy following code into that file.
Open Visual Studio Command Promt
Go To D:\demo directory
execute cl libxml_test.cpp /I".\libxml2-2.7.8.win32\include" /I".\iconv-1.9.2.win32\include" /link libxml2-2.7.8.win32\lib\libxml2.lib command
Run binary using libxml_test.exe test.html command(Here test.html may be any valid HTML file)