I'm trying to parse CDATA types in XML. The code runs fine and prints Links: in the console (about 50 times, because that's how many links I have), but the links don't appear; it's just blank console space. What could I be missing?
package Parse;

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XMLParse {
    public static void main(String[] args) throws Exception {
        File file = new File("c:test/returnfeed.xml");
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(file);
        NodeList nodes = doc.getElementsByTagName("video");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            NodeList title = element.getElementsByTagName("videoURL");
            Element line = (Element) title.item(0);
            System.out.println("Links: " + getCharacterDataFromElement(line));
        }
    }

    public static String getCharacterDataFromElement(Element e) {
        Node child = e.getFirstChild();
        if (child instanceof CharacterData) {
            CharacterData cd = (CharacterData) child;
            return cd.getData();
        }
        return "";
    }
}
Result:
Links:
Links:
Links:
Links:
Links:
Links:
Links:
Sample XML: (Not full document)
<?xml version="1.0" ?>
<response xmlns:uma="http://websiteremoved.com/" version="1.0">
<timestamp>
<![CDATA[ July 18, 2012 5:52:33 PM PDT
]]>
</timestamp>
<resultsOffset>
<![CDATA[ 0
]]>
</resultsOffset>
<status>
<![CDATA[ success
]]>
</status>
<resultsLimit>
<![CDATA[ 207
]]>
</resultsLimit>
<resultsCount>
<![CDATA[ 207
]]>
</resultsCount>
<videoCollection>
<name>
<![CDATA[ Video API
]]>
</name>
<count>
<![CDATA[ 207
]]>
</count>
<description>
<![CDATA[
]]>
</description>
<videos>
<video>
<id>
<![CDATA[ 8177840
]]>
</id>
<headline>
<![CDATA[ Test1
]]>
</headline>
<shortHeadline>
<![CDATA[ Test2
]]>
</shortHeadline>
<description>
<![CDATA[ Test3
]]>
</description>
<shortDescription>
<![CDATA[ Test4
]]>
</shortDescription>
<posterImage>
<![CDATA[ http://a.com.com/media/motion/2012/0718/los_120718_los_bucher_on_howard.jpg
]]>
</posterImage>
<videoURL>
<![CDATA[ http://com/removed/2012/0718/los_120718_los_bucher_on_howard.mp4
]]>
</videoURL>
</video>
</videos>
</videoCollection>
</response>
I would consider using getTextContent(), which returns the concatenated text of all of an element's children, CDATA sections included.
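As a rough illustration of the getTextContent() suggestion, here is a minimal sketch. The class name CdataLinks, the helper extractLinks, and the inline sample XML with its example.com URL are all made up for this example; only the element names video and videoURL come from the question. It parses from a string rather than a file so that it runs on its own, and it trims the result because the text in the sample is surrounded by whitespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CdataLinks {
    // Hypothetical helper: collects the trimmed text of the first <videoURL>
    // inside each <video> element of the given XML string.
    public static List<String> extractLinks(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        List<String> links = new ArrayList<>();
        NodeList videos = doc.getElementsByTagName("video");
        for (int i = 0; i < videos.getLength(); i++) {
            Element video = (Element) videos.item(i);
            NodeList urls = video.getElementsByTagName("videoURL");
            if (urls.getLength() == 0) {
                continue; // this <video> has no <videoURL>; skip it
            }
            // getTextContent() concatenates every text and CDATA child,
            // so the whitespace before the CDATA section no longer hides it.
            links.add(urls.item(0).getTextContent().trim());
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample laid out like the XML in the question.
        String xml = "<videos><video><videoURL>\n"
                + "<![CDATA[ http://example.com/clip.mp4\n]]>\n"
                + "</videoURL></video></videos>";
        for (String link : extractLinks(xml)) {
            System.out.println("Links: " + link);
        }
    }
}
```

With a file on disk, builder.parse(file) as in the question should work the same way; only the text extraction changes.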
Instead of checking only the first child, it would be prudent to check whether the node has other children as well. In your case (and I guess if you had debugged that node, you would have known), the node passed to getCharacterDataFromElement had multiple children. Judging by your sample XML, the first child is a whitespace-only text node and the CDATA section comes after it, which is why only blank space gets printed. I updated the code, and this might point you in the right direction: