A simple way to remove headers from XML files

Posted 2019-07-05 00:21

Question:

I need to remove the non-XML text from a file generated by another program.

The file looks something like this:

Executing Command - Blah.exe ...
-----Command Output-----
HTTP/1.1 200 OK
Connection: close
Content-Type: text/xml

<?xml version="1.0"?>
<testResults>
  <finalCounts>
    <right>7</right>
    <wrong>4</wrong>
    <ignores>0</ignores>
    <exceptions>0</exceptions>
  </finalCounts>
</testResults>

Exit-Code: 15

How can I remove the non-XML text easily in Java?

Answer 1:

// getContent() returns the complete text to strip.
//
String s = getContent();

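// Find the start of the XML content using the <?xml prefix.
// indexOf returns -1 when the prefix is missing, so fail early in that case.
//
int xmlIndex = s.indexOf( "<?xml" );
if ( xmlIndex < 0 ) {
    throw new IllegalArgumentException( "No <?xml declaration found" );
}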

// Strip the non-XML header.
//
s = s.substring( xmlIndex );

// Find the last closing angle-bracket; should indicate end of the XML.
//
xmlIndex = s.lastIndexOf( ">" );

// Strip everything after the closing angle-bracket. The end index of
// substring is exclusive, so add 1 to keep the bracket itself.
//
s = s.substring( 0, xmlIndex + 1 );
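
For anyone who wants to try this on the sample from the question, here is a minimal self-contained sketch of the same indexOf/lastIndexOf idea. The class name XmlExtractor and the method extractXml are made up for illustration, and returning null when no XML is found is an assumption, not something the question specifies.

```java
public class XmlExtractor {

    // Hypothetical helper: returns the text from "<?xml" through the last '>',
    // or null (an assumed convention) when no such span exists.
    public static String extractXml(String raw) {
        int start = raw.indexOf("<?xml");
        if (start < 0) {
            return null; // no XML declaration in the input
        }
        int end = raw.lastIndexOf('>');
        if (end < start) {
            return null; // no closing bracket after the declaration
        }
        // substring's end index is exclusive, so +1 keeps the final '>'.
        return raw.substring(start, end + 1);
    }

    public static void main(String[] args) {
        String raw = "Executing Command - Blah.exe ...\n"
                + "-----Command Output-----\n"
                + "HTTP/1.1 200 OK\n"
                + "Connection: close\n"
                + "Content-Type: text/xml\n"
                + "\n"
                + "<?xml version=\"1.0\"?>\n"
                + "<testResults>\n"
                + "  <finalCounts>\n"
                + "    <right>7</right>\n"
                + "  </finalCounts>\n"
                + "</testResults>\n"
                + "\n"
                + "Exit-Code: 15\n";
        System.out.println(extractXml(raw));
    }
}
```

This relies on the trailing text (here the Exit-Code line) containing no '>' character; if it can, the last-bracket heuristic will cut in the wrong place.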


Answer 2:

This looks like direct HTTP output, so scanning for the first two consecutive line feeds (each probably preceded by a carriage return) will give you the end of the prefix you want to filter out.
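
A rough sketch of that blank-line scan, in case it helps. The class and method names (HttpPrefixStripper, stripPrefix) are hypothetical, and returning the input unchanged when no blank line is found is an assumption. It checks for both CRLF and bare LF line endings and cuts at whichever blank line comes first.

```java
public class HttpPrefixStripper {

    // Hypothetical helper: returns everything after the first blank line,
    // or the input unchanged (an assumed fallback) if there is none.
    public static String stripPrefix(String raw) {
        int crlf = raw.indexOf("\r\n\r\n"); // blank line with CRLF endings
        int lf = raw.indexOf("\n\n");       // blank line with bare LF endings

        // Use the CRLF match when it exists and comes first.
        if (crlf >= 0 && (lf < 0 || crlf < lf)) {
            return raw.substring(crlf + 4);
        }
        if (lf >= 0) {
            return raw.substring(lf + 2);
        }
        return raw;
    }
}
```

Note that this only removes the prefix. In the sample from the question, the trailing Exit-Code line comes after the XML and would still need to be cut off separately, for example with the lastIndexOf approach from Answer 1.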