Obtain InputStream from XML element content

Posted 2019-04-15 14:08

My servlet's doPost() receives an HttpServletRequest whose ServletInputStream sends me a large chunk of uuencoded data wrapped in XML. E.g., there is an element:

<filedata encoding="base64">largeChunkEncodedHere</filedata>

I need to decode the chunk and write it to a file. I would like to get an InputStream from the chunk, decode it as a stream using MimeUtility, and use that stream to write the file; I would prefer not to read this large chunk into memory.

The XML is flat; i.e., there is not much nesting. My first idea is to use a SAX parser but I don't know how to do the hand-off to a stream to read just the chunk.

Thanks for your ideas.

Glenn

Edit 1: Note JB Nizet's pessimistic answer in this post.

Edit 2: I've answered my own question affirmatively below, and marked maximdim's answer below as correct. Although it doesn't quite answer the question, it did direct me to the StAX API and Woodstox.

3 answers
男人必须洒脱
#2 · 2019-04-15 14:31

You could use a SAX filter or XPath to get only the element(s) you're interested in. Once you have the content of your element, pass it to MimeUtility.decode() and write the resulting stream to a file.
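The decode-and-write step can be sketched as follows. This is a minimal sketch that swaps in the JDK's java.util.Base64 for MimeUtility.decode(), only so that it runs without JavaMail on the classpath; the class name Base64StreamToFile and the method decodeToFile are hypothetical names chosen for illustration. It assumes you already have an InputStream positioned on the encoded text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Base64;

public class Base64StreamToFile {

    /**
     * Decodes base64 text from 'encoded' on the fly and writes the raw bytes
     * to 'target'. Returns the number of decoded bytes written.
     * (Hypothetical helper, not part of any library.)
     */
    static long decodeToFile(InputStream encoded, Path target) throws IOException {
        // The MIME decoder tolerates line breaks inside the base64 text.
        try (InputStream decoded = Base64.getMimeDecoder().wrap(encoded)) {
            // Files.copy pulls from the decoding stream in small buffers,
            // so the whole payload is never held in memory at once.
            return Files.copy(decoded, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello, streaming world".getBytes(StandardCharsets.UTF_8);
        String base64 = Base64.getEncoder().encodeToString(original);

        Path target = Files.createTempFile("filedata", ".bin");
        long written = decodeToFile(
                new ByteArrayInputStream(base64.getBytes(StandardCharsets.US_ASCII)), target);
        System.out.println(written + " bytes written to " + target);
    }
}
```

With JavaMail available, MimeUtility.decode(encoded, "base64") could take the place of the Base64.getMimeDecoder().wrap(encoded) call; the rest of the sketch would stay the same.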

I suggest you update your question with a code sample and let us know what doesn't work.

Update:

Here is sample code using the StAX2 parser (Woodstox). The StAX parser included in the JDK doesn't seem to have a comparable getText() method, at least at a quick glance.

Obviously, the input (r) and output (w) could be any Reader/Writer or stream; a String is used here just as an example.

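    import java.io.Reader;
    import java.io.StringReader;
    import java.io.StringWriter;
    import java.io.Writer;
    import javax.xml.stream.XMLStreamConstants;
    import org.codehaus.stax2.XMLInputFactory2;
    import org.codehaus.stax2.XMLStreamReader2;

    Reader r = new StringReader("<foo><filedata encoding=\"base64\">largeChunkEncodedHere</filedata></foo>");
    Writer w = new StringWriter();

    XMLInputFactory2 xmlif = (XMLInputFactory2)XMLInputFactory2.newInstance();
    XMLStreamReader2 sr = (XMLStreamReader2)xmlif.createXMLStreamReader(r);

    // Set once the <filedata> start tag has been seen
    boolean flag = false;
    while (sr.hasNext()) {
        sr.next();
        if (sr.getEventType() == XMLStreamConstants.START_ELEMENT) {
            if ("filedata".equals(sr.getLocalName())) {
                flag = true;
            }
        }
        else if (sr.getEventType() == XMLStreamConstants.CHARACTERS) {
            if (flag) {
                // Write the element's text straight to the Writer
                sr.getText(w, false);
                break;
            }
        }
    }
    System.out.println(w);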
女痞
#3 · 2019-04-15 14:31

Here are some details on how streaming from an element while parsing with StAX is possible, using the Woodstox framework.

There is a good overview in this article.

From XMLInputFactory we can call createXMLStreamReader(java.io.InputStream stream) using the ServletInputStream. With Woodstox, the returned reader can be cast to XMLStreamReader2, which has a getText(Writer w, boolean preserveContents) method that returns an int for the number of characters written. Every implementation of the interface must provide this method. The reference implementation Stax2ReaderImpl does it like this:

// // // StAX2, Pass-through text accessors
public int getText(Writer w, boolean preserveContents)
    throws IOException, XMLStreamException
{
    char[] cbuf = getTextCharacters();
    int start = getTextStart();
    int len = getTextLength();

    if (len > 0) {
        w.write(cbuf, start, len);
    }
    return len;
}

In this code we would need to change the getTextCharacters() method so that it reads from the InputStream. In the testSegmentedGetCharacters() method of the Woodstox test TestGetSegmentedText, we see the sr.getTextCharacters(offset, buf, start, len) method used. In fact, the javadoc for the multiple-argument XMLStreamReader.getTextCharacters() shows the following usage, which reads the text in fixed-size pieces.

int length = 1024;
char[] myBuffer = new char[ length ];
for ( int sourceStart = 0 ; ; sourceStart += length ) {
    int nCopied = stream.getTextCharacters( sourceStart, myBuffer, 0, length );
    if (nCopied < length) {
        break;
    }
}
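Extending the javadoc loop above into something that actually writes the text out might look like the sketch below. It uses only the StAX API bundled with the JDK (no Woodstox); the class name ChunkedTextCopy and both helper methods are hypothetical names for illustration. The loop is bounded by getTextLength() rather than by a short read, and it assumes the parser may report one element's content as several consecutive CHARACTERS events. Note that this only copies each event's text in chunks; how much text the parser buffers per event is up to the implementation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ChunkedTextCopy {

    /** Copies the text of the current text event to 'out', buf.length chars at a time. */
    static int copyCurrentText(XMLStreamReader sr, Writer out, char[] buf)
            throws XMLStreamException, IOException {
        int len = sr.getTextLength();
        int start = 0;
        while (start < len) {
            int n = sr.getTextCharacters(start, buf, 0, Math.min(buf.length, len - start));
            if (n <= 0) {
                break; // defensive: avoid looping forever if nothing was copied
            }
            out.write(buf, 0, n);
            start += n;
        }
        return start;
    }

    /** Copies the text content of the first element named 'elementName' to 'out'. */
    static int copyElementText(Reader xml, String elementName, Writer out, int chunkSize)
            throws XMLStreamException, IOException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The input comes from a request body, so DTD processing is switched off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader sr = factory.createXMLStreamReader(xml);
        char[] buf = new char[chunkSize];
        boolean inside = false;
        int total = 0;
        try {
            while (sr.hasNext()) {
                int event = sr.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(sr.getLocalName())) {
                    inside = true;
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && elementName.equals(sr.getLocalName())) {
                    break; // end of the element we care about
                } else if (inside && (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA)) {
                    total += copyCurrentText(sr, out, buf);
                }
            }
        } finally {
            sr.close();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        Reader r = new StringReader(
                "<foo><filedata encoding=\"base64\">largeChunkEncodedHere</filedata></foo>");
        Writer w = new StringWriter();
        copyElementText(r, "filedata", w, 1024);
        System.out.println(w);
    }
}
```

In a servlet, the Writer could be replaced by one that feeds a base64 decoder, so the decoded bytes go to the file as each chunk arrives.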
太酷不给撩
#4 · 2019-04-15 14:54

One more suggestion regarding Woodstox: it can also decode the base64-encoded content itself, efficiently. To do that, you need to cast the XMLStreamReader to XMLStreamReader2 (or TypedXMLStreamReader), which is part of the Stax2 extension API.

With that, you get the methods readElementAsBinary() and getElementAsBinary(), which handle Base64 decoding automatically. XMLStreamWriter2 similarly has Base64-encoding methods for writing binary data.
