How to get text/xml as UTF-8 from a multipart/form

Posted 2019-02-28 17:28

Question:

Thanks for your answer, but reading the part as an InputStream instead of calling getBody(...) does not work either. The code below returns the same result as the code in my original post.

final InputStream inStream = fileUploadInput.getFormDataPart(searchedInput, InputStream.class, null);
// get bytes
final byte[] inBytes = new byte[1024];
final ByteArrayOutputStream outBytes = new ByteArrayOutputStream(inBytes.length);
int length = 0;
while((length = inStream.read(inBytes)) >= 0) {
    outBytes.write(inBytes, 0, length);
}
final byte[] rawInput = outBytes.toByteArray();

// get Encoding
final String ascii = new String(rawInput, ASCII);
final String utf8 = new String(rawInput, UTF8);
final String isoLatin1 = new String(rawInput, ISO8859_1);
log.info("ASCII: " + ascii);
log.info("UTF8: " + utf8);
log.info("ISOLATIN1: " + isoLatin1);
return utf8;

ORIGINAL POST:

I want to upload UTF-8 encoded XML files using the HTML form below and read them on the server using a RESTEasy MultipartFormDataInput and the Java code shown below. On the server side, the file content appears to be decoded as ASCII regardless of the actual encoding of the uploaded files (which is UTF-8) when I access it as described below. All characters outside the ASCII character set are replaced by ?. How can I get 'text/xml' as UTF-8 from a 'multipart/form-data' request with RESTEasy? (I know it is possible to write a PreProcessInterceptor and get the raw bytes there, but I can't use that approach in my application.)

Upload Form:

<html>
<body>
    <h1>JAX-RS Upload Form</h1>

    <form action="http://.../upload" method="POST" enctype="multipart/form-data">

       <p>Select a file : <input type="file" name="upload"/></p>
       <input type="submit" value="Upload It" />

    </form>
</body>
</html>

Resource class:

@Path("/upload")
@POST
@Consumes("multipart/form-data")
public Response createUploadTemplate(
        @Context HttpServletRequest req,
        MultipartFormDataInput formInput) {

    try {
        final String templateXml = getInput("upload", formInput);
        //...
    } catch (Exception e) {
        //...
    }
}

private static String getInput(final String searchedInput, final MultipartFormDataInput fileUploadInput) throws BadRequestException, IOException {

    try {
        final Map<String, List<InputPart>> inputToInputPart = fileUploadInput.getFormDataMap();

        if(inputToInputPart.containsKey(searchedInput)) {

            final StringBuilder builder = new StringBuilder();
            final List<InputPart> inputParts = inputToInputPart.get(searchedInput);

            for(InputPart inputPart : inputParts) {
                builder.append(inputPart.getBody(String.class,null));
            }

            return builder.toString();
        } else {
            throw new BadRequestException("The form sent with the request does not contain an input element " + searchedInput + ".");
        }
    } catch(Exception e) {
        throw new BadRequestException("The file upload failed.", e);
    }
}

MessageBodyReader:

@Provider
@Consumes ("text/xml")
public class XmlStringReader implements MessageBodyReader<String> {
    private static final Logger log = LoggerFactory.getLogger(XmlStringReader.class);

    private static final String ASCII = "ASCII";
    private static final String ISO8859_1 = "ISO8859_1";
    private static final String UTF8 = "UTF8";

    @Override
    public boolean isReadable(final Class<?> type,
                                final Type genericType,
                                final Annotation[] annotations,
                                final MediaType mediaType) {

        boolean result = type.equals(String.class) && MediaType.TEXT_XML_TYPE.equals(mediaType);
        log.info(MessageFormat.format("{0} == String.class && MediaType.TEXT_XML_TYPE == {1}: {2}", type, mediaType, result));
        return result;
    }

    @Override
    public String readFrom(final Class<String> type,
                                        final Type genericType,
                                        final Annotation[] annotations,
                                        final MediaType mediaType,
                                        final MultivaluedMap<String, String> httpHeaders,
                                        final InputStream entityStream) throws IOException, WebApplicationException {

        final byte[] inBytes = new byte[1024];
        final ByteArrayOutputStream outBytes = new ByteArrayOutputStream(inBytes.length);
        int length = 0;

        while((length = entityStream.read(inBytes)) >= 0) {
            outBytes.write(inBytes, 0, length);
        }

        final byte[] rawInput = outBytes.toByteArray();
        final String ascii = new String(rawInput, ASCII);

        final String utf8 = new String(rawInput, UTF8);
        final String isoLatin1 = new String(rawInput, ISO8859_1);       

        log.info("ASCII: " + ascii);
        log.info("UTF8: " + utf8);
        log.info("ISOLATIN1: " + isoLatin1);

        return utf8;
    }
}

Answer 1:

When the Content-Type header of a part does not define a charset, RESTEasy assumes 'charset=US-ASCII'. See the Javadoc in org.jboss.resteasy.plugins.providers.multipart.InputPart:

/**
    * If there is a content-type header without a charset parameter, charset=US-ASCII
    * is assumed.
    * <p>
    * This can be overwritten by setting a different String value in
    * {@link org.jboss.resteasy.spi.HttpRequest#setAttribute(String, Object)}
    * with this ("resteasy.provider.multipart.inputpart.defaultCharset")
    * String as key. It should be done in a
    * {@link org.jboss.resteasy.spi.interception.PreProcessInterceptor}.
    * </p>
     */
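
To make the fallback rule concrete, here is a minimal, self-contained sketch of the idea. It does not use RESTEasy at all: the class PartCharsetDemo and its helpers resolveCharset and decode are hypothetical names invented for this illustration, and the Content-Type parsing is deliberately simplified. It only shows why a part labelled 'text/xml' without a charset parameter comes out garbled when the fallback is US-ASCII, and intact when the fallback is UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of RESTEasy.
class PartCharsetDemo {

    /**
     * Returns the charset named in a Content-Type value, or the given
     * fallback when the value carries no charset parameter.
     * Simplified parsing, for illustration only.
     */
    static Charset resolveCharset(String contentType, Charset fallback) {
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Case-insensitive check for a "charset=" prefix.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                return Charset.forName(name);
            }
        }
        return fallback;
    }

    /** Decodes the part body with the resolved charset. */
    static String decode(byte[] body, String contentType, Charset fallback) {
        return new String(body, resolveCharset(contentType, fallback));
    }

    public static void main(String[] args) {
        // "\u00eb" is the letter e with diaeresis; it takes two bytes in UTF-8.
        byte[] body = "<name>Zo\u00eb</name>".getBytes(StandardCharsets.UTF_8);

        // No charset parameter, ASCII fallback: each non-ASCII byte
        // is replaced by U+FFFD, which many consoles print as '?'.
        System.out.println(decode(body, "text/xml", StandardCharsets.US_ASCII));

        // No charset parameter, UTF-8 fallback: the text survives.
        System.out.println(decode(body, "text/xml", StandardCharsets.UTF_8));

        // An explicit charset parameter wins over the fallback.
        System.out.println(decode(body, "text/xml; charset=UTF-8", StandardCharsets.US_ASCII));
    }
}
```

The uploaded bytes are the same in all three calls; only the charset used for decoding differs, which is why changing the default charset is enough to fix the symptom.
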

So, as a work-around you can do the following:

@Provider
@ServerInterceptor
public class CharsetPreProcessInterceptor implements PreProcessInterceptor {

    @Override
    public ServerResponse preProcess(HttpRequest request, ResourceMethod method) throws Failure, WebApplicationException {
        request.setAttribute(InputPart.DEFAULT_CHARSET_PROPERTY, "charset=UTF-8");
        return null;
    }

}


Answer 2:

I generally would not rely on the getBody method of InputPart to produce a String. You can get each part as a raw input stream and read the data yourself, rather than relying on the framework to convert the content to a String.
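
A minimal sketch of that approach, assuming the upload really is UTF-8: read the stream to its end and decode the bytes explicitly. The class RawPartReader and the method readUtf8 are hypothetical names for this example, and a ByteArrayInputStream stands in for the part so the snippet runs on its own. With RESTEasy, the stream would instead come from inputPart.getBody(InputStream.class, null).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of RESTEasy.
class RawPartReader {

    /** Reads the stream to its end and decodes the bytes as UTF-8. */
    static String readUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int n;
        // read() returns -1 once the end of the stream is reached.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        // Decode once, after all bytes are collected, so multi-byte
        // sequences are never split across buffer boundaries.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an uploaded part body (assumption for this demo).
        byte[] uploaded = "<name>Zo\u00eb</name>".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(uploaded)) {
            System.out.println(readUtf8(in));
        }
    }
}
```

Decoding the whole byte array at the end, rather than converting each buffer to a String, avoids corrupting characters whose bytes happen to straddle two reads.
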