Send Unicode characters via MultipartEntity

Posted 2019-08-03 15:18

Question:

I have a method that sends an image and text in an HttpPost using a MultipartEntity. Everything works fine with English characters, but for other Unicode characters (for example, Cyrillic) the server receives only ???. How do I set UTF-8 encoding for a MultipartEntity correctly? I've tried several solutions suggested on SO, but none of them worked. Here is what I have so far:

HttpClient httpclient = new DefaultHttpClient();
HttpPost httpPost = new HttpPost(url);

MultipartEntityBuilder mpEntity = MultipartEntityBuilder.create();
mpEntity.setMode(HttpMultipartMode.BROWSER_COMPATIBLE);
mpEntity.setCharset(Consts.UTF_8);

mpEntity.addPart("image", new FileBody(new File(attachmentUri), ContentType.APPLICATION_OCTET_STREAM));


ContentType contentType = ContentType.create(HTTP.PLAIN_TEXT_TYPE, HTTP.UTF_8);
StringBody stringBody = new StringBody(mMessage, contentType);
mpEntity.addPart("message", stringBody);

final HttpEntity fileBody = mpEntity.build();
httpPost.setEntity(fileBody);  

HttpResponse httpResponse = httpclient.execute(httpPost);

Update: I tried using an InputStream, as @Donaudampfschifffreizeitfahrt suggested. Now I get ��� characters instead.

InputStream stream = new ByteArrayInputStream(mMessage.getBytes(Charset.forName("UTF-8")));
mpEntity.addBinaryBody("message", stream);

Also tried:

mpEntity.addBinaryBody("message", mMessage.getBytes(Charset.forName("UTF-8")));
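For readers trying to tell the two symptoms apart, here is a minimal standard-library sketch (no HttpClient involved) of how each one can arise. The class and method names are made up for illustration, and which charsets the real client and server actually used is an assumption. Encoding text with a charset that cannot represent it typically produces `?`, while decoding UTF-8 bytes with the wrong charset typically produces the replacement character U+FFFD.

```java
import java.nio.charset.StandardCharsets;

public class CharsetSymptoms {
    // Encoding with a charset that has no Cyrillic letters replaces
    // every unmappable character with '?' before the bytes are sent.
    static String lossyEncode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // The bytes are valid UTF-8, but the receiver decodes them as US-ASCII:
    // every byte above 0x7F becomes the replacement character U+FFFD.
    static String wrongDecode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Both sides agree on UTF-8: the text survives unchanged.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Привет", written with escapes so the source file encoding does not matter.
        String message = "\u041f\u0440\u0438\u0432\u0435\u0442";

        System.out.println(lossyEncode(message));               // prints ??????
        System.out.println(wrongDecode(message).length());      // prints 12 (two bytes per letter)
        System.out.println(roundTrip(message).equals(message)); // prints true
    }
}
```

If the first symptom changes into the second after switching to explicit UTF-8 bytes, that suggests the sender is now fine and the receiver is decoding with a different charset.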

Answer 1:

I solved it a different way, using:

builder.addTextBody(key, "שלום", ContentType.TEXT_PLAIN.withCharset("UTF-8"));


Answer 2:

To send Unicode in the request, you can add the part to the multipart entity with the line below:

entity.addPart("Data", new StringBody(data, Charset.forName("UTF-8")));



Answer 3:

For anyone stuck on this issue, this is how I resolved it.

I looked through the source code of the Apache HttpComponents libraries and found the following:

org.apache.http.entity.mime.HttpMultipart::doWriteTo()


case BROWSER_COMPATIBLE:
    // Only write Content-Disposition
    // Use content charset

    final MinimalField cd = part.getHeader().getField(MIME.CONTENT_DISPOSITION);
    writeField(cd, this.charset, out);
    final String filename = part.getBody().getFilename();
    if (filename != null) {
        final MinimalField ct = part.getHeader().getField(MIME.CONTENT_TYPE);
        writeField(ct, this.charset, out);
    }
    break;

So this looks like a bug (or feature) in the Apache library: in BROWSER_COMPATIBLE mode, a part's Content-Type header is written only if that part has a non-null filename. I therefore changed my code to:

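Charset utf8 = Charset.forName("utf-8");
ContentType contentType = ContentType.create(ContentType.TEXT_PLAIN.getMimeType(), utf8);
// Encode explicitly as UTF-8; a bare getBytes() would use the platform default charset.
// The dummy filename makes BROWSER_COMPATIBLE mode write the Content-Type header.
ContentBody body = new ByteArrayBody(mMessage.getBytes(utf8), contentType, "filename");
mpEntity.addPart("message", body);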

With this change, the Content-Type header appeared for the string part, and the characters are now encoded and decoded correctly.
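
To see why the presence of that header can matter, here is a rough standard-library sketch of a receiver that picks the charset from a part's Content-Type header and falls back to ISO-8859-1 when the header is absent. Both `charsetOf` and `decodePart` are hypothetical helpers, and the fallback charset is an assumption; real servers and frameworks may behave differently.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PartDecoding {
    // Hypothetical receiver logic: use the charset parameter of the part's
    // Content-Type header if there is one, otherwise assume ISO-8859-1.
    static Charset charsetOf(String contentTypeHeader) {
        if (contentTypeHeader != null) {
            for (String param : contentTypeHeader.split(";")) {
                String[] kv = param.trim().split("=", 2);
                if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("charset")) {
                    return Charset.forName(kv[1].trim().replace("\"", ""));
                }
            }
        }
        return StandardCharsets.ISO_8859_1; // assumed default, not taken from any real server
    }

    // Decode the raw body bytes of one part using the charset chosen above.
    static String decodePart(byte[] body, String contentTypeHeader) {
        return new String(body, charsetOf(contentTypeHeader));
    }

    public static void main(String[] args) {
        String message = "\u041f\u0440\u0438\u0432\u0435\u0442"; // "Привет"
        byte[] body = message.getBytes(StandardCharsets.UTF_8);

        // Header present: the receiver knows the bytes are UTF-8.
        System.out.println(decodePart(body, "text/plain; charset=UTF-8").equals(message)); // prints true

        // Header missing: the same bytes are decoded with the fallback and come out garbled.
        System.out.println(decodePart(body, null).equals(message)); // prints false
    }
}
```

Under these assumptions, the bytes on the wire are identical in both cases; only the receiver's knowledge of the charset differs, which is consistent with the fix above taking effect once the header is written.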