DataInputStream and UTF-8

2020-02-14 06:07发布

问题:

I'm kind of a new programmer, and I'm having a couple of problems with the code I'm handling.

Basically what the code does is receive a form from another JSP, read the bytes, parse the data, and submit the results to SalesForce, using DataInputStream.

   //Getting the parameters from request
 String contentType = request.getContentType();
 DataInputStream in = new DataInputStream(request.getInputStream());
 int formDataLength = request.getContentLength();

 //System.out.println(formDataLength);
 byte dataBytes[] = new byte[formDataLength];
 int byteRead = 0;
 int totalBytesRead = 0;
 while (totalBytesRead < formDataLength) 
 {
  byteRead = in.read(dataBytes, totalBytesRead, formDataLength);
  totalBytesRead += byteRead;
 }

It works fine, but only if the code handles normal characters. Whenever it tries to handle special characters (like french chars: àâäæçéèêëîïôùûü) I get the following gibberish as a result:

à âäæçéèêëîïôùûü

I understand it could be an issue of DataInputStream, and how it doesn't return UTF-8 encoded text. Do you guys offer any suggestions on how to tackle this issue?

All the .jsp files include <%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%> and Tomcat's settings are fine (URI = UTF-8, etc). I tried adding:

request.setCharacterEncoding("UTF-8");

and

response.setCharacterEncoding("UTF-8");

to no avail.

Here's an example of how it parses the data:

    //Getting the notes for the Case 
 String notes = new String(dataBytes);
 System.out.println(notes);
 String savenotes = casetype.substring(notes.indexOf("notes"));
 //savenotes = savenotes.substring(savenotes.indexOf("\n"), savenotes.indexOf("---"));
 savenotes = savenotes.substring(savenotes.indexOf("\n")+1);
 savenotes = savenotes.substring(savenotes.indexOf("\n")+1);
 savenotes = savenotes.substring(0,savenotes.indexOf("name=\"datafile"));
 savenotes = savenotes.substring(0,savenotes.lastIndexOf("\n------"));
 savenotes = savenotes.trim();

Thanks in advance.

回答1:

The problem is not in the inputstreams since they doesn't handle characters, but only bytes. Your problem is at the point you convert those bytes to characters. In this particular case, you need to specify the proper encoding in the String constructor.

String notes = new String(dataBytes, "UTF-8");

See also:

  • Unicode - How to get characters right?

By the way, the DataInputStream has no additional value in the particular code snippet. You can just keep it InputStream.