Python UTF-8 text arrives garbled on Android

Posted 2019-08-05 08:16

Question:

I want to send UTF-8 text that is stored in Elasticsearch to an Android application over a TCP socket.

I have implemented a ThreadedTCPServer; here is the request handler class that builds and sends the reply.

I have also implemented basic string-based handshaking to exchange status information, such as confirming that the query was received and announcing that a response is about to be sent.

class ThreadedTCPRequestHandler(SocketServer.BaseRequestHandler):

    def handle(self):
       es = Elasticsearch()
       #receive query from the client
       query = self.request.recv(1024)
       #Cut off the characters that aren't recognized
       query=query[2:]
       #for testing
       query=query.lower().strip().replace(' ','_')
       print query
       #Send response that query was received
       self.request.send("200...OK\n")
       res = es.search(index="painters",body={"query": { "match" :   {"title" : query}},"size":1  })
       if res['hits']['hits']:
           response = res['hits']['hits'][0]['_source']['text']
           self.request.send("201...RE\n")
       print response
       response=response.encode('utf-8')
       self.request.sendall(response)
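The handler writes the status lines and the UTF-8 body to the same stream, but it never tells the client where the body ends. The client can therefore only guess how many bytes to read. One common remedy, which is not what the post does, is to prefix each payload with its byte length. Below is a minimal sketch written for Python 3, not the Python 2 of the handler. The helper names `frame_message`, `recv_exact` and `read_frame` and the 4-byte big-endian header are assumptions made for illustration. On the Java side, a matching reader could use `DataInputStream.readInt()` followed by `readFully()`.

```python
import socket
import struct


def frame_message(text):
    """Encode text as UTF-8 and prefix it with its byte length (4-byte big-endian).
    Hypothetical helper; in the handler this would be used as
    self.request.sendall(frame_message(response))."""
    payload = text.encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer bytes."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frame(sock):
    """Read one length-prefixed frame and decode the complete payload at once,
    so a multi-byte character can never be split between two decode calls."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length).decode("utf-8")


if __name__ == "__main__":
    # Round trip over a local socket pair instead of a real client.
    a, b = socket.socketpair()
    a.sendall(frame_message("Early life (1928\u20131949), J\u00falia Zavack\u00e1"))
    print(read_frame(b))
    a.close()
    b.close()
```

Because the reader knows the exact byte count up front, it never has to decode partial data or wait for the connection to close.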

On the Android side I have two functions: one that reads a handshake response line and one that reads the remaining bytes.

    private String getResponse(InputStream is){
        String line="";
        BufferedReader rd = new BufferedReader(new InputStreamReader(is),8);
        try{
            line=rd.readLine();
        }
        catch (Exception e){
            Toast.makeText(MainActivity.this, "Stream Exception", Toast.LENGTH_SHORT).show();
        }
        return line;
    }
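`getResponse` wraps the socket's `InputStream` in an `InputStreamReader` and a `BufferedReader`. The `InputStreamReader` documentation warns that it may read more bytes ahead from the underlying stream than the current read needs. Any body bytes pulled into that reader's buffer are lost when `convertStreamToString` later reads the same raw stream. This may explain why only part of the text sometimes arrives.

The sketch below is a Python 3 analogy, not the Android code itself. `io.TextIOWrapper` stands in for the Java reader, and the helper names are hypothetical. It shows the read-ahead, and then reads the status line one byte at a time so that nothing after the newline is consumed.

```python
import io


def read_line_unbuffered(stream):
    """Read up to and including the first newline byte, one byte at a time,
    so no bytes after the line are taken from the underlying stream."""
    line = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:  # end of stream
            break
        line += byte
        if byte == b"\n":
            break
    return bytes(line)


def demo_read_ahead(data):
    """Mimic getResponse(): wrap the raw stream in a text reader, read one
    line, then return what is left in the raw stream for the next reader."""
    raw = io.BytesIO(data)
    reader = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
    reader.readline()
    leftover = raw.read()  # only bytes the text reader has not already buffered
    reader.detach()        # keep raw open when reader is garbage-collected
    return leftover


def demo_unbuffered(data):
    """Same sequence, but the status line is read without a read-ahead buffer."""
    raw = io.BytesIO(data)
    first = read_line_unbuffered(raw)
    return first, raw.read()


if __name__ == "__main__":
    data = "200...OK\n201...RE\nJ\u00falia".encode("utf-8")
    print(demo_read_ahead(data))   # body bytes are missing from the raw stream
    print(demo_unbuffered(data))
```

In Java the equivalent idea would be to read everything through one `InputStreamReader(is, StandardCharsets.UTF_8)` per connection, instead of creating a new reader for each handshake line and then reading the raw stream.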

    private String convertStreamToString(InputStream is) {
        BufferedInputStream bi = new BufferedInputStream(is);
        byte[] b = new byte[1024];
        StringBuilder total = new StringBuilder();
        try {
            while (bi.read(b,0,1024)!=-1)
            {
                total.append(decodeUTF8(b));
                Log.d("TOTAL",decodeUTF8(b));
            }

        } catch (Exception e) {
            e.printStackTrace();
        }
        return total.toString();
    }
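In `convertStreamToString`, `bi.read(b, 0, 1024)` returns the number of bytes actually read, but the loop ignores it and decodes the whole 1024-byte buffer every time. On a short read, the end of `b` still holds bytes from the previous iteration (or zeros on the first one), so stale text or NUL characters get appended.

The Python 3 simulation below uses a 16-byte buffer and hypothetical function names. It contrasts that behaviour with keeping only the `n` bytes read and decoding once at the end.

```python
import io


def read_all_buggy(stream, bufsize=16):
    """Mimic convertStreamToString(): ignore the count returned by the read
    and decode the whole buffer on every iteration."""
    buf = bytearray(bufsize)
    parts = []
    while True:
        n = stream.readinto(buf)
        if not n:  # 0 at end of stream (Java's read() returns -1 instead)
            break
        parts.append(bytes(buf).decode("utf-8", errors="replace"))
    return "".join(parts)


def read_all_fixed(stream, bufsize=16):
    """Keep only the n bytes actually read and decode once, at the end."""
    buf = bytearray(bufsize)
    out = bytearray()
    while True:
        n = stream.readinto(buf)
        if not n:
            break
        out += buf[:n]
    return out.decode("utf-8")


if __name__ == "__main__":
    data = "Andy Warhol (n\u00e9)".encode("utf-8")  # 17 bytes, so the 2nd read is short
    print(read_all_buggy(io.BytesIO(data)))  # bytes from the 1st read reappear
    print(read_all_fixed(io.BytesIO(data)))
```

A Java counterpart of the fixed loop would call `ByteArrayOutputStream.write(b, 0, n)` on each iteration and `toString("UTF-8")` after the loop.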

And here is the function that decodes the bytes into a string:

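    // UTF8_CHARSET is not shown in the post; assumed to be declared like this
    // (requires import java.nio.charset.Charset):
    private static final Charset UTF8_CHARSET = Charset.forName("UTF-8");

    String decodeUTF8(byte[] bytes) {
        return new String(bytes, UTF8_CHARSET);
    }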
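Calling `decodeUTF8` on each chunk separately can corrupt characters even when the chunk is sliced correctly. A multi-byte sequence such as the en dash (`e2 80 93`) may be split across two reads, and neither half decodes on its own.

This Python 3 sketch uses hypothetical function names and a 5-byte chunk size chosen to force a split. It compares per-chunk decoding with an incremental decoder from the standard `codecs` module, which holds back an incomplete sequence until the rest arrives.

```python
import codecs


def decode_per_chunk(data, size):
    """Decode each fixed-size slice on its own, like decodeUTF8(b) per read."""
    return "".join(
        data[i:i + size].decode("utf-8", errors="replace")
        for i in range(0, len(data), size)
    )


def decode_incrementally(data, size):
    """Feed slices to an incremental decoder that buffers partial sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(data[i:i + size]) for i in range(0, len(data), size)]
    parts.append(decoder.decode(b"", final=True))  # flush; errors if truncated
    return "".join(parts)


if __name__ == "__main__":
    text = "1928\u20131949"  # U+2013 EN DASH is 3 bytes in UTF-8: e2 80 93
    data = text.encode("utf-8")
    print(decode_per_chunk(data, 5))      # dash replaced by U+FFFD characters
    print(decode_incrementally(data, 5))
```

On Android, reading through an `InputStreamReader` constructed with an explicit UTF-8 charset gives the same incremental behaviour, because the reader keeps partial sequences between reads.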

The problem: sometimes only part of the string shows up on the Android side, and even when the whole text arrives, some UTF-8 characters come out garbled (a completely different character from the one that was sent).

The AsyncTask's onPostExecute, which starts a new Activity:

    protected void onPostExecute(String s) {
        //super.onPostExecute(s);
        if (s.contains("ECONNREFUSED")){
            Toast.makeText(MainActivity.this,"Connection Failed",Toast.LENGTH_LONG).show();
            return;
        }
        Intent intent = new Intent(MainActivity.this,ReplyActivity.class);
        intent.putExtra(EXTRA_MESSAGE,s);
        startActivity(intent);
    }

The new Activity reads the string from the Intent:

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        // get message
        Intent intent = getIntent();
        String summary = intent.getStringExtra(MainActivity.EXTRA_MESSAGE);
        // ... (rest of onCreate not shown)
    }

Example output:

Early life (1928–1949)

Andy Warhol ("né" Andrej Varhola, Jr.) was born on August 6, 1928 in Pittsburgh, Pennsylvania. He was the fourth child of Andrij Warhola (Americanized as Andrew Warhola, Sr., 1889–1942) and Júlia ("née" Zavacká, 1892–1972), w

Even when sending the query from Android to Python, I get some junk bytes at the start that I have to cut off, here:

       #Cut off the characters that aren't recognized
       query=query[2:]
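The Android code that sends the query is not shown, so the cause of the two extra bytes is a guess. If the client uses `DataOutputStream.writeUTF()`, that method first writes a 2-byte big-endian byte count and then the string in modified UTF-8. That would match exactly the two bytes that `query[2:]` drops. Modified UTF-8 differs from standard UTF-8 only for U+0000 and for characters outside the Basic Multilingual Plane.

Under that assumption, a Python 3 sketch could read the length instead of blindly slicing. `parse_java_utf` is a hypothetical helper.

```python
import struct


def parse_java_utf(data):
    """Parse one string written by Java's DataOutputStream.writeUTF():
    a 2-byte big-endian byte count followed by that many bytes of
    (modified) UTF-8.

    Assumes the text has no U+0000 and no characters outside the BMP,
    where modified UTF-8 is byte-for-byte identical to standard UTF-8.
    """
    if len(data) < 2:
        raise ValueError("need at least the 2-byte length header")
    (length,) = struct.unpack(">H", data[:2])
    body = data[2:2 + length]
    if len(body) < length:
        raise ValueError("incomplete message: expected %d bytes, got %d"
                         % (length, len(body)))
    return body.decode("utf-8")


if __name__ == "__main__":
    print(parse_java_utf(b"\x00\x0bandy warhol"))
```

A single `recv(1024)` is not guaranteed to return the whole message. A robust server would keep reading until it has `2 + length` bytes.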

repr(response) on the server:

<h2>Early life (1928\xe2\x80\x931949)</h2>\nAndy Warhol ("n\xc3\xa9" Andrej Varhola, Jr.) was born on August 6, 1928 in <a href="Pittsburgh">Pittsburgh</a>, Pennsylvania. He was the fourth child of Andrij Warhola (Americanized as Andrew Warhola, Sr., 1889\xe2\x80\x931942) and <a href="Julia Warhola">J\xc3\xbalia</a> ("n\xc3\xa9e" Zavack\xc3\xa1, 1892\xe2\x80\x931972), whose first child was born in their homeland and died before their move to the U.S.

The same response printed in the server's terminal:

<h2>Early life (1928–1949)</h2>
Andy Warhol ("né" Andrej Varhola, Jr.) was born on August 6, 1928 in <a href="Pittsburgh">Pittsburgh</a>, Pennsylvania. He was the fourth child of Andrij Warhola (Americanized as Andrew Warhola, Sr., 1889–1942) and <a href="Julia Warhola">Júlia</a> ("née" Zavacká, 1892–1972), whose first child was born in their homeland and died before their move to the U.S.
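The `repr` output suggests the server is sending valid UTF-8, which would place the corruption in transport or decoding on the Android side. `\xe2\x80\x93` is the en dash, `\xc3\xa9` is é, `\xc3\xba` is ú, and `\xc3\xa1` is á. A quick Python 3 check on a fragment copied from that `repr` output confirms that the bytes decode strictly.

```python
import unicodedata

# Byte sequences copied from the repr(response) output above.
fragment = b'1928\xe2\x80\x931949 J\xc3\xbalia ("n\xc3\xa9e" Zavack\xc3\xa1)'

# Strict decoding raises UnicodeDecodeError if any sequence is malformed.
decoded = fragment.decode("utf-8")

for ch in decoded:
    if ord(ch) > 127:
        print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))
```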