How to store an Http Response that may contain bin

Posted 2020-02-26 09:13

Question:

As I described in a previous question, I have an assignment to write a proxy server. It partially works now, but I still have a problem with handling of gzipped information. I store the HttpResponse in a String, and it appears I can't do that with gzipped content. However, the headers are text which I need to parse, and they all come from the same InputStream. My question is, what do I have to do in order to correctly handle binary responses, while still parsing the headers as strings?

>> Please see the edit below before you look at the code.

Here's the Response class implementation:

public class Response {
    private String fullResponse = "";
    private BufferedReader reader;
    private boolean busy = true;
    private int responseCode;
    private CacheControl cacheControl;

    public Response(String input) {
        this(new ByteArrayInputStream(input.getBytes()));
    }

    public Response(InputStream input) {
        reader = new BufferedReader(new InputStreamReader(input));
        try {
            while (!reader.ready());//wait for initialization.

            String line;
            while ((line = reader.readLine()) != null) {
                fullResponse += "\r\n" + line;

                if (HttpPatterns.RESPONSE_CODE.matches(line)) {
                    responseCode = (Integer) HttpPatterns.RESPONSE_CODE.process(line);
                } else if (HttpPatterns.CACHE_CONTROL.matches(line)) {
                    cacheControl = (CacheControl) HttpPatterns.CACHE_CONTROL.process(line);
                }
            }
            reader.close();
            fullResponse = "\r\n" + fullResponse.trim() + "\r\n\r\n";
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } 
        busy = false;
    }

    public CacheControl getCacheControl() {
        return cacheControl;
    }

    public String getFullResponse() {
        return fullResponse;
    }

    public boolean isBusy() {
        return busy;
    }

    public int getResponseCode() {
        return responseCode;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result
                + ((fullResponse == null) ? 0 : fullResponse.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (!(obj instanceof Response))
            return false;
        Response other = (Response) obj;
        if (fullResponse == null) {
            if (other.fullResponse != null)
                return false;
        } else if (!fullResponse.equals(other.fullResponse))
            return false;
        return true;
    }

    @Override
    public String toString() {
        return "Response\n==============================\n" + fullResponse;
    }
}

And here's HttpPatterns:

public enum HttpPatterns {
    RESPONSE_CODE("^HTTP/1\\.1 (\\d+) .*$"),
    CACHE_CONTROL("^Cache-Control: (\\w+)$"),
    HOST("^Host: (\\w+)$"),
    REQUEST_HEADER("(GET|POST) ([^\\s]+) ([^\\s]+)$"),
    ACCEPT_ENCODING("^Accept-Encoding: .*$");

    private final Pattern pattern;

    HttpPatterns(String regex) {
        pattern = Pattern.compile(regex);
    }

    public boolean matches(String expression) {
        return pattern.matcher(expression).matches();
    }

    public Object process(String expression) {
        Matcher matcher = pattern.matcher(expression);
        if (!matcher.matches()) {
            throw new RuntimeException("Called `process`, but the expression doesn't match. Call `matches` first.");
        }

        if (this == RESPONSE_CODE) {
            return Integer.parseInt(matcher.group(1));
        } else if (this == CACHE_CONTROL) {
            return CacheControl.parseString(matcher.group(1));
        } else if (this == HOST) {
            return matcher.group(1);
        } else if (this == REQUEST_HEADER) {
            return new RequestHeader(RequestType.parseString(matcher.group(1)), matcher.group(2), matcher.group(3));
        } else { //never happens
            return null;
        }
    }


}

EDIT

I tried implementing this according to the suggestions, but it's not working and I'm becoming desperate. When I try to view an image I get the following message from the browser:

The image “http://www.google.com/images/logos/ps_logo2.png” cannot be displayed because it contains errors.

Here's the log:

Request
==============================

GET http://www.google.com/images/logos/ps_logo2.png HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Cookie: PREF=ID=31f95dd7f42dfc7d:TM=1303507626:LM=1303507626:S=D4kIZ6rGFrlOUWlm


Not Reading from the Cache!!!!
I am going to try to connect to: www.google.com at port 80
Connected.
Writing to the server's buffer...
flushed.
Getting a response...
Got a binary response!


contentLength = 26209; headers.length() = 312; responseLength = 12136; fullResponse length = 12136


Got a response!

Writing to the Cache!!!!
I am going to write the following response:

HTTP/1.1 200 OK
Content-Type: image/png
Last-Modified: Thu, 05 Aug 2010 22:54:44 GMT
Date: Wed, 04 May 2011 15:05:30 GMT
Expires: Wed, 04 May 2011 15:05:30 GMT
Cache-Control: private, max-age=31536000
X-Content-Type-Options: nosniff
Server: sffe
Content-Length: 26209
X-XSS-Protection: 1; mode=block

 Response body is binary and was truncated.
Finished with request!

Here's the new Response class:

public class Response {
    private String headers = "";
    private BufferedReader reader;
    private boolean busy = true;
    private int responseCode;
    private CacheControl cacheControl;
    private InputStream fullResponse;
    private ContentEncoding encoding = ContentEncoding.TEXT;
    private ContentType contentType = ContentType.TEXT;
    private int contentLength;

    public Response(String input) {
        this(new ByteArrayInputStream(input.getBytes()));
    }

    public Response(InputStream input) {

        ByteArrayOutputStream tempStream = new ByteArrayOutputStream();
        InputStreamReader inputReader = new InputStreamReader(input);
        try {
            while (!inputReader.ready());
            int responseLength = 0;
            while (inputReader.ready()) {
                tempStream.write(inputReader.read());
                responseLength++;
            }
            /*
             * Read the headers
             */
            reader = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(tempStream.toByteArray())));
            while (!reader.ready());//wait for initialization.

            String line;
            while ((line = reader.readLine()) != null) {
                headers += "\r\n" + line;

                if (HttpPatterns.RESPONSE_CODE.matches(line)) {
                    responseCode = (Integer) HttpPatterns.RESPONSE_CODE.process(line);
                } else if (HttpPatterns.CACHE_CONTROL.matches(line)) {
                    cacheControl = (CacheControl) HttpPatterns.CACHE_CONTROL.process(line);
                } else if (HttpPatterns.CONTENT_ENCODING.matches(line)) {
                    encoding = (ContentEncoding) HttpPatterns.CONTENT_ENCODING.process(line);
                } else if (HttpPatterns.CONTENT_TYPE.matches(line)) {
                    contentType = (ContentType) HttpPatterns.CONTENT_TYPE.process(line);
                } else if (HttpPatterns.CONTENT_LENGTH.matches(line)) {
                    contentLength = (Integer) HttpPatterns.CONTENT_LENGTH.process(line);
                } else if (line.isEmpty()) {
                    break;
                }
            }

            InputStreamReader streamReader = new InputStreamReader(new ByteArrayInputStream(tempStream.toByteArray()));
            while (!reader.ready());//wait for initialization.
            //Now let's get the rest
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int counter = 0;
            while (streamReader.ready() && counter < (responseLength - contentLength)) {
                out.write((char) streamReader.read());
                counter++;
            }
            if (encoding == ContentEncoding.BINARY || contentType == ContentType.BINARY) {
                System.out.println("Got a binary response!");
                while (streamReader.ready()) {
                    out.write(streamReader.read());
                }
            } else {
                System.out.println("Got a text response!");
                while (streamReader.ready()) {
                    out.write((char) streamReader.read());
                }
            }
            fullResponse = new ByteArrayInputStream(out.toByteArray());

            System.out.println("\n\ncontentLength = " + contentLength + 
                    "; headers.length() = " + headers.length() + 
                    "; responseLength = " + responseLength + 
                    "; fullResponse length = " + out.toByteArray().length + "\n\n");

        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } 
        busy = false;
    }

}

and here's the ProxyServer class:

class ProxyServer {
    public void start() {
        while (true) {
            Socket serverSocket;
            Socket clientSocket;
            OutputStreamWriter toClient;
            BufferedWriter toServer;
            try {
                //The client is meant to put data on the port, read the socket.
                clientSocket = listeningSocket.accept();
                Request request = new Request(clientSocket.getInputStream());
                //System.out.println("Accepted a request!\n" + request);
                while(request.busy);
                //Make a connection to a real proxy.
                //Host & Port - should be read from the request
                URL url = null;
                try {
                    url = new URL(request.getRequestURL());
                } catch (MalformedURLException e){
                    url = new URL("http://" + request.getRequestHost() + request.getRequestURL());
                }

                System.out.println(request);

                //remove entry from cache if needed
                if (!request.getCacheControl().equals(CacheControl.CACHE) && cache.containsRequest(request)) {
                    cache.remove(request);
                }

                Response response = null;

                if (request.getRequestType() == RequestType.GET && request.getCacheControl().equals(CacheControl.CACHE) && cache.containsRequest(request)) {
                    System.out.println("Reading from the Cache!!!!");
                    response = cache.get(request);
                } else {
                    System.out.println("Not Reading from the Cache!!!!");
                    //Get the response from the destination
                    int remotePort = (url.getPort() == -1) ? 80 : url.getPort();
                    System.out.println("I am going to try to connect to: " + url.getHost() + " at port " + remotePort);
                    serverSocket = new Socket(url.getHost(), remotePort);
                    System.out.println("Connected.");
                    serverSocket.setSoTimeout(50000);

                    //write to the server - keep it open.
                    System.out.println("Writing to the server's buffer...");
                    toServer = new BufferedWriter(new OutputStreamWriter(serverSocket.getOutputStream()));
                    toServer.write(request.getFullRequest());
                    toServer.flush();
                    System.out.println("flushed.");

                    System.out.println("Getting a response...");
                    response = new Response(serverSocket.getInputStream());
                    //System.out.println("Got a response!\n" + response);
                    System.out.println("Got a response!\n");
                    //wait for the response
                    while(response.isBusy());
                }

                if (request.getRequestType() == RequestType.GET && request.getCacheControl().equals(CacheControl.CACHE) && response.getResponseCode() == 200) {
                    System.out.println("Writing to the Cache!!!!");
                    cache.put(request, response);
                }
                else System.out.println("Not Writing to the Cache!!!!");
                response = filter.filter(response);

                // Return the response to the client
                toClient = new OutputStreamWriter(clientSocket.getOutputStream());
                System.out.println("I am going to write the following response:\n" + response);
                BufferedReader responseReader = new BufferedReader(new InputStreamReader(response.getFullResponse()));
                while (responseReader.ready()) {
                    toClient.write(responseReader.read());
                }
                toClient.flush();
                toClient.close();
                clientSocket.close();
                System.out.println("Finished with request!");

            } catch (IOException e) {
                e.printStackTrace();
                continue;
            }
        }
   }
}

I would appreciate any and all feedback/insight/suggestion regarding how to solve this, and would of course prefer some actual code.

Answer 1:

Store it in a byte array:

byte[] buffer = new byte[???];

A more detailed process:

  • Create a buffer large enough for the response header (and throw an exception if the header is bigger).
  • Read bytes into the buffer until you find \r\n\r\n in the buffer. You can write a helper function for this, for example static int arrayIndexOf(byte[] haystack, int offset, int length, byte[] needle).
  • When you encounter the end of the header, create a string from the first n bytes of the buffer. You can then use RegEx on this string (also note that RegEx is not the best method to parse HTTP headers).
  • Be prepared for the buffer to contain additional data after the header: these are the first bytes of the response body. You have to copy these bytes to the output stream, output file, or output buffer.
  • Read the rest of the response body (until Content-Length bytes have been read or the stream is closed).

Edit:

You are not following the steps I suggested. inputReader.ready() is the wrong way to detect the phases of the response: there is no guarantee that the header will be sent in a single burst.

I tried to sketch the steps in code (except for the arrayIndexOf function).

InputStream is; // the stream the response arrives on, e.g. the server socket's input stream

// Create a buffer large enough for the response header (and throw an exception if it is bigger).
byte[] headEnd = {13, 10, 13, 10}; // \r \n \r \n
byte[] buffer = new byte[10 * 1024];
int length = 0;

// Read bytes into the buffer until you find `\r\n\r\n` in the buffer.
int bytes = 0;
int pos;
while ((pos = arrayIndexOf(buffer, 0, length, headEnd)) == -1) {
    // buffer is full but the end signature has not been found
    if (length == buffer.length)
        throw new RuntimeException("Response header too long");

    bytes = is.read(buffer, length, buffer.length - length);
    if (bytes == -1)
        throw new IOException("Stream closed before the end of the header");
    length += bytes;
}

// pos is the starting index of the end signature (\r\n\r\n), so skip its 4 bytes
pos += 4;

// When you encounter the end of the header, create a string from the first *n* bytes.
// ISO-8859-1 (java.nio.charset.StandardCharsets) maps every byte to exactly one char.
String header = new String(buffer, 0, pos, StandardCharsets.ISO_8859_1);

System.out.println(header);

// Be prepared for the buffer to contain additional data after the header
// ... so we process it
System.out.write(buffer, pos, length - pos);

// process the rest until the connection is closed
while ((bytes = is.read(buffer, 0, buffer.length)) != -1) {
    System.out.write(buffer, 0, bytes);
}

The arrayIndexOf method could look something like this: (there are probably faster versions)

public static int arrayIndexOf(byte[] haystack, int offset, int length, byte[] needle) {
    // <= so that a needle ending exactly at offset + length is still found
    for (int i = offset; i <= offset + length - needle.length; i++) {
        boolean match = true;
        for (int j = 0; j < needle.length; j++) {
            if (haystack[i + j] != needle[j]) {
                match = false;
                break;
            }
        }
        if (match)
            return i;
    }
    return -1;
}
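
For the last step, reading the body until Content-Length bytes have arrived, a minimal sketch might look like the following. The class name BodyReader and its helpers indexOf, contentLength and readBody are made up for illustration and are not part of the code above. The sketch assumes the response is not chunked (no Transfer-Encoding: chunked) and that the header fits into 10 KB.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: reads one HTTP response and keeps the body as raw bytes.
class BodyReader {
    private static final byte[] HEAD_END = {13, 10, 13, 10}; // \r\n\r\n

    /** Index of the first occurrence of needle in haystack[0..length), or -1. */
    static int indexOf(byte[] haystack, int length, byte[] needle) {
        for (int i = 0; i <= length - needle.length; i++) {
            int j = 0;
            while (j < needle.length && haystack[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return i;
            }
        }
        return -1;
    }

    /** Value of the Content-Length header, or -1 if the header is absent. */
    static int contentLength(String header) {
        for (String line : header.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                return Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        return -1;
    }

    /** Reads the header block, then returns exactly the body bytes, unmodified. */
    static byte[] readBody(InputStream is) throws IOException {
        byte[] buffer = new byte[10 * 1024];
        int length = 0;
        int pos;
        while ((pos = indexOf(buffer, length, HEAD_END)) == -1) {
            if (length == buffer.length) {
                throw new IOException("Response header too long");
            }
            int n = is.read(buffer, length, buffer.length - length);
            if (n == -1) {
                throw new IOException("Stream ended before the end of the header");
            }
            length += n;
        }
        pos += HEAD_END.length;
        String header = new String(buffer, 0, pos, StandardCharsets.ISO_8859_1);
        int remaining = contentLength(header); // -1 means: read until the stream closes

        // First copy the body bytes that were already read along with the header.
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        int leftover = length - pos;
        if (remaining >= 0) {
            leftover = Math.min(leftover, remaining);
            remaining -= leftover;
        }
        body.write(buffer, pos, leftover);

        // Then read the rest, never asking for more than Content-Length allows.
        while (remaining != 0) {
            int max = remaining < 0 ? buffer.length : Math.min(buffer.length, remaining);
            int n = is.read(buffer, 0, max);
            if (n == -1) {
                break;
            }
            body.write(buffer, 0, n);
            if (remaining > 0) {
                remaining -= n;
            }
        }
        return body.toByteArray();
    }
}
```

Stopping at Content-Length matters on keep-alive connections, where the server does not close the stream after the body and a read-until-EOF loop would block.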


Answer 2:

You basically need to parse the response headers as text, and the rest as binary. It's slightly tricky to do so, as you can't just create an InputStreamReader around the stream - that will read more data than you want. You'll quite possibly need to read data into a byte array and then decode the header bytes manually, for example with new String(bytes, offset, length, charset). Alternatively, if you've read data into a byte array already you could always create a ByteArrayInputStream around that, then an InputStreamReader on top... but you'll need to work out how far the headers go before you get to the body of the response, which you should keep as binary data.
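
To make the split concrete, here is a small sketch that works on a byte array which already holds a complete response. The class HeaderBodySplit and its methods are hypothetical names chosen for this example. It decodes only the header block and copies the body bytes untouched; the last method shows what goes wrong when binary data is pushed through a String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: splits an in-memory response into text headers and a binary body.
class HeaderBodySplit {
    /** Index just past the first \r\n\r\n, or -1 if the header block is incomplete. */
    static int bodyStart(byte[] data) {
        for (int i = 0; i + 3 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n'
                    && data[i + 2] == '\r' && data[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    /** Decodes only the header block. Assumes the headers are complete (bodyStart != -1). */
    static String headers(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost here.
        return new String(data, 0, bodyStart(data), StandardCharsets.ISO_8859_1);
    }

    /** Copies the body without ever converting it to characters. */
    static byte[] body(byte[] data) {
        return Arrays.copyOfRange(data, bodyStart(data), data.length);
    }

    /** Sends bytes on a detour through a UTF-8 String, the way a Reader-based proxy would. */
    static byte[] roundTripThroughString(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }
}
```

A PNG file starts with the byte 0x89, which is not valid UTF-8 on its own. roundTripThroughString replaces it with the replacement character, so the bytes that come back differ from the ones that went in, which is why the browser reports an image that "contains errors".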



Answer 3:

Jersey, a high-level web framework, may save your day. You no longer have to manage gzip content, headers, etc. yourself.

The following code gets the image used in your example and saves it to disk. Then it verifies that the saved image is equal to the downloaded one:

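import static org.junit.Assert.assertArrayEquals; // assuming JUnit 4

import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import org.junit.Test;

import com.google.common.io.ByteStreams;
import com.google.common.io.Files;
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;
import com.sun.jersey.api.client.WebResource;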

@Test
public void test() throws IOException {
    String filename = "ps_logo2.png";
    String url = "http://www.google.com/images/logos/" + filename;
    File file = new File(filename);

    WebResource resource = Client.create().resource(url);
    ClientResponse response = resource.get(ClientResponse.class);
    InputStream stream = response.getEntityInputStream();
    byte[] bytes = ByteStreams.toByteArray(stream);
    Files.write(bytes, file);

    assertArrayEquals(bytes, Files.toByteArray(file));
}

You will need two Maven dependencies to run it, in addition to JUnit for the test itself:

<dependency>
    <groupId>com.sun.jersey</groupId>
    <artifactId>jersey-client</artifactId>
    <version>1.6</version>
</dependency>
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>r08</version>
</dependency>


Answer 4:

After reading the headers with BufferedReader you'll need to detect if the Content-Encoding header is set to gzip. If it is, to read the body you'll have to switch to using the InputStream and wrap it with a GZIPInputStream to decode the body. The tricky part however is the fact that the BufferedReader will have buffered past the headers into the body and the underlying InputStream will be ahead of where you need it.

What you could do is wrap the initial InputStream with a BufferedInputStream and call mark() on it before you begin processing the headers. When you're done processing the headers call reset(). Then read that stream until you hit the empty line between headers and the body. Now wrap it with the GZIPInputStream to process the body.
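
A rough sketch of the mark()/reset() idea follows. The class GzipAwareReader, the method readDecodedBody and the MARK_LIMIT constant are names invented for this example. It assumes the body runs until the stream is closed, and that the reader does not pull more than MARK_LIMIT bytes out of the stream while the headers are being parsed; otherwise reset() fails.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: parses headers as text, then rewinds and reads the body as bytes.
class GzipAwareReader {
    // Assumption: header parsing never reads more than this many bytes ahead.
    private static final int MARK_LIMIT = 64 * 1024;

    /** Returns the body of one response, gunzipped if Content-Encoding says gzip. */
    static byte[] readDecodedBody(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw, MARK_LIMIT);
        in.mark(MARK_LIMIT);

        // Pass 1: read the headers as text. The reader buffers ahead into the body,
        // which is fine because the stream is rewound afterwards.
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1));
        boolean gzip = false;
        String line;
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("content-encoding:") && lower.contains("gzip")) {
                gzip = true;
            }
        }

        // Pass 2: rewind, then skip raw bytes up to and including \r\n\r\n.
        in.reset();
        int matched = 0; // how many bytes of \r\n\r\n have been seen in a row
        int b;
        while (matched < 4 && (b = in.read()) != -1) {
            char expected = (matched % 2 == 0) ? '\r' : '\n';
            if (b == expected) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0;
            }
        }

        // The stream now sits on the first body byte; decode it only if needed.
        InputStream body = gzip ? new GZIPInputStream(in) : in;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = body.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}
```

Note that a proxy which forwards the response unchanged does not need the GZIPInputStream step: it can pass the compressed bytes through as they are, as long as the Content-Encoding header is forwarded too.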



Answer 5:

I had the same problem. I commented out the line that adds the Accept-Encoding header for gzip:

con.setRequestProperty("Accept-Encoding","gzip, deflate");

...and it worked!
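
In a hand-written proxy that forwards the raw request text, the equivalent would be to drop the Accept-Encoding line before sending the request on. The sketch below assumes the request head is available as a String with \r\n line endings; the class AcceptEncodingFilter and the method stripAcceptEncoding are made-up names. Whether the server then really answers uncompressed is an assumption about server behavior, although most servers do.

```java
// Hypothetical helper: removes the Accept-Encoding header from a raw request head.
class AcceptEncodingFilter {
    private static final String HEADER = "Accept-Encoding:";

    /** Returns the request head without any Accept-Encoding line, ending in a blank line. */
    static String stripAcceptEncoding(String requestHead) {
        StringBuilder out = new StringBuilder();
        for (String line : requestHead.split("\r\n")) {
            if (line.isEmpty()) {
                break; // the blank line ends the header block
            }
            // Header names are case-insensitive, so compare while ignoring case.
            if (line.regionMatches(true, 0, HEADER, 0, HEADER.length())) {
                continue;
            }
            out.append(line).append("\r\n");
        }
        return out.append("\r\n").toString();
    }
}
```

This only avoids gzipped bodies. Images and other binary content still have to be forwarded as bytes rather than as a String.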