可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have an application that does a lot work on S3, mostly downloading files from it. I am seeing a lot of these kind of errors and I'd like to know if this is something on my code or if the service is really unreliable like this.

The code I'm using to read from the S3 object stream is as follows:

public static final void write(InputStream stream, OutputStream output) {

  byte[] buffer = new byte[1024];

  int read = -1;

  try {

    while ((read = stream.read(buffer)) != -1) {
      output.write(buffer, 0, read);
    }

    stream.close();
    output.flush();
    output.close();
  } catch (IOException e) {
    throw new RuntimeException(e);
  }

}

This OutputStream is a new BufferedOutputStream( new FileOutputStream( file ) ). I am using the latest version of the Amazon S3 Java client and this call is retried four times before giving up. So, after trying this for 4 times it still fails.

Any hints or tips on how I could possibly improve this are appreciated.

回答1:

I just managed to overcome a very similar problem. In my case the exception I was getting was identical; it happened for larger files but not for small files, and it never happened at all while stepping through the debugger.

The root cause of the problem was that the AmazonS3Client object was getting garbage collected in the middle of the download, which caused the network connection to break. This happened because I was constructing a new AmazonS3Client object with every call to load a file, while the preferred use case is to create a long-lasting client object that survives across calls - or at least is guaranteed to be around during the entirety of the download. So, the simple remedy is to make sure a reference to the AmazonS3Client is kept around so that it doesn't get GC'd.

A link on the AWS forums that helped me is here: https://forums.aws.amazon.com/thread.jspa?threadID=83326

回答2:

The network is closing the connection, prior to the client getting all the data, for one reason or another, that's what is going on.

Part of any HTTP Request is the content length, Your code is getting the header, saying hey buddy, here's data, and its this much of it.. and then the connection is dropping before the client has read all of the data.. so its bombing out with the exception.

I'd look at your OS/NETWORK/JVM connection timeout settings (though JVM generally inherit from the OS in this situation). The key is to figure out what part of the network is causing the problem. Is it your computer level settings saying, nope not going to wait any longer for packets.. is it that you are using a non blocking read, which has a timeout setting in your code, where it is saying, hey, haven't gotten any data from the server since longer than I'm supposed to wait so I'm going to drop the connection and exception. etc etc etc.

Best bet is to low level snoop the packet traffic and trace backwards, to see where the connection drop is happening, or see if you can up timeouts in things you can control, like your software, and OS/JVM.

回答3:

First of all, your code is operating entirely normally if (and only if) you suffer connectivity troubles between yourself and Amazon S3. As Michael Slade points out, standard connection-level debugging advice applies.

As to your actual source code, I note a few code smells you should be aware of. Annotating them directly in the source:

public static final void write(InputStream stream, OutputStream output) {

  byte[] buffer = new byte[1024]; // !! Abstract 1024 into a constant to make 
                                  //  this easier to configure and understand.

  int read = -1;

  try {

    while ((read = stream.read(buffer)) != -1) {
      output.write(buffer, 0, read);
    }

    stream.close(); // !! Unexpected side effects: closing of your passed in 
                    //  InputStream. This may have unexpected results if your
                    //  stream type supports reset, and currently carries no 
                    //  visible documentation.

    output.flush(); // !! Violation of RAII. Refactor this into a finally block, 
    output.close(); //  a la Reference 1 (below).

  } catch (IOException e) {
    throw new RuntimeException(e); // !! Possibly indicative of an outer 
                                   //   try-catch block for RuntimeException. 
                                   //   Consider keeping this as IOException.
  }
}

(Reference 1)

Otherwise, the code itself seems fine. IO exceptions should be expected occurrences in situations where you're connecting to a fickle remote host, and your best course of action is to draft a sane policy to cache and reconnect in these scenarios.

回答4:

Try using wireshark to see what is happening on the wire when this happens.
Try temporarily replacing S3 with your own web server and see if the problem persists. If it does it's your code and not S3.

The fact that it's random suggests network issues between your host and some of the S3 hosts.

回答5:

Also S3 could close slow connections according to my experience.

回答6:

I would take a very close look at the network equipment nearest your client app. This problem smacks of some network device dropping packets between you and the service. Look to see if there was a starting point when the problem first occurred. Was there any change like a firmware update to a router or replacement of a switch around that time?

Verify your bandwidth usage against the amount purchased from your ISP. Are there times of the day where you're approaching that limit? Can you obtain graphs of your bandwidth usage? See if the premature terminations can be correlated with high-bandwidth usage, particularly if it approaches some known limit. Does the problem seem to pick on smaller files and on large files only when they're almost finished downloading? Purchasing more bandwidth from your ISP may fix the problem.