I am trying to download pictures from some URLs. For some pictures it works fine, but for others I get 403 errors.
For example, this one: http://blog.zenika.com/themes/Zenika/img/zenika.gif
Accessing this picture does not require any authentication. You can click the link yourself and verify that your browser gets it with a 200 status code.
The following code throws an exception: new java.net.URL(url).openStream(). The same happens with org.apache.commons.io.FileUtils.copyURLToFile(new java.net.URL(url), tmp), which uses the same openStream() method under the hood:
java.io.IOException: Server returned HTTP response code: 403 for URL: http://blog.zenika.com/themes/Zenika/img/zenika.gif
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626) ~[na:1.7.0_45]
at java.net.URL.openStream(URL.java:1037) ~[na:1.7.0_45]
at services.impl.DefaultStampleServiceComponent$RemoteImgUrlFilter$class.downloadAsTemporaryFile(DefaultStampleServiceComponent.scala:548) [classes/:na]
at services.impl.DefaultStampleServiceComponent$RemoteImgUrlFilter$class.services$impl$DefaultStampleServiceComponent$RemoteImgUrlFilter$$handleImageUrl(DefaultStampleServiceComponent.scala:523) [classes/:na]
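URL.openStream() is documented as a shorthand for openConnection().getInputStream(), and for HTTP error statuses that call throws an IOException like the one above, so whatever the server actually answered is lost. A minimal diagnostic sketch (HttpDiagnostics and describe are hypothetical names, and decoding the error body as UTF-8 is an assumption) that performs the same GET through HttpURLConnection but reports the status line and error body instead of throwing:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class HttpDiagnostics {

    // Hypothetical helper: same GET as URL.openStream(), but returns
    // "<status> <reason>: <error body>" instead of throwing on 4xx/5xx responses.
    static String describe(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            int status = conn.getResponseCode(); // sends the request; does not throw for 403
            String body = "";
            InputStream err = conn.getErrorStream(); // null when there is no error body
            if (err != null) {
                try (InputStream in = err) {
                    body = readAll(in);
                }
            }
            return status + " " + conn.getResponseMessage() + ": " + body;
        } finally {
            conn.disconnect();
        }
    }

    // Reads the whole stream; assumes the body is UTF-8 text.
    private static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describe(new URL(args[0])));
    }
}
```

Running it with the image URL as argument would show the status line plus any explanation the server puts in the 403 body, which can hint at what it is filtering on.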
I develop with Scala and the Play Framework, so I also tried the built-in AsyncHttpClient (AHC) through Play's WS API:
// TODO it could be better to use iteratees on the GET call because I think AHC loads the whole body in memory
WS.url(url).get.flatMap { res =>
if (res.status >= 200 && res.status < 300) {
val bodyStream = res.getAHCResponse.getResponseBodyAsStream
val futureFile = TryUtils.tryToFuture(createTemporaryFile(bodyStream))
play.api.Logger.info(s"Successfully downloaded file $filename with status code ${res.status}")
futureFile
} else {
Future.failed(new RuntimeException(s"Download of file $filename returned status code ${res.status}"))
}
} recover {
case NonFatal(e) => throw new RuntimeException(s"Could not downloadAsTemporaryFile url=$url", e)
}
With this AHC code, it works fine. Can someone explain this behavior, and why I get a 403 error with the URL.openStream() method?
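One plausible explanation, offered as an assumption since the actual rules of blog.zenika.com are unknown: by default HttpURLConnection sends a User-Agent of the form Java/<version> (here something like Java/1.7.0_45), some hosts answer such requests with a 403, and AHC presumably sends a different User-Agent. The sketch below reproduces that situation locally with the JDK's built-in com.sun.net.httpserver server; the filtering rule, class name and method names are hypothetical.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class UserAgentDemo {

    // Starts a local server on a free port. Hypothetical filtering rule: answer 403
    // when User-Agent is missing or contains the "Java/" token that HttpURLConnection
    // sends by default, and 200 otherwise.
    static HttpServer startFilteringServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", new HttpHandler() {
            @Override
            public void handle(HttpExchange exchange) throws IOException {
                String agent = exchange.getRequestHeaders().getFirst("User-Agent");
                boolean blocked = agent == null || agent.contains("Java/");
                byte[] body = (blocked ? "forbidden" : "ok").getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(blocked ? 403 : 200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            }
        });
        server.start();
        return server;
    }

    // Issues a GET and returns the status code. A null userAgent keeps the
    // JDK's default User-Agent header, which is what URL.openStream() sends.
    static int statusFor(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            if (userAgent != null) {
                conn.setRequestProperty("User-Agent", userAgent);
            }
            return conn.getResponseCode(); // no exception for 403, unlike getInputStream()
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startFilteringServer();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/zenika.gif");
            System.out.println("default User-Agent: " + statusFor(url, null));
            System.out.println("browser-like User-Agent: " + statusFor(url, "Mozilla/5.0"));
        } finally {
            server.stop(0);
        }
    }
}
```

With that rule, the request carrying the default header is rejected while the browser-like one succeeds, mirroring the difference between openStream() and the browser (or AHC).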
As mentioned, some hosts block this kind of access by checking headers such as User-Agent:
This doesn't work:
This works:
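A minimal sketch of the header-based workaround, assuming the host only looks at User-Agent (other hosts may also check headers such as Referer). ImageDownloader, openWithUserAgent, the Mozilla/5.0 value and the 10-second timeouts are illustrative choices, not taken from the original post:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class ImageDownloader {

    // Hypothetical value: any browser-like string; which values a host accepts is unknown.
    static final String BROWSER_LIKE_AGENT = "Mozilla/5.0";

    // openConnection() does not connect yet, so headers can still be set here.
    static URLConnection openWithUserAgent(URL url, String userAgent) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(10000); // illustrative timeouts, in milliseconds
        conn.setReadTimeout(10000);
        return conn;
    }

    // Copies the response body into a new temporary file, deleting it on failure.
    static Path downloadAsTemporaryFile(URL url) throws IOException {
        Path tmp = Files.createTempFile("download-", ".tmp");
        try (InputStream in = openWithUserAgent(url, BROWSER_LIKE_AGENT).getInputStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
        return tmp;
    }
}
```

Since FileUtils.copyURLToFile(URL, File) opens the stream itself, it offers no way to add the header; with Commons IO you could instead pass the configured connection's getInputStream() to FileUtils.copyInputStreamToFile(InputStream, File).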