My Java standalone application gets a URL (which points to a file) from the user and I need to hit it and download it. The problem I am facing is that I am not able to encode the HTTP URL address properly...
Example:
URL: http://search.barnesandnoble.com/booksearch/first book.pdf
java.net.URLEncoder.encode(url.toString(), "ISO-8859-1");
returns me:
http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst+book.pdf
But, what I want is
http://search.barnesandnoble.com/booksearch/first%20book.pdf
(space replaced by %20)
I guess URLEncoder is not designed to encode HTTP URLs... The JavaDoc says "Utility class for HTML form encoding"... Is there any other way to do this?
I read the previous answers and wrote my own method, because I could not get anything working properly with the solutions they proposed. It looks good to me, but if you find a URL that does not work with it, please let me know.
The java.net.URI class can help; the documentation of URL recommends it for managing the encoding and decoding of URLs.
Use one of the constructors with more than one argument, like:
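A minimal sketch of such a call, using the 4-argument constructor URI(scheme, host, path, fragment) with the URL from the question. The class and method names (UriPathExample, encode) are made up for illustration:

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriPathExample {
    // Hypothetical helper: builds an http URI from an unencoded host and path.
    static String encode(String host, String path) {
        try {
            // The multi-argument constructor quotes illegal characters such as spaces.
            URI uri = new URI("http", host, path, null);
            return uri.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("search.barnesandnoble.com", "/booksearch/first book.pdf"));
        // prints http://search.barnesandnoble.com/booksearch/first%20book.pdf
    }
}
```

If a java.net.URL is needed for the download, uri.toURL() converts the result.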
(the single-argument constructor of URI does NOT escape illegal characters)
Only illegal characters get escaped by the above code - it does NOT escape non-ASCII characters (see fatih's comment).
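To make the non-ASCII point concrete, here is a small sketch comparing toString with toASCIIString, and using the 5-argument constructor URI(scheme, authority, path, query, fragment) for a URL with a query. The class name UriAsciiExample and the helper build are hypothetical; non-ASCII characters are written as Unicode escapes so the source file encoding does not matter:

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriAsciiExample {
    // Hypothetical helper: query may be null when the URL has no query part.
    static URI build(String authority, String path, String query) {
        try {
            return new URI("http", authority, path, query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI accented = build("search.barnesandnoble.com", "/booksearch/\u00E9", null);
        // toString keeps the non-ASCII character as-is.
        System.out.println(accented.toString());
        // toASCIIString percent-encodes it as UTF-8 bytes: .../booksearch/%C3%A9
        System.out.println(accented.toASCIIString());

        URI weather = build("www.google.com", "/ig/api", "weather=S\u00E3o Paulo");
        // prints http://www.google.com/ig/api?weather=S%C3%A3o%20Paulo
        System.out.println(weather.toASCIIString());
    }
}
```

Note that characters which are legal in a query, such as & and =, are left untouched by this constructor, so a parameter value containing them still has to be encoded separately.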
The toASCIIString method can be used to get a String only with US-ASCII characters:
For a URL with a query like http://www.google.com/ig/api?weather=São Paulo, use the 5-parameter version of the constructor:
Yeah, URL encoding is going to encode that string so that it would be passed properly in a URL to a final destination. For example, you could not have http://stackoverflow.com?url=http://yyy.com. URL-encoding the parameter would fix that parameter value.
So I have two choices for you:
1. Do you have access to the path separate from the domain? If so, you may be able to simply UrlEncode the path. However, if this is not the case, then option 2 may be for you.
2. Get commons-httpclient-3.1. This has a class URIUtil:
System.out.println(URIUtil.encodePath("http://example.com/x y", "ISO-8859-1"));
This will output exactly what you are looking for, as it will only encode the path part of the URI.
FYI, you'll need commons-codec and commons-logging for this method to work at runtime.
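For the first option, one possible sketch using only the JDK is shown below. It assumes the path is available on its own, encodes it segment by segment with URLEncoder (so the "/" separators survive), and then turns the form-style "+" into "%20". The class and method names are hypothetical, and this is just one way to read "UrlEncode the path", not necessarily what was meant above:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class PathSegmentEncoder {
    // Encodes each segment between slashes; the slashes themselves are kept.
    static String encodePath(String path) {
        String[] segments = path.split("/", -1); // -1 keeps leading/trailing empty segments
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                out.append('/');
            }
            out.append(encodeSegment(segments[i]));
        }
        return out.toString();
    }

    private static String encodeSegment(String segment) {
        try {
            // URLEncoder emits "+" for a space; a literal "+" becomes %2B,
            // so replacing "+" afterwards only touches encoded spaces.
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("http://search.barnesandnoble.com"
                + encodePath("/booksearch/first book.pdf"));
    }
}
```

URLEncoder follows form-encoding rules, so it escapes more characters than a path strictly requires; the result should still decode to the same path.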
I agree with Matt. Indeed, I've never seen it well explained in tutorials, but one matter is how to encode the URL path, and a very different one is how to encode the parameters which are appended to the URL (the query part, behind the "?" symbol). They use similar encoding, but not the same.
This is especially true for the whitespace character. The URL path needs it to be encoded as %20, whereas the query part allows both %20 and the "+" sign. The best idea is to test it ourselves against our web server, using a web browser.
For both cases, I ALWAYS would encode COMPONENT BY COMPONENT, never the whole string. Indeed URLEncoder allows that for the query part. For the path part you can use the class URI, although in this case it asks for the entire string, not a single component.
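A rough sketch of the component-by-component idea: the path goes through URI, and the query parameter name and value each go through URLEncoder. The names ComponentEncoding and build are invented for this example, and it handles a single parameter only:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class ComponentEncoding {
    // Hypothetical helper: all arguments are raw, unencoded strings.
    static String build(String host, String path, String name, String value) {
        try {
            // Path part: URI escapes the space as %20.
            String base = new URI("http", host, path, null).toASCIIString();
            // Query part: URLEncoder escapes the space as "+" and "&" as %26.
            String query = URLEncoder.encode(name, "UTF-8") + "="
                    + URLEncoder.encode(value, "UTF-8");
            return base + "?" + query;
        } catch (URISyntaxException | UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://example.com/my%20docs/report.pdf?q=S%C3%A3o+Paulo+%26+co
        System.out.println(build("example.com", "/my docs/report.pdf", "q", "S\u00E3o Paulo & co"));
    }
}
```

The output shows both conventions side by side: %20 in the path and "+" in the query.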
Anyway, I believe that the best way to avoid these problems is to use a personal naming convention that avoids conflicts. How? For example, I would never name directories or parameters using characters other than a-z, A-Z, 0-9 and _. That way, the only need is to encode the value of every parameter, since it may come from user input and the characters used are unknown.
I took the content above and changed it around a bit. I like positive logic first, and I thought a HashSet might give better performance than some other options, like searching through a String. I'm not sure whether the autoboxing penalty is worth it, but if the compiler optimizes for ASCII chars, then the cost of boxing will be low.
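The code this answer refers to is not reproduced here. As a stand-in, the following is a minimal sketch of the approach described, assuming a HashSet<Character> of characters that pass through unchanged, with everything else percent-encoded as UTF-8 bytes. The class name, the method name, and the exact contents of the safe set are assumptions, not the author's code:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

class SafeSetEncoder {
    // Assumed whitelist: letters, digits, and a few URL punctuation characters.
    private static final Set<Character> SAFE = new HashSet<>();
    static {
        for (char c = 'a'; c <= 'z'; c++) SAFE.add(c);
        for (char c = 'A'; c <= 'Z'; c++) SAFE.add(c);
        for (char c = '0'; c <= '9'; c++) SAFE.add(c);
        for (char c : "-._~/:?&=#".toCharArray()) SAFE.add(c);
    }

    static String encode(String url) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < url.length()) {
            int cp = url.codePointAt(i);
            int len = Character.charCount(cp);
            // Positive logic first: copy characters known to be safe.
            if (len == 1 && SAFE.contains((char) cp)) {
                out.append((char) cp);
            } else {
                // Everything else is written as %XX for each UTF-8 byte.
                String s = new String(Character.toChars(cp));
                for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("http://search.barnesandnoble.com/booksearch/first book.pdf"));
    }
}
```

Because "%" is not in the safe set, an already-encoded URL would be encoded twice by this sketch, so it should only be fed raw input. On the boxing question, Character.valueOf caches the values 0 to 127, so lookups for ASCII characters do not allocate.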
I develop a library that serves this purpose: galimatias. It parses URLs the same way web browsers do. That is, if a URL works in a browser, it will be correctly parsed by galimatias.
In this case:
Will give you:
http://search.barnesandnoble.com/booksearch/first%20book.pdf
Of course this is the simplest case, but it'll work with anything, way beyond java.net.URI.
You can check it out at: https://github.com/smola/galimatias