Say I have a URL
http://example.com/query?q=
and I have a query entered by the user such as:
random word £500 bank $
I want the result to be a properly encoded URL:
http://example.com/query?q=random%20word%20%A3500%20bank%20%24
What's the best way to achieve this? I tried URLEncoder and creating URI/URL objects, but none of them come out quite right.
Here's a method you can use in your code to convert a url string and map of parameters to a valid encoded url string containing the query parameters.
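Below is a minimal sketch of such a method. It is not the answer's original code: UrlBuilder and buildUrl are hypothetical names, and the sketch assumes Java 10+ for the URLEncoder.encode(String, Charset) overload. It encodes each parameter name and value separately and joins them with literal & and = separators.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: appends URL-encoded query parameters to a base URL.
final class UrlBuilder {
    private UrlBuilder() {}

    static String buildUrl(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> param : params.entrySet()) {
            // Encode each name and value on its own; '&' and '=' are added literally.
            String name = URLEncoder.encode(param.getKey(), StandardCharsets.UTF_8);
            String value = URLEncoder.encode(param.getValue(), StandardCharsets.UTF_8);
            query.add(name + "=" + value);
        }
        if (query.length() == 0) {
            return baseUrl;
        }
        // Choose the separator based on what the base URL already contains.
        String separator;
        if (baseUrl.endsWith("?") || baseUrl.endsWith("&")) {
            separator = "";
        } else if (baseUrl.contains("?")) {
            separator = "&";
        } else {
            separator = "?";
        }
        return baseUrl + separator + query;
    }
}
```

For example, buildUrl("http://example.com/query", Map.of("q", "random word £500 bank $")) should return http://example.com/query?q=random+word+%C2%A3500+bank+%24. Pass a LinkedHashMap if parameter order matters. The base URL is assumed to contain no fragment (#...) and no dangling parameter name such as ?q=.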
Use the following standard Java solution (passes around 100 of the test cases provided by Web Platform Tests):
0. Test if the URL is already encoded. Replace '+' encoded spaces with '%20' encoded spaces.
1. Split the URL into structural parts. Use java.net.URL for it.
2. Encode each structural part properly!
3. Use IDN.toASCII(putDomainNameHere) to Punycode-encode the host name!
4. Use java.net.URI.toASCIIString() to percent-encode NFC-encoded Unicode (NFKC would be better!).
For more info see: How to encode properly this URL
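Steps 1 to 4 can be sketched with only the JDK. This is an illustration under stated assumptions, not the answer's original code: UrlNormalizer and toAsciiUrl are hypothetical names, step 0 is skipped (the input is assumed not to be percent-encoded already, because the multi-argument URI constructors always quote '%'), and only http/https URLs with a host are expected.

```java
import java.net.IDN;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper following steps 1-4; step 0 (detecting pre-encoded input) is omitted.
final class UrlNormalizer {
    private UrlNormalizer() {}

    static String toAsciiUrl(String raw) {
        try {
            // Step 1: split the URL into its structural parts.
            URL url = new URL(raw);
            // Steps 2-3: the multi-argument URI constructor quotes illegal characters
            // per component; the host is Punycode-encoded first with IDN.toASCII.
            URI uri = new URI(
                    url.getProtocol(),
                    url.getUserInfo(),
                    IDN.toASCII(url.getHost()),
                    url.getPort(),
                    url.getPath(),
                    url.getQuery(),
                    url.getRef());
            // Step 4: percent-encode the remaining non-ASCII characters as UTF-8.
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode URL: " + raw, e);
        }
    }
}
```

With this sketch, http://example.com/query?q=random word £500 bank $ should come out as http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$. The $ is left alone because it is a legal reserved character in a query. Note that new URL(String) is deprecated since Java 20 but still works.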
Here are some examples that will also work properly
Guava 15 has now added a set of straightforward URL escapers.
The Apache HttpComponents library provides a neat option for building and encoding query params:
With HttpComponents 4.x, use URLEncodedUtils
For HttpClient 3.x, use EncodingUtil
URLEncoder should be the way to go. Just keep in mind to encode only the individual query string parameter names and/or values, not the entire URL, and certainly not the query string parameter separator character & or the parameter name-value separator character =.

Note that spaces in query parameters are represented by +, not %20, which is legitimately valid. %20 is normally used to represent spaces in the URI itself (the part before the URI-query string separator character ?), not in the query string (the part after ?).

Also note that there are two encode() methods: one without a charset argument and one with. The one without a charset argument is deprecated. Never use it; always specify the charset argument. The javadoc even explicitly recommends using UTF-8, as mandated by RFC 3986 and the W3C.
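A short sketch of these points, assuming Java 10+ for the URLEncoder.encode(String, Charset) overload (QueryParamEncoding and its methods are hypothetical names). Only the parameter value is encoded, never the URL or the separators; the + to %20 swap is an optional extra for callers that specifically need %20.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for encoding a single query parameter name or value.
final class QueryParamEncoding {
    private QueryParamEncoding() {}

    // Form-encodes with UTF-8; spaces become '+'.
    static String encodeParam(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Same, but spaces become %20. Safe because URLEncoder has already turned any
    // literal '+' in the input into %2B, so every remaining '+' stands for a space.
    static String encodeParamPercent20(String s) {
        return encodeParam(s).replace("+", "%20");
    }
}
```

Appending encodeParam("random word £500 bank $") to http://example.com/query?q= should give http://example.com/query?q=random+word+%C2%A3500+bank+%24. With UTF-8, £ becomes %C2%A3 rather than the single-byte %A3 shown in the question, which is its ISO-8859-1 encoding.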
You first need to create a URI, like this:
Then convert that URI to an ASCII string:
Now your URL string is completely encoded. First we did simple URL encoding, and then we converted the result to an ASCII string to make sure no characters outside US-ASCII remain in the string. This is roughly what browsers do.
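A minimal sketch of these two steps, assuming the URL is assembled from raw, not-yet-encoded components (UriAsciiDemo and encode are hypothetical names, not the answer's original code):

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: build a URI from raw components, then convert it to US-ASCII.
final class UriAsciiDemo {
    private UriAsciiDemo() {}

    static String encode(String scheme, String authority, String path, String query) {
        try {
            // The multi-argument constructor quotes illegal characters per component
            // (e.g. spaces become %20); non-ASCII characters are kept as-is at this stage.
            URI uri = new URI(scheme, authority, path, query, null);
            // toASCIIString() then percent-encodes the remaining non-ASCII characters as UTF-8.
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Called as encode("http", "example.com", "/query", "q=random word £500 bank $"), it should return http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$. One trade-off compared with URLEncoder: & and = are legal in a URI query, so the constructor will not escape them inside a value, and a value containing & would be read as two parameters.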