Java's URL/URI doesn't resolve correctly l

Posted 2019-09-12 02:56

Question:

I'm trying to resolve a relative link that starts with a question mark ? using Java's URL or URI classes.

HTML example:

<a href="?test=xyz">Test XYZ</a>

Code examples (from Scala REPL):

scala> import java.net._

scala> new URL(new URL("http://abc.com.br/index.php?hello=world"), "?test=xyz").toExternalForm()
res30: String = http://abc.com.br/?test=xyz

scala> (new URI("http://abc.com.br/index.php?hello=world")).resolve("?test=xyz").toString
res31: String = http://abc.com.br/?test=xyz

The problem is that browsers (tested on Chrome, Firefox and Safari) resolve it to http://abc.com.br/index.php?test=xyz instead. They keep the path "index.php" and replace only the query string.

It seems that browsers are simply following the specification (RFC 3986), as explained in https://stackoverflow.com/a/7872230/40876.
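For reference, RFC 3986 section 5.2.2 handles a reference with an empty path and a defined query by keeping the base's scheme, authority and path and taking only the query (and fragment) from the reference. java.net.URI, by contrast, documents conformance to the older RFC 2396, whose merge step copies all but the last segment of the base path even when the reference path is empty, which is why "index.php" disappears. The sketch below implements only that one RFC 3986 branch with plain string operations; resolveQueryOnly is a hypothetical name, and it assumes the base is an absolute, already-valid URI string.

```java
import java.net.URI;

public class QueryOnlyReference {

    // Hypothetical helper: RFC 3986 section 5.2.2 for a reference whose path is
    // empty but which defines a query. Keeps the base up to (not including) its
    // query, then appends the reference's query and any fragment it carries.
    // Assumes base is an absolute, already-valid URI string.
    static String resolveQueryOnly(String base, String ref) {
        if (!ref.startsWith("?")) {
            throw new IllegalArgumentException("not a query-only reference: " + ref);
        }
        // Drop the base fragment first: '?' may legally appear inside a fragment.
        int hash = base.indexOf('#');
        String withoutFragment = hash >= 0 ? base.substring(0, hash) : base;
        // '?' cannot appear unescaped in scheme, authority or path,
        // so the first one starts the query.
        int question = withoutFragment.indexOf('?');
        String withoutQuery = question >= 0
                ? withoutFragment.substring(0, question)
                : withoutFragment;
        return withoutQuery + ref;
    }

    public static void main(String[] args) {
        String base = "http://abc.com.br/index.php?hello=world";
        String ref = "?test=xyz";
        // JDK behavior (RFC 2396 merge drops the last path segment)
        System.out.println(URI.create(base).resolve(ref));
        // RFC 3986 behavior, matching the browsers mentioned above
        System.out.println(resolveQueryOnly(base, ref));
    }
}
```

The second test input below is the "?y" case from the examples in RFC 3986 section 5.4.1, where RFC 2396 instead listed http://a/b/c/?y.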

The Jsoup library makes the same "mistake" with element.absUrl("href"), because it also relies on Java's URL resolution.

So what's going on with Java's URL/URI resolution of relative paths? Is it wrong or incomplete? How can I make it behave the same way browsers do?

Answer 1:

Apache HttpClient's URIUtils.resolve gives the browser-style result, while the JDK classes do not:

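import java.net.URI;
import java.net.URL;
import org.apache.http.client.utils.URIUtils;

public class ResolveExample {
    public static void main(String[] args) throws Exception {
        String base = "http://abc.com.br/index.php?hello=world";
        String relative = "?test=xyz";

        // JDK URL: drops the last path segment ("index.php")
        System.out.println(new URL(new URL(base), relative).toExternalForm());
        // http://abc.com.br/?test=xyz

        // JDK URI: same result
        System.out.println(new URI(base).resolve(relative).toString());
        // http://abc.com.br/?test=xyz

        // HttpClient URIUtils: keeps "index.php" and replaces only the query, as browsers do
        System.out.println(URIUtils.resolve(new URI(base), relative).toString());
        // http://abc.com.br/index.php?test=xyz
    }
}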

URIUtils lives in org.apache.httpcomponents:httpclient, version 4.0 or higher.
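If adding HttpClient as a dependency is not an option, a small wrapper can special-case query-only references and hand everything else to java.net.URI.resolve. This is a hypothetical helper (resolveLikeBrowser), not an API of the JDK or HttpClient; it assumes a hierarchical base URI, fixes only the "?" case, and otherwise inherits the JDK's RFC 2396 behavior.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class BrowserLikeResolve {

    // Hypothetical helper (not from any library): resolves ref against base,
    // treating a reference that starts with '?' the way RFC 3986 section 5.2.2 does.
    static URI resolveLikeBrowser(URI base, String ref) throws URISyntaxException {
        if (ref.startsWith("?") && !base.isOpaque()) {
            // Keep the base's scheme, authority and path; take the query (and any
            // fragment) from ref. The raw getters keep percent-encoding intact.
            StringBuilder sb = new StringBuilder();
            if (base.getScheme() != null) {
                sb.append(base.getScheme()).append(':');
            }
            if (base.getRawAuthority() != null) {
                sb.append("//").append(base.getRawAuthority());
            }
            sb.append(base.getRawPath());
            sb.append(ref);
            return new URI(sb.toString());
        }
        // Every other kind of reference goes to the JDK resolver unchanged.
        return base.resolve(ref);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI base = new URI("http://abc.com.br/index.php?hello=world");
        System.out.println(resolveLikeBrowser(base, "?test=xyz"));
        System.out.println(resolveLikeBrowser(base, "other.php"));
    }
}
```

Building the result from raw components rather than the multi-argument URI constructors avoids double-encoding, since those constructors always quote '%'.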