I believe the definition and implementation of Java's URI.resolve method are incompatible with RFC 3986 section 5.2.2. I understand that the Java API defines how that method works, and that changing it now would break existing apps, but my question is this: Can anyone confirm my understanding that this method is incompatible with RFC 3986?
I'm using the example from this question: java.net.URI resolve against only query string, which I will copy here:
I'm trying to build URIs using the JDK java.net.URI.
I want to append a query (as a String) to an absolute URI object. For example:
URI base = new URI("http://example.com/something/more/long");
String queryString = "query=http://local:282/rand&action=aaaa";
URI query = new URI(null, null, null, queryString, null);
URI result = base.resolve(query);
Theory (or what I think) is that resolve should return:
http://example.com/something/more/long?query=http://local:282/rand&action=aaaa
But what I got is:
http://example.com/something/more/?query=http://local:282/rand&action=aaaa
My understanding of RFC 3986 section 5.2.2 is that if the path of the relative URI is empty, then the entire path of the base URI is to be used:
if (R.path == "") then
   T.path = Base.path;
   if defined(R.query) then
      T.query = R.query;
   else
      T.query = Base.query;
   endif;
and only if a path is specified is the relative path to be merged against the base path:
else
   if (R.path starts-with "/") then
      T.path = remove_dot_segments(R.path);
   else
      T.path = merge(Base.path, R.path);
      T.path = remove_dot_segments(T.path);
   endif;
   T.query = R.query;
endif;
but the Java implementation always does the merge, even if the path is empty:
String cp = (child.path == null) ? "" : child.path;
if ((cp.length() > 0) && (cp.charAt(0) == '/')) {
    // 5.2 (5): Child path is absolute
    ru.path = child.path;
} else {
    // 5.2 (6): Resolve relative path
    ru.path = resolvePath(base.path, cp, base.isAbsolute());
}
If my reading is correct, to get this behaviour from the RFC pseudocode, you could put a dot as the path in the relative URI, before the query string, which from my experience using relative URIs as links in web pages is what I would expect:
transform(Base="http://example.com/something/more/long", R=".?query")
=> T="http://example.com/something/more/?query"
But I would expect, in a web page, that a link on the page "http://example.com/something/more/long" to "?query" would go to "http://example.com/something/more/long?query", not "http://example.com/something/more/?query" - in other words, consistent with the RFC, but not with the Java implementation.
Is my reading of the RFC correct, and the Java method inconsistent with it, or am I missing something?
Yes, I agree that the URI.resolve(URI) method is incompatible with RFC 3986. The original question, on its own, presents a fantastic amount of research that contributes to this conclusion. First, let's clear up any confusion.
As Raedwald explained (in a now deleted answer), there is a distinction between base paths that end or do not end with /:
fizz relative to /foo/bar is: /foo/fizz
fizz relative to /foo/bar/ is: /foo/bar/fizz
While correct, it's not a complete answer because the original question is not asking about a path (i.e. "fizz", above). Instead, the question is concerned with the separate query component of the relative URI reference. The URI class constructor used in the example code accepts five distinct String arguments, and all but the queryString argument were passed as null. (Note that Java accepts a null String as the path parameter and this logically results in an "empty" path component because "the path component is never undefined" though it "may be empty (zero length)".) This will be important later.
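The two points above (the trailing-slash distinction, and a null path argument yielding an empty path) can be checked with a small sketch. The class name ReferenceComponentsDemo and the helper queryOnly are made up for illustration; only java.net.URI calls from the question are used.

```java
import java.net.URI;
import java.net.URISyntaxException;

class ReferenceComponentsDemo {
    // Hypothetical helper: builds a reference with only a query component,
    // using the same five-argument constructor as the question.
    static URI queryOnly(String query) throws URISyntaxException {
        return new URI(null, null, null, query, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Base path without a trailing slash: the last segment is replaced.
        System.out.println(URI.create("/foo/bar").resolve("fizz"));  // /foo/fizz
        // Base path with a trailing slash: the reference is appended.
        System.out.println(URI.create("/foo/bar/").resolve("fizz")); // /foo/bar/fizz

        // A null path argument gives an empty path, not an undefined one,
        // while the query component is defined.
        URI ref = queryOnly("query");
        System.out.println("path is empty: " + ref.getPath().isEmpty());
        System.out.println("query: " + ref.getQuery());
    }
}
```

So the reference in the question falls under the R.path == "" branch of the RFC 3986 pseudocode, not the merge branch.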
In an earlier comment, Sajan Chandran pointed out that the java.net.URI class is documented to implement RFC 2396 and not the subject of the question, RFC 3986. The former was obsoleted by the latter in 2005. That the URI class Javadoc does not mention the newer RFC could be interpreted as more evidence of its incompatibility. Let's pile on some more:
JDK-6791060 is an open issue that suggests this class "should be updated for RFC 3986". A comment there warns that "RFC3986 is not completely backwards compatible with 2396".
Previous attempts were made to update parts of the URI class to be compliant with RFC 3986, such as JDK-6348622, but were then rolled back for breaking backwards compatibility. (Also see this discussion on the JDK mailing list.)
Although the path "merge" logic sounds similar, as noted by SubOptimal, the pseudocode specified in the newer RFC does not match the actual implementation. In the pseudocode, when the relative URI's path is empty, then the resulting target path is copied as-is from the base URI. The "merge" logic is not executed under those conditions. Contrary to that specification, Java's URI implementation trims the base path after the last / character, as observed in the question.
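If you have to stay with java.net.URI, the empty-path branch of the RFC 3986 pseudocode can be handled by hand before delegating to URI.resolve. The following is only a minimal sketch, not a complete RFC 3986 resolver: the class name EmptyPathAwareResolve and its resolve method are hypothetical, and every case other than an empty-path reference is still passed to the JDK unchanged.

```java
import java.net.URI;

class EmptyPathAwareResolve {
    // Sketch: follows the (R.path == "") branch of RFC 3986 section 5.2.2
    // and leaves everything else to URI.resolve.
    static URI resolve(URI base, URI ref) {
        boolean emptyPathReference = ref.getScheme() == null
                && ref.getRawAuthority() == null
                && ref.getRawPath() != null
                && ref.getRawPath().isEmpty();
        if (!emptyPathReference || base.isOpaque()) {
            return base.resolve(ref);
        }
        StringBuilder target = new StringBuilder();
        if (base.getScheme() != null) {
            target.append(base.getScheme()).append(':');
        }
        if (base.getRawAuthority() != null) {
            target.append("//").append(base.getRawAuthority());
        }
        // T.path = Base.path (copied as-is, no merge)
        target.append(base.getRawPath());
        // T.query = R.query if defined, else Base.query
        String query = ref.getRawQuery() != null ? ref.getRawQuery() : base.getRawQuery();
        if (query != null) {
            target.append('?').append(query);
        }
        // T.fragment = R.fragment
        if (ref.getRawFragment() != null) {
            target.append('#').append(ref.getRawFragment());
        }
        return URI.create(target.toString());
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com/something/more/long");
        URI ref = URI.create("?query=http://local:282/rand&action=aaaa");
        System.out.println(resolve(base, ref));
    }
}
```

The raw (still-encoded) components are concatenated so that nothing is escaped twice; using the decoded getters with the multi-argument constructors would re-encode characters such as %.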
There are alternatives to the URI class, if you want RFC 3986 behavior. Java EE 6 implementations provide javax.ws.rs.core.UriBuilder, which (in Jersey 1.18) seems to behave as you expected (see below). It at least claims awareness of the RFC as far as encoding different URI components is concerned.
Outside of J2EE, Spring 3.0 introduced UriUtils, specifically documented for "encoding and decoding based on RFC 3986". Spring 3.1 deprecated some of that functionality and introduced the UriComponentsBuilder, but it does not document adherence to any specific RFC, unfortunately.
Test program, demonstrating different behaviors:
import java.net.*;
import java.util.*;
import java.util.function.*;
import javax.ws.rs.core.UriBuilder; // using Jersey 1.18

public class StackOverflow22203111 {

    private URI withResolveURI(URI base, String targetQuery) {
        URI reference = queryOnlyURI(targetQuery);
        return base.resolve(reference);
    }

    private URI withUriBuilderReplaceQuery(URI base, String targetQuery) {
        UriBuilder builder = UriBuilder.fromUri(base);
        return builder.replaceQuery(targetQuery).build();
    }

    private URI withUriBuilderMergeURI(URI base, String targetQuery) {
        URI reference = queryOnlyURI(targetQuery);
        UriBuilder builder = UriBuilder.fromUri(base);
        return builder.uri(reference).build();
    }

    public static void main(String... args) throws Exception {
        final URI base = new URI("http://example.com/something/more/long");
        final String queryString = "query=http://local:282/rand&action=aaaa";
        final String expected =
            "http://example.com/something/more/long?query=http://local:282/rand&action=aaaa";

        StackOverflow22203111 test = new StackOverflow22203111();
        Map<String, BiFunction<URI, String, URI>> strategies = new LinkedHashMap<>();
        strategies.put("URI.resolve(URI)", test::withResolveURI);
        strategies.put("UriBuilder.replaceQuery(String)", test::withUriBuilderReplaceQuery);
        strategies.put("UriBuilder.uri(URI)", test::withUriBuilderMergeURI);

        strategies.forEach((name, method) -> {
            System.out.println(name);
            URI result = method.apply(base, queryString);
            if (expected.equals(result.toString())) {
                System.out.println(" MATCHES: " + result);
            }
            else {
                System.out.println(" EXPECTED: " + expected);
                System.out.println(" but WAS: " + result);
            }
        });
    }

    private URI queryOnlyURI(String queryString)
    {
        try {
            String scheme = null;
            String authority = null;
            String path = null;
            String fragment = null;
            return new URI(scheme, authority, path, queryString, fragment);
        }
        catch (URISyntaxException syntaxError) {
            throw new IllegalStateException("unexpected", syntaxError);
        }
    }
}
Outputs:
URI.resolve(URI)
EXPECTED: http://example.com/something/more/long?query=http://local:282/rand&action=aaaa
but WAS: http://example.com/something/more/?query=http://local:282/rand&action=aaaa
UriBuilder.replaceQuery(String)
MATCHES: http://example.com/something/more/long?query=http://local:282/rand&action=aaaa
UriBuilder.uri(URI)
MATCHES: http://example.com/something/more/long?query=http://local:282/rand&action=aaaa
For me there is no discrepancy with the Java behaviour.
In RFC 2396, section 5.2, step 6a:
All but the last segment of the base URI's path component is copied to the buffer. In other words, any characters after the last (right-most) slash character, if any, are excluded.
In RFC 3986, section 5.2.3:
return a string consisting of the reference's path component appended to all but the last segment of the base URI's path (i.e., excluding any characters after the right-most "/" in the base URI path, or excluding the entire base URI path if it does not contain any "/" characters).