Proper way to check for URL equality

Posted 2020-05-20 06:50

Question:

I have the following scenario:

URL u1 = new URL("http://www.yahoo.com/");
URL u2 = new URL("http://www.yahoo.com");

if (u1.equals(u2)) {
    System.out.println("yes");
}
if (u1.toURI().equals(u2.toURI())) {
    System.out.println("uri equality");
}
if (u1.toExternalForm().equals(u2.toExternalForm())) {
    System.out.println("external form equality");
}
if (u1.toURI().normalize().equals(u2.toURI().normalize())) {
    System.out.println("uri normalized equality");
}

None of these checks succeeds. Only the path differs: u1 has a path of "/" while u2 has a path of "". Do these URLs point to the same resource, and is there a way to check that without opening a connection? Am I misunderstanding something fundamental about URLs?

EDIT: I should state that I want a non-hacky check. Is it reasonable to say that an empty path is equivalent to "/"? I was hoping not to have this kind of code

Answer 1:

From the 2007 JavaOne conference:

The second puzzle, aptly titled "More Joys of Sets", has the user create HashMap keys that consist of several URL objects. Again, most of the audience was unable to guess the correct answer.

The important thing the audience learned here is that the URL object's equals() method is, in effect, broken. In this case, two URL objects are equal if they resolve to the same IP address and port, not just if they have equal strings. However, Bloch and Pugh point out an even more severe Achilles' Heel: the equality behavior differs depending on if you're connected to the network, where virtual addresses can resolve to the same host, or if you're not on the net, where the resolve is a blocking operation. So, as far as lessons learned, they recommend:

  • Don't use URL; use URI instead. URI makes no attempt to compare addresses or ports. In addition, don't use URL as a Set element or a Map key.
  • For API designers, the equals() method should not depend on the environment. For example, in this case, equality should not change if a computer is connected to the Internet versus standalone.
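To make the first recommendation concrete, here is a minimal sketch of URI objects used as Set elements. The class and method names (UriSetDemo, countDistinct) are made up for illustration. URI.equals() compares components as text, so no DNS lookup happens and the result does not depend on network state.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of any library.
class UriSetDemo {
    // Counts distinct URIs. URI.equals() and hashCode() work on the parsed
    // components only, so this never touches the network.
    static int countDistinct(String... addresses) {
        Set<URI> seen = new HashSet<>();
        for (String address : addresses) {
            seen.add(URI.create(address));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // The first two strings are identical; the third has an empty path.
        System.out.println(countDistinct(
                "http://www.yahoo.com/",
                "http://www.yahoo.com/",
                "http://www.yahoo.com")); // prints 2
    }
}
```

Note that the set still holds two entries, because "/" and "" are different paths as far as URI is concerned.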


From the URI.equals() documentation:

For two hierarchical URIs to be considered equal, their paths must be equal and their queries must either both be undefined or else be equal.

In your case, the two paths are different: one is "/", the other is "".
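A quick way to see this for yourself is to print the path component of each URI. This is only a small sketch; PathInspection and describePath are names invented here.

```java
import java.net.URI;

// Hypothetical demo class, not part of any library.
class PathInspection {
    // Returns the path component, or a visible marker when it is empty or absent
    // (getPath() returns null for opaque URIs such as mailto: addresses).
    static String describePath(String address) {
        String path = URI.create(address).getPath();
        return (path == null || path.isEmpty()) ? "(empty)" : path;
    }

    public static void main(String[] args) {
        System.out.println(describePath("http://www.yahoo.com/")); // prints "/"
        System.out.println(describePath("http://www.yahoo.com"));  // prints "(empty)"
    }
}
```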


According to the URI RFC (RFC 3986), §6.2.3:

Implementations may use scheme-specific rules, at further processing cost, to reduce the probability of false negatives. For example, because the "http" scheme makes use of an authority component, has a default port of "80", and defines an empty path to be equivalent to "/", the following four URIs are equivalent:

 http://example.com
 http://example.com/
 http://example.com:/
 http://example.com:80/

It seems that java.net.URI does not apply these scheme-specific rules, so its equals() reports the two URIs as different.
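If you want the scheme-specific rules, you have to apply them yourself before comparing. The following is a rough sketch under some assumptions: HttpUriNormalizer, normalizeHttp and equivalent are hypothetical names, only http and https are handled, and 443 is assumed as the https default port (the RFC excerpt above only mentions 80 for http). It rebuilds the URI from its raw components so that percent-encoded characters are left untouched.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, not part of the JDK.
class HttpUriNormalizer {
    // Applies scheme-based normalization for http/https only:
    // lower-case scheme and host, drop the default port, turn an empty path into "/".
    static URI normalizeHttp(URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null || uri.getHost() == null) {
            return uri.normalize(); // relative or non-server-based URI: syntax-only normalization
        }
        scheme = scheme.toLowerCase(Locale.ROOT);
        if (!scheme.equals("http") && !scheme.equals("https")) {
            return uri.normalize(); // other schemes are left to generic rules
        }
        int defaultPort = scheme.equals("http") ? 80 : 443; // 443 is an assumption for https
        int port = (uri.getPort() == defaultPort) ? -1 : uri.getPort();
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // the http scheme defines an empty path as equivalent to "/"
        }
        StringBuilder sb = new StringBuilder(scheme).append("://");
        if (uri.getRawUserInfo() != null) sb.append(uri.getRawUserInfo()).append('@');
        sb.append(uri.getHost().toLowerCase(Locale.ROOT));
        if (port != -1) sb.append(':').append(port);
        sb.append(path);
        if (uri.getRawQuery() != null) sb.append('?').append(uri.getRawQuery());
        if (uri.getRawFragment() != null) sb.append('#').append(uri.getRawFragment());
        return URI.create(sb.toString()).normalize(); // normalize() also removes "." and ".." segments
    }

    static boolean equivalent(String a, String b) {
        return normalizeHttp(URI.create(a)).equals(normalizeHttp(URI.create(b)));
    }

    public static void main(String[] args) {
        System.out.println(equivalent("http://www.yahoo.com/", "http://www.yahoo.com"));        // prints true
        System.out.println(equivalent("http://example.com", "http://example.com:80/"));         // prints true
        System.out.println(equivalent("http://www.yahoo.com/foo", "http://www.yahoo.com/foo/")); // prints false
    }
}
```

This only treats the empty path and "/" as the same. A trailing slash on a non-empty path is deliberately left alone, since nothing in the RFC excerpt says those are equivalent.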


Resources :

  • sun.com - Java Puzzlers Serves Up Brain Benders Galore
  • javadoc - URI.equals()
  • URI RFC (RFC 3986)


Answer 2:

Strictly speaking, they are not equal. Treating the trailing slash (/) as optional is only a common convention, not a rule. A server could serve different pages for

http://www.yahoo.com/foo/

and for

http://www.yahoo.com/foo

I believe this is even possible for the URLs you provided, since the HTTP request could omit that slash.



Answer 3:

You can always compare relative URLs with the Path.equals() method.

For example,

Paths.get("/user/login").equals(Paths.get("/user/login/"))

evaluates to true.

You can also use the startsWith() and endsWith() methods.
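Applied to full URLs, that idea might look like the sketch below. PathCompareDemo and samePathIgnoringTrailingSlash are made-up names, and the helper compares only the path component, ignoring host, port and query. Be aware that java.nio.file.Path is a file-system abstraction, so this borrows its trailing-slash handling rather than any URL rule, and it still does not make an empty path equal to "/".

```java
import java.net.URI;
import java.nio.file.Paths;

// Hypothetical demo class, not part of any library.
class PathCompareDemo {
    // Compares only the path components, using file-system path equality,
    // which drops a trailing separator. Host, port and query are ignored.
    static boolean samePathIgnoringTrailingSlash(String a, String b) {
        return Paths.get(URI.create(a).getPath())
                .equals(Paths.get(URI.create(b).getPath()));
    }

    public static void main(String[] args) {
        System.out.println(samePathIgnoringTrailingSlash("/user/login", "/user/login/"));             // prints true
        System.out.println(samePathIgnoringTrailingSlash("http://www.yahoo.com/", "http://www.yahoo.com")); // prints false
    }
}
```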



Tags: java url