I use RegexBuddy when working with regular expressions. I copied the regular expression for matching URLs from its library and tested it successfully within RegexBuddy. However, when I copied it as a Java String flavor and pasted it into Java code, it did not work. The following class prints false:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexFoo {
        public static void main(String[] args) {
            String regex = "\\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]";
            String text = "http://google.com";
            System.out.println(IsMatch(text, regex));
        }

        private static boolean IsMatch(String s, String pattern) {
            try {
                Pattern patt = Pattern.compile(pattern);
                Matcher matcher = patt.matcher(s);
                // matches() requires the whole input to match the pattern
                return matcher.matches();
            } catch (RuntimeException e) {
                return false;
            }
        }
    }
Does anyone know what I am doing wrong?
Try the following regex string instead. Your test in RegexBuddy was probably done in a case-insensitive manner, whereas Java matches case-sensitively by default. I have added the lowercase alphas as well as a proper string beginning placeholder.
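A minimal sketch of that fix, assuming the only changes are adding a-z to both character classes and using ^ in place of \b. The class name UrlRegexFix and the helper isMatch are illustrative and not taken from the answer. The last call shows an alternative: keeping the question's pattern and passing Pattern.CASE_INSENSITIVE.

```java
import java.util.regex.Pattern;

public class UrlRegexFix {
    // The question's pattern, unchanged: only upper-case letters are allowed.
    static final String ORIGINAL =
        "\\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]";

    // Illustrative variant: a-z added to both classes, ^ instead of \b.
    static final String MIXED_CASE =
        "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";

    // Returns true only if the whole text matches the regex.
    static boolean isMatch(String text, String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "http://google.com";
        System.out.println(isMatch(text, ORIGINAL, 0));                        // false
        System.out.println(isMatch(text, MIXED_CASE, 0));                      // true
        System.out.println(isMatch(text, ORIGINAL, Pattern.CASE_INSENSITIVE)); // true
    }
}
```

Because matches() already requires the whole input to match, the ^ anchor is redundant here; it only matters if the pattern is later used with find().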
The problem with all of the suggested approaches is that every RegEx is validating.
All RegEx-based code is over-engineered: it will find only valid URLs! For example, it will ignore anything that starts with "http://" and contains non-ASCII characters.
What is more, I have seen processing times of 1-2 seconds (single-threaded, dedicated) with the Java RegEx package (filtering e-mail addresses from text) for very small and simple sentences, nothing special; possibly a bug in Java 6 RegEx...
The simplest and fastest solution would be to use StringTokenizer to split the text into tokens, remove the tokens starting with "http://" etc., and concatenate the remaining tokens into text again.
If you want to filter e-mail addresses from text (because later on you will do NLP stuff etc.), just remove all tokens containing "@".
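The token-based filtering described above could be sketched roughly as follows. The class name TokenFilter, the method names, and the exact prefix list are assumptions made for illustration; tokens are split on whitespace only, so the original spacing is collapsed to single spaces.

```java
import java.util.Locale;
import java.util.StringTokenizer;

public class TokenFilter {
    // Illustrative prefix list, not exhaustive.
    private static final String[] URL_PREFIXES = {"http://", "https://", "ftp://", "mailto:"};

    // True if the token contains "@" or starts with one of the prefixes.
    static boolean isUrlOrEmail(String token) {
        if (token.contains("@")) {
            return true;
        }
        String lower = token.toLowerCase(Locale.ROOT);
        for (String prefix : URL_PREFIXES) {
            if (lower.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Splits on whitespace, drops URL-like and e-mail-like tokens,
    // and rejoins the remaining tokens with single spaces.
    static String strip(String text) {
        StringTokenizer tokens = new StringTokenizer(text);
        StringBuilder out = new StringBuilder();
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken();
            if (isUrlOrEmail(token)) {
                continue;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "Mail or see for details"
        System.out.println(strip("Mail bob@example.com or see http://example.com/docs for details"));
    }
}
```

A URL wrapped in punctuation, such as "(http://example.com)", would not be removed by this sketch, because the token does not start with the prefix.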
This is a simple text on which RegEx of Java 6 fails. Try it in different versions of Java. It takes about 1000 milliseconds per RegEx call in a long-running, single-threaded test application:
Do not rely on regular expressions if you only need to filter words with "@", "http://", "ftp://", "mailto:"; it is a huge engineering overhead.
If you really want to use RegEx with Java, try Automaton.
This works too:
Note:
So probably the first one is more useful for general use.
The best way to do it now is:
EDIT: Code of Patterns from https://github.com/android/platform_frameworks_base/blob/master/core/java/android/util/Patterns.java:
I'll try a standard "Why are you doing it this way?" answer... Do you know about java.net.URL?
Constructing a URL from the string will throw a MalformedURLException if it can't parse the URL.
In line with billjamesdev's answer, here is another approach to validate a URL without using a RegEx:
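A minimal sketch of validation with java.net.URL, assuming that a successful parse plus a successful toURI() conversion is good enough to count as valid. The class name UrlCheck and the method isValidUrl are illustrative.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

public class UrlCheck {
    static boolean isValidUrl(String candidate) {
        try {
            // Throws MalformedURLException for a missing or unknown protocol.
            URL url = new URL(candidate);
            // Stricter syntax check; throws URISyntaxException for
            // characters that are illegal in a URI, e.g. an unescaped space.
            url.toURI();
            return true;
        } catch (MalformedURLException | URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUrl("http://google.com"));     // true
        System.out.println(isValidUrl("google.com"));            // false: no protocol
        System.out.println(isValidUrl("http://google.com/a b")); // false: space in path
    }
}
```

Note that the URL(String) constructor is deprecated on recent JDKs (Java 20 and later), so the compiler may emit a warning there. This checks syntax only; it does not check that the host exists or is reachable.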
From the Apache Commons Validator library, look at the class UrlValidator. Some example code:
Construct a UrlValidator with valid schemes of "http" and "https".
If the default constructor is used instead, it prints out "url is valid".