I'd like to be able to identify patterns of the form
28°44'30"N., 33°12'36"E.
Here's what I have so far:
    use utf8;
    qr{
        (?:
            \d{1,3} \s* ° \s*
            \d{1,2} \s* ' \s*
            \d{1,2} \s* " \s*
            [ENSW] \s* \.?
            \s* ,? \s*
        ){2}
    }x;
Needless to say, this doesn't match. Does it have anything to do with the extended characters (namely the degree symbol)? Or am I just screwing this up big time?
I'd also appreciate pointers to CPAN, if you know of something there that will solve my problem. I've looked at Regexp::Common and Geo::Formatter, but neither does what I want. Any ideas?
Update
It turns out that I needed to take out use utf8 when reading the coordinates from a file. If I manually initialize a variable with a coordinate, it would match fine, but as soon as I read that same line from a file, it wouldn't match. Taking out use utf8 solved that. I guess I don't really understand what utf8 is doing.
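A likely explanation for the behaviour described in the update, sketched under the assumption that both the script and the data file are saved as UTF-8: use utf8 only tells Perl that the source code is UTF-8, so the ° in the pattern becomes one character (U+00B0). A line read from a file without a decoding layer is still raw bytes, where ° arrives as the two bytes 0xC2 0xB0, so the pattern and the data no longer agree. Decoding the input with an :encoding(UTF-8) layer is one way to keep use utf8 and still match. The in-memory filehandle below stands in for the real file and is purely illustrative.

```perl
use strict;
use warnings;
use utf8;                       # source is UTF-8, so ° in the pattern is one character
use Encode qw(encode);

my $coord_re = qr{
    (?:
        \d{1,3} \s* ° \s*       # degrees
        \d{1,2} \s* ' \s*       # minutes
        \d{1,2} \s* " \s*       # seconds
        [ENSW] \s* \.?          # hemisphere, optional period
        \s* ,? \s*              # optional separator
    ){2}
}x;

# Assumption: the data file is UTF-8. These bytes simulate its contents.
my $file_bytes = encode('UTF-8', qq{28°44'30"N., 33°12'36"E.\n});

# Read without a decoding layer: the line is a string of raw bytes.
open my $raw, '<', \$file_bytes or die "open failed: $!";
my $raw_line = <$raw>;
close $raw;

# Read with a decoding layer: the line is a string of characters.
open my $dec, '<:encoding(UTF-8)', \$file_bytes or die "open failed: $!";
my $decoded_line = <$dec>;
close $dec;

my $raw_matches     = $raw_line     =~ $coord_re ? 1 : 0;
my $decoded_matches = $decoded_line =~ $coord_re ? 1 : 0;
print "raw bytes: $raw_matches, decoded: $decoded_matches\n";
```

With a real file, the same layer would go in the open call for that file. If the file turns out to be in another encoding (Latin-1, say), the layer name would need to change accordingly.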
Try dropping the use utf8 statement. The degree symbol corresponds to character value 0xB0 in my current encoding (whatever that is, but it ain't UTF-8). 0xB0 is a "continuation byte" in UTF-8; it is expected to be the second, third, or fourth byte of a sequence that begins with something between 0xC2 and 0xF4. Using that string with utf8 will give you an error.

You forgot the x modifier on the qr operator.

The ?: at the beginning of the regex makes it non-capturing, which is probably why the matches cannot be extracted or seen. Dropping it from the regex may be the solution.

If all of the coordinates are fixed-format, unpack may be a better way of obtaining the desired values. If not, then modify the regex:
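One possible modification, offered as a sketch and not necessarily what was originally posted here: capture each field of a single coordinate and apply the pattern repeatedly with /g. A capture group inside a quantified group such as (...){2} keeps only its last repetition, so matching one coordinate per iteration is an easy way to keep both. The variable names and the sample line are illustrative, and the input is assumed to be already decoded into characters.

```perl
use strict;
use warnings;
use utf8;

# One coordinate: degrees, minutes, seconds, hemisphere letter.
my $one_coord = qr{
    (\d{1,3}) \s* ° \s*
    (\d{1,2}) \s* ' \s*
    (\d{1,2}) \s* " \s*
    ([ENSW]) \s* \.?
}x;

my $line = qq{28°44'30"N., 33°12'36"E.};   # hypothetical, already-decoded input

# Collect every coordinate found on the line.
my @coords;
while ($line =~ /$one_coord/g) {
    push @coords, { deg => $1, min => $2, sec => $3, hemi => $4 };
}

printf "%d %d %d %s\n", @{$_}{qw(deg min sec hemi)} for @coords;
```

For the sample line this collects two entries, one for the latitude and one for the longitude.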
This:
works: