Matching degree-based geographical coordinates wit

Posted 2019-02-25 12:42

Question:

I'd like to be able to identify patterns of the form

28°44'30"N., 33°12'36"E.

Here's what I have so far:

use utf8;
qr{
    (?:
    \d{1,3} \s*  °   \s*
    \d{1,2} \s*  '   \s*
    \d{1,2} \s*  "   \s*
    [ENSW]  \s* \.?
            \s*  ,?  \s*
    ){2}
}x;

Needless to say, this doesn't match. Does it have anything to do with the extended characters (namely the degree symbol)? Or am I just screwing this up big time?

I'd also appreciate pointers to CPAN, if you know of something there that will solve my problem. I've looked at Regexp::Common and Geo::Formatter, but neither of these does what I want. Any ideas?

Update

It turns out that I needed to take out use utf8 when reading the coordinates from a file. If I manually initialize a variable with a coordinate, it would match fine, but as soon as I read that same line from a file, it wouldn't match. Taking out use utf8 solved that. I guess I don't really understand what utf8 is doing.
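
For anyone puzzled by the same thing: use utf8 only tells Perl that the source file is UTF-8, so it affects the ° typed into the regex, not the data read from a file. Below is a minimal sketch of that difference, assuming the file is UTF-8 encoded (the thread never says which encoding it was). The in-memory filehandle and the names $file_bytes, $raw_line and $decoded_line are stand-ins made up for illustration.

```perl
use strict;
use warnings;
use utf8;    # the source file is UTF-8, so the ° in the regex is U+00B0

my $re = qr{
    (?:
    \d{1,3} \s*  °   \s*
    \d{1,2} \s*  '   \s*
    \d{1,2} \s*  "   \s*
    [ENSW]  \s* \.?
            \s*  ,?  \s*
    ){2}
}x;

# Stand-in for a UTF-8 file on disk: ° is stored as the two bytes C2 B0.
my $file_bytes = "28\x{C2}\x{B0}44'30\"N., 33\x{C2}\x{B0}12'36\"E.\n";

# Read with no decoding layer: the line stays a string of raw bytes.
open my $raw_fh, '<', \$file_bytes or die "open failed: $!";
my $raw_line = <$raw_fh>;
close $raw_fh;

# Read through :encoding(UTF-8): the bytes C2 B0 become the character U+00B0.
open my $utf8_fh, '<:encoding(UTF-8)', \$file_bytes or die "open failed: $!";
my $decoded_line = <$utf8_fh>;
close $utf8_fh;

my $raw_matches     = $raw_line     =~ $re ? 1 : 0;
my $decoded_matches = $decoded_line =~ $re ? 1 : 0;

print "raw bytes: ", ($raw_matches     ? "match" : "no match"), "\n";
print "decoded:   ", ($decoded_matches ? "match" : "no match"), "\n";
```

With a real file, the same layer would go in the open call for that file. If the file turns out to be in some other encoding, the layer name would have to change accordingly.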

Answer 1:

Try dropping the use utf8 statement.

The degree symbol corresponds to character value 0xB0 in my current encoding (whatever that is, but it ain't UTF-8). 0xB0 is a "continuation byte" in UTF-8; it is expected to be the second, third, or fourth byte of a sequence that begins with a byte between 0xC2 and 0xF4. Using that string with utf8 will give you an error.
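
To see the byte values being described, here is a small sketch using the core Encode module. ISO-8859-1 is only an assumed example of a single-byte encoding that puts the degree sign at 0xB0; the answer does not say which encoding was actually in use. The variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $degree = "\x{B0}";    # U+00B0 DEGREE SIGN as a single character

# Encode the character two ways and render each byte as two hex digits.
my $latin1_bytes = encode('ISO-8859-1', $degree);
my $utf8_bytes   = encode('UTF-8',      $degree);

my $latin1_hex = join ' ', map { sprintf('%02X', ord($_)) } split(//, $latin1_bytes);
my $utf8_hex   = join ' ', map { sprintf('%02X', ord($_)) } split(//, $utf8_bytes);

print "ISO-8859-1: $latin1_hex\n";    # B0
print "UTF-8:      $utf8_hex\n";      # C2 B0
```

A lone B0 byte is therefore a complete character in ISO-8859-1 but only the tail of a two-byte sequence in UTF-8.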



Answer 2:

This:

use strict;
use warnings;
use utf8;
my $re = qr{
    (?:
    \d{1,3} \s*  °   \s*
    \d{1,2} \s*  '   \s*
    \d{1,2} \s*  "   \s*
    [ENSW]  \s* \.?
            \s*  ,?  \s*
    ){2}
}x;
if (q{28°44'30"N., 33°12'36"E.} =~ $re) {
    print "match\n";
} else {
    print "no match\n";
}

works:

$ ./coord.pl 
match


Answer 3:

You forgot the x modifier on the qr operator.



Answer 4:

The ?: at the start of the group makes it non-capturing, which is probably why the matched values cannot be extracted or seen. Dropping it from the regex may be the solution.

If all of the coordinates are fixed-format, unpack may be a better way of obtaining the desired values.

# q{} avoids the unescaped double quotes; assumes ° occupies exactly one position in the string
my @twoCoordinates = unpack 'A2xA2xA2xAx3A2xA2xA2xA', q{28°44'30"N., 33°12'36"E.};

print "@twoCoordinates";  # prints '28 44 30 N 33 12 36 E'

If not, then modify the regex:

my @twoCoordinates = q{28°44'30"N., 33°12'36"E.} =~ /\w+/g;
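
If you want to keep the stricter pattern from the question and still get the values out, one option is to put capturing groups around each component. This is only a sketch along those lines, not something from the original post; $coord, $text and @parts are made-up names, and it assumes the source file is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;    # assumes this source file is saved as UTF-8

# One coordinate: degrees, minutes, seconds, hemisphere, each captured.
my $coord = qr{
    (\d{1,3}) \s* ° \s*
    (\d{1,2}) \s* ' \s*
    (\d{1,2}) \s* " \s*
    ([ENSW])  \s* \.?
}x;

my $text = q{28°44'30"N., 33°12'36"E.};

# In list context, /g returns the captures from every match in order.
my @parts = $text =~ /$coord/g;

print "@parts\n";    # 28 44 30 N 33 12 36 E
```

Unlike /\w+/g, this only returns values that sit inside a well-formed coordinate.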