Question:
I am trying to separate street names from street numbers which have these patterns:
- "street 12" --- name:street , number:12
- "street12" --- name:street , number:12
- "street 12a" --- name:street , number:12a
- "street12a" --- name:street , number:12a
What is the regex to get the street name, and the regex to get the street number, in PHP and Python?
Note: The number is always after the street name so I guess that should shorten it.
Thanks.
Answer 1:
Try this and see if it works for you:
$subjects = array( "street 12", "street12", "street 12a", "street12a" );
foreach ( $subjects as $subject )
{
    // $result[1] is the street name, $result[2] is the number part.
    if ( preg_match('/([^\d]+)\s?(.+)/i', $subject, $result) )
    {
        var_dump( $result );
    }
}
The only part you need is this:
// Find a match and store it in $result.
if ( preg_match('/([^\d]+)\s?(.+)/i', $subject, $result) )
{
    // $result[1] will have the street name. The greedy [^\d]+ also
    // captures the space before the number, so trim it.
    $streetName = trim( $result[1] );
    // and $result[2] is the number part.
    $streetNumber = $result[2];
}
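Since the question also asks for Python, here is a rough port of the same pattern using the standard re module. The function name split_street is my own, and this is only a sketch for the four sample inputs, not a general address parser.

```python
import re

# Same idea as the PHP pattern: a run of non-digits, then the rest.
PATTERN = re.compile(r"([^\d]+)\s?(.+)")


def split_street(subject):
    """Return (name, number), or None when the pattern does not match."""
    match = PATTERN.match(subject)
    if match is None:
        return None
    # The name group keeps the trailing space for inputs like "street 12",
    # so strip it before returning.
    return match.group(1).strip(), match.group(2)


if __name__ == "__main__":
    for subject in ("street 12", "street12", "street 12a", "street12a"):
        print(subject, "->", split_street(subject))
```

One caveat: the pattern never requires a digit, so an input with no number at all still matches ("street" splits into "stree" and "t"). If that matters, check that the second group starts with a digit.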
Answer 2:
I would suggest that the best way to determine when the number starts is when you hit a digit. Thus, you would use
preg_match('/^([^\d]*[^\d\s]) *(\d.*)$/', $address, $match)
Examples:
'Bubbletown 145' => 'Bubbletown', '145'
'Circlet56a' => 'Circlet', '56a'
'Bloomfield Avenue 68' => 'Bloomfield Avenue', '68'
'Quibbit Ave 999a' => 'Quibbit Ave', '999a'
'Singletown551abc' => 'Singletown', '551abc'
It is probably best to decide how you want edge cases to be handled, then write unit tests for your own regex function.
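As a sketch of what such a test could look like in Python (the other language the question asks about), the same pattern can be checked against some of the examples with plain assertions. The names split_address, CASES and run_checks are assumptions of mine, and the "5th Ave 123" case is an extra edge case I added to show one input this pattern rejects.

```python
import re

# Name: non-digits ending in a non-space character. Number: starts at
# the first digit and runs to the end of the string.
ADDRESS = re.compile(r"^([^\d]*[^\d\s]) *(\d.*)$")


def split_address(address):
    """Return (name, number), or None when the address does not match."""
    match = ADDRESS.match(address)
    if match is None:
        return None
    return match.group(1), match.group(2)


CASES = {
    "Bubbletown 145": ("Bubbletown", "145"),
    "Bloomfield Avenue 68": ("Bloomfield Avenue", "68"),
    "Quibbit Ave 999a": ("Quibbit Ave", "999a"),
    "Singletown551abc": ("Singletown", "551abc"),
}


def run_checks():
    for address, expected in CASES.items():
        assert split_address(address) == expected, address
    # Edge case: a street name that itself starts with a digit is rejected,
    # because the name group may not contain digits.
    assert split_address("5th Ave 123") is None


if __name__ == "__main__":
    run_checks()
    print("all checks passed")
```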
Answer 3:
Generally speaking, addresses are not always this clean. Especially if this data is coming straight from users, you have to consider that not everyone has such a standard address. There are PO boxes, rural routes, house numbers like "31 1/2", suites, and tons of variations on street types (Road, Street, Circle, Court, etc., plus all their abbreviations). With spaces in street names and hyphens in house numbers, the complexity of addresses is very easy to underestimate. Mix in the potential for non-US addresses and the complexity goes up exponentially.
This giant function tries to make sense of all that (at least as far as the US Post is concerned): http://codepad.org/pkTdUDL6 I had this function kicking around, so it may need tweaking or elaboration. If nothing else, it should give you an idea of the task one is faced with when trying to make user address data sane.
This also makes it tempting to split the house number, street name, and street type into separate fields. If the accuracy of parsing addresses is critical to your system design, you might want to consider it; real estate systems for example would need to have this level of granularity for this data. If your use case does not critically rely on the ability to accurately parse this data, then I would not suggest presenting a user with all those extra fields. Just take their address as they give it, try to clean it up, and anticipate some inconsistencies in the rest of your system's design.
Answer 4:
Assuming that there can only be one final letter,
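// The name group is lazy (.+?) so that it does not swallow the leading
// digits of the number.
if (preg_match('/^(.+?) *(\d+[a-z]?)$/', $address, $match)) {
    // $match[0] holds the whole match, so skip it.
    list(, $street, $number) = $match;
}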
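For the Python side of the question, a comparable sketch under the same "at most one final letter" assumption might look like this. The name group is lazy so that the number keeps all of its digits. The function name split_single_suffix is hypothetical.

```python
import re

# Lazy name group, optional spaces, then digits plus at most one
# lowercase letter at the very end of the string.
ADDRESS = re.compile(r"^(.+?) *(\d+[a-z]?)$")


def split_single_suffix(address):
    """Return (street, number), or None when the address does not match."""
    match = ADDRESS.match(address)
    if match is None:
        return None
    street, number = match.groups()
    return street, number


if __name__ == "__main__":
    for address in ("street 12", "street12a", "street 551abc"):
        print(address, "->", split_single_suffix(address))
```

An input with a longer suffix, such as "street 551abc", does not match and returns None, which follows from the one-letter assumption.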
Answer 5:
Parsing street addresses can get nasty, really fast. The most reliable, worry-free way is to use a service that can resolve the address components based on the full delivery point barcode (9-digit ZIP Code + 3-digit delivery point).
I work for an address verification company, SmartyStreets, and we have an API that can parse these components for you. See this sample. With a simple GET request you get a JSON result with all the address components parsed for you.
Update: SmartyStreets now provides international address verification.
Answer 6:
This may be old, but following the comment from Pekka, I would use the following regex in b01's code:
/(.+?)\s?([\d]+[\D]*)$/i
so the full code would be
// Find a match and store it in $result.
if ( preg_match('/(.+?)\s?([\d]+[\D]*)$/i', $subject, $result) )
{
    // $result[1] will have the street name
    $streetName = $result[1];
    // and $result[2] is the number part.
    $streetNumber = $result[2];
}
This selects the last occurring number, including any characters that follow it (e.g. 15F or 15 F), while still handling street names that contain numbers (such as 5th Ave 123 or Straße des 17. Juni 123).
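A rough Python equivalent, for readers who want the other language from the question: this is a sketch that assumes Python's re module treats this pattern the same way PCRE does, and the function name split_last_number is my own.

```python
import re

# Lazy name, optional single whitespace character, then the last run of
# digits followed by any trailing non-digit characters.
PATTERN = re.compile(r"(.+?)\s?(\d+\D*)$")


def split_last_number(subject):
    """Return (name, number), or None when no trailing number is found."""
    # search() rather than match(), to mirror the unanchored preg_match.
    match = PATTERN.search(subject)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    samples = (
        "street 15F",
        "street 15 F",
        "5th Ave 123",
        "Straße des 17. Juni 123",
    )
    for subject in samples:
        print(subject, "->", split_last_number(subject))
```

Because the number group must reach the end of the string, earlier digits such as the "5" in "5th Ave" or the "17" in "Straße des 17. Juni" stay in the name.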