Where can I find a good address parser? [closed]

Posted 2020-01-29 05:00

I'm looking for a good tool that can take a full mailing address, formatted for display or use with a mailing label, and convert it into a structured object.

So for instance:

using System;

// Start with a formatted address in a single string
string f = "18698 E. Main Street\r\nBig Town, AZ, 86011";

// Parse it into a structured address
Address addr = new Address(f);

Console.WriteLine(addr.Street);     // 18698 E. Main Street
Console.WriteLine(addr.Locality);   // Big Town
Console.WriteLine(addr.Region);     // AZ
Console.WriteLine(addr.PostalCode); // 86011

I could do this with a regular expression, but the tricky part is keeping it general enough to handle any address in the world!

I'm sure there has to be something out there that can do it.

As some may have noticed, this is actually the format of the opensocial.address object.
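For the single layout in the example above, the regex route can be sketched as follows. This is a minimal sketch, assuming US-style input: one or more street lines followed by a final "City, ST, ZIP" line (the comma before the ZIP is optional and ZIP+4 is allowed). The class simply mirrors the hypothetical Address type from the question; it is not an existing library, and it deliberately makes no attempt at international formats, which is exactly the hard part.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical Address type matching the question's wished-for API.
// Assumption: US layout only, i.e. street line(s) then "City, ST, ZIP".
public sealed class Address
{
    // Last line: locality, two-letter region, 5-digit ZIP with optional +4.
    private static readonly Regex LastLine = new Regex(
        @"^\s*(?<locality>[^,]+?)\s*,\s*(?<region>[A-Za-z]{2})\s*,?\s*(?<postal>\d{5}(?:-\d{4})?)\s*$");

    public string Street { get; }
    public string Locality { get; }
    public string Region { get; }
    public string PostalCode { get; }

    public Address(string formatted)
    {
        // Split on CRLF or LF, trim each line, and drop blank lines.
        string[] lines = formatted
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(l => l.Trim())
            .Where(l => l.Length > 0)
            .ToArray();

        if (lines.Length < 2)
            throw new FormatException("Expected a street line and a city/state/ZIP line.");

        Match m = LastLine.Match(lines[lines.Length - 1]);
        if (!m.Success)
            throw new FormatException("Last line is not in 'City, ST, ZIP' form.");

        // Everything before the last line is treated as the street portion.
        Street = string.Join(" ", lines.Take(lines.Length - 1));
        Locality = m.Groups["locality"].Value;
        Region = m.Groups["region"].Value.ToUpperInvariant();
        PostalCode = m.Groups["postal"].Value;
    }
}
```

With the question's string, this yields Street "18698 E. Main Street", Locality "Big Town", Region "AZ" and PostalCode "86011"; anything not in that one layout throws a FormatException rather than being guessed at.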

7 answers
成全新的幸福
Post #2 · 2020-01-29 05:28

The Google Maps API works pretty well for this. For example, suppose you are given the string "120 w 45 st nyc". Pass it to the Google Maps API like so: http://maps.google.com/maps/geo?q=120+w+45+st+nyc and you get this response:

{
  "name": "120 w 45 st nyc",
  "Status": {
    "code": 200,
    "request": "geocode"
  },
  "Placemark": [ {
    "id": "p1",
    "address": "120 W 45th St, New York, NY 10036, USA",
    "AddressDetails": {
      "Country": {
        "CountryNameCode": "US",
        "CountryName": "USA",
        "AdministrativeArea": {
          "AdministrativeAreaName": "NY",
          "Locality": {
            "LocalityName": "New York",
            "Thoroughfare": { "ThoroughfareName": "120 W 45th St" },
            "PostalCode": { "PostalCodeNumber": "10036" }
          }
        }
      },
      "Accuracy": 8
    },
    "ExtendedData": {
      "LatLonBox": {
        "north": 40.7603883,
        "south": 40.7540931,
        "east": -73.9807141,
        "west": -73.9870093
      }
    },
    "Point": {
      "coordinates": [ -73.9838617, 40.7572407, 0 ]
    }
  } ]
}
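To show how a response like the one above maps onto the question's fields, here is a sketch that builds the query URL in the same style and pulls ThoroughfareName, LocalityName, AdministrativeAreaName and PostalCodeNumber out of the JSON with System.Text.Json (.NET Core 3.0 or later assumed). It does not make a network call: the legacy `maps/geo` endpoint quoted above belongs to a Geocoding API version Google has since retired, so the JSON is passed in as a string. The class and method names are illustrative only.

```csharp
using System;
using System.Net;
using System.Text.Json;

// Illustrative helper for the legacy response shape shown above.
public static class LegacyGeocodeResponse
{
    // WebUtility.UrlEncode turns spaces into '+', matching the URL in the answer.
    public static string BuildQueryUrl(string freeform) =>
        "http://maps.google.com/maps/geo?q=" + WebUtility.UrlEncode(freeform);

    // Maps the first placemark onto the question's Street/Locality/Region/PostalCode.
    // GetProperty throws KeyNotFoundException if a level is missing, which
    // happens for low-accuracy results that stop above street level.
    public static (string Street, string Locality, string Region, string PostalCode) Parse(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement country = doc.RootElement
            .GetProperty("Placemark")[0]
            .GetProperty("AddressDetails")
            .GetProperty("Country");
        JsonElement area = country.GetProperty("AdministrativeArea");
        JsonElement locality = area.GetProperty("Locality");

        // Strings are copied out before the document is disposed.
        return (
            locality.GetProperty("Thoroughfare").GetProperty("ThoroughfareName").GetString(),
            locality.GetProperty("LocalityName").GetString(),
            area.GetProperty("AdministrativeAreaName").GetString(),
            locality.GetProperty("PostalCode").GetProperty("PostalCodeNumber").GetString());
    }
}
```

BuildQueryUrl("120 w 45 st nyc") produces the same URL as in the answer, and Parse on the sample response gives ("120 W 45th St", "New York", "NY", "10036"). Note that the street field already includes the house number, just like the question's Street.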
时光不老,我们不散
Post #3 · 2020-01-29 05:32

You could try Experian Address Verification. It has its issues, but it pretty much works as advertised.

男人必须洒脱
Post #4 · 2020-01-29 05:32

I tried RecogniContact recently. It is a Windows COM component that parses US and European addresses. You can test it from the website.

http://www.loquisoft.com/index.php?page=8

smile是对你的礼貌
Post #5 · 2020-01-29 05:40

As has been mentioned, this is not a trivial problem. Apart from international addresses, one of the biggest issues is that there is no standard format for addresses, and an address can't tell you whether it's well-formed: unlike a credit card number, it is not self-validating.
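The credit-card comparison can be made concrete. A card number ends in a Luhn check digit, so most typos can be caught offline with arithmetic alone; a street address carries no such built-in check, which is why an external source of truth is needed. A minimal sketch of the Luhn check (the class name is illustrative):

```csharp
using System;

// Luhn (mod 10) check: what "self-validating" means for a card number.
public static class Luhn
{
    public static bool IsValid(string number)
    {
        int sum = 0;
        int digits = 0;
        bool doubleIt = false; // the rightmost digit is not doubled

        // Walk from right to left, doubling every second digit.
        for (int i = number.Length - 1; i >= 0; i--)
        {
            char c = number[i];
            if (c == ' ' || c == '-') continue;   // tolerate common separators
            if (c < '0' || c > '9') return false; // anything else is invalid

            int d = c - '0';
            if (doubleIt)
            {
                d *= 2;
                if (d > 9) d -= 9; // same as adding the two digits of d
            }
            sum += d;
            doubleIt = !doubleIt;
            digits++;
        }

        // Require at least two digits so a lone "0" is not accepted.
        return digits > 1 && sum % 10 == 0;
    }
}
```

Luhn.IsValid("4111 1111 1111 1111") returns true and changing the last digit to 2 makes it false. There is no equivalent function you could write for "18698 E. Main Street".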

Because of this, you have to rely on an external source of truth to ensure the address is real. This is where an address verification service comes into the mix. Depending upon your business needs and application requirements, you may be looking at a one-time "batch" scrub of your address list, or perhaps a realtime/live address validation service. There are a number of good providers (which vary in cost) that can easily solve this problem.

I should mention that I'm the founder of SmartyStreets. We do CASS-certified address verification. We'll take your unformatted/raw addresses and turn them into addresses that have been cleaned, standardized, and verified/confirmed. Depending on the size of your list, the cost is usually only a few dollars and the turnaround time is nearly instant, usually a few minutes.
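To illustrate what the "standardized" part means, separately from verification, here is a toy normalization of a street line. It assumes a tiny, hand-picked subset of the USPS Publication 28 abbreviations (directionals and a few street suffixes); it is not CASS processing and says nothing about whether the address actually exists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy standardizer: uppercase, strip periods and commas, abbreviate a few words.
// Assumption: the abbreviation table is a small illustrative subset of USPS Pub 28.
public static class StreetLineStandardizer
{
    private static readonly Dictionary<string, string> Abbreviations = new Dictionary<string, string>
    {
        ["NORTH"] = "N", ["SOUTH"] = "S", ["EAST"] = "E", ["WEST"] = "W",
        ["STREET"] = "ST", ["AVENUE"] = "AVE", ["BOULEVARD"] = "BLVD",
        ["ROAD"] = "RD", ["DRIVE"] = "DR", ["LANE"] = "LN",
    };

    public static string Standardize(string streetLine)
    {
        string[] tokens = streetLine
            .ToUpperInvariant()
            .Replace(".", "")
            .Replace(",", "")
            .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

        // Every matching token is replaced regardless of its position, so a street
        // literally named "East" would be wrongly abbreviated; real standardizers
        // work from reference data to avoid this.
        return string.Join(" ", tokens.Select(t => Abbreviations.TryGetValue(t, out string abbr) ? abbr : t));
    }
}
```

For example, Standardize("18698 E. Main Street") gives "18698 E MAIN ST" and Standardize("120 west 45th street") gives "120 W 45TH ST". Two spellings of the same address now compare equal as strings, but neither has been confirmed as deliverable.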

叼着烟拽天下
Post #6 · 2020-01-29 05:42

If you are looking for a simple address parser, try this:

http://usaddress.codeplex.com/

Good:
1. No database required
2. No internet lookup required
3. Pretty accurate

Bad:
1. Cannot confirm whether it is a real address
2. Only works for US addresses
3. Written in C#; requires .NET 3.5 or above

ゆ 、 Hurt°
Post #7 · 2020-01-29 05:48

As @duffymo said, there is no trivial solution, so the next best thing might be to reconsider the design. If it's a user form, make a compromise and let the user fill in the fields themselves. If you are retroactively parsing data, use a very strict regex to parse the addresses that meet some criterion (e.g., the country is US), then make a second pass at the ones that are left over, and so on. I have taken this approach, and it's the only reliable one.
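The strict-regex, multi-pass approach described above can be sketched like this. Assumptions: each input is only the final city/region/postal-code line; pass 1 is a strict US pattern and pass 2 a strict Canadian pattern, both chosen purely for illustration; anything no pass claims is kept as a leftover for the next hand-written pass or for a human, rather than being forced through a loose pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Each pass is one strict pattern for one layout; the first match wins.
public static class MultiPassParser
{
    private static readonly (string Name, Regex Pattern)[] Passes =
    {
        // Pass 1 (illustrative): US, e.g. "Big Town, AZ 86011" or "Big Town, AZ, 86011-1234".
        ("US", new Regex(@"^(?<locality>[^,]+),\s*(?<region>[A-Z]{2}),?\s+(?<postal>\d{5}(?:-\d{4})?)$")),
        // Pass 2 (illustrative): Canada, e.g. "Ottawa, ON K1A 0B1".
        ("CA", new Regex(@"^(?<locality>[^,]+),\s*(?<region>[A-Z]{2})\s+(?<postal>[A-Z]\d[A-Z] ?\d[A-Z]\d)$")),
    };

    public static (Dictionary<string, List<Match>> Parsed, List<string> Leftovers) Run(IEnumerable<string> lastLines)
    {
        var parsed = Passes.ToDictionary(p => p.Name, _ => new List<Match>());
        var leftovers = new List<string>();

        foreach (string line in lastLines)
        {
            string trimmed = line.Trim();

            // Try the passes in order; default (null Match) means nothing matched.
            var hit = Passes
                .Select(p => (p.Name, Match: p.Pattern.Match(trimmed)))
                .FirstOrDefault(r => r.Match.Success);

            if (hit.Match != null)
                parsed[hit.Name].Add(hit.Match);
            else
                leftovers.Add(trimmed); // handled by a later pass or by hand
        }
        return (parsed, leftovers);
    }
}
```

Running it on "Big Town, AZ, 86011", "Ottawa, ON K1A 0B1" and "Big Town AZ 86011" files one match under "US" and one under "CA", and returns the comma-less third line as a leftover instead of guessing at it.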

Another problem with a generic regex approach is that it will produce false positives for bad addresses. If you are sending snail mail to these people, it will bounce, and you'll either have more work on your hands sorting out which ones came back or keep sending mail to erroneous addresses.
