I'm looking for a good tool that can take a full mailing address, formatted for display or use with a mailing label, and convert it into a structured object.
So for instance:
```csharp
// Start with a formatted address in a single string
string f = "18698 E. Main Street\r\nBig Town, AZ, 86011";

// Parse into an address (Address is the type I'm looking for)
Address addr = new Address(f);
Console.WriteLine(addr.Street);     // 18698 E. Main Street
Console.WriteLine(addr.Locality);   // Big Town
Console.WriteLine(addr.Region);     // AZ
Console.WriteLine(addr.PostalCode); // 86011
```
Now, I could do this with a regex, but the tricky part is keeping it general enough to handle any address in the world!
I'm sure there has to be something out there that can do it.
In case anyone noticed, this is actually the format of the opensocial.address object.
The Google Maps API works pretty well for this. For example, suppose you are given the string "120 w 45 st nyc". Pass it to the Google Maps API like so:
http://maps.google.com/maps/geo?q=120+w+45+st+nyc
and you get this response:
You could try Experian Address Verification. It has its issues, but it pretty much works as advertised.
I tried RecogniContact recently. It is a Windows COM component that parses US and European addresses. You can try it out on their website:
http://www.loquisoft.com/index.php?page=8
As has been mentioned, this is not a trivial problem. Apart from international addresses, one of the biggest issues is that there is no standard format for addresses, and an address can't tell you whether it's well-formed: unlike a credit card number, it is not self-validating.
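To make the credit card comparison concrete: card numbers end in a Luhn (mod 10) check digit, so a mistyped number can usually be rejected offline with no lookup at all. The class and method names below are illustrative, not from any library; this is a minimal sketch of the standard Luhn check.

```csharp
using System;

public static class Luhn
{
    // Returns true if the digit string passes the Luhn (mod 10) check used by card numbers.
    public static bool IsValid(string number)
    {
        int sum = 0;
        int digits = 0;
        bool doubleIt = false; // the rightmost digit is never doubled
        for (int i = number.Length - 1; i >= 0; i--)
        {
            char c = number[i];
            if (c == ' ') continue;               // allow spaces between digit groups
            if (c < '0' || c > '9') return false; // any other character is malformed
            int d = c - '0';
            if (doubleIt)
            {
                d *= 2;
                if (d > 9) d -= 9;                // same as adding the two digits of d
            }
            sum += d;
            digits++;
            doubleIt = !doubleIt;
        }
        return digits > 0 && sum % 10 == 0;
    }
}
```

A street address has no equivalent check digit: a made-up house number or ZIP code looks exactly like a real one, which is why the only way to confirm it is against an external source.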
Because of this, you have to rely on an external source of truth to ensure the address is real. This is where an address verification service comes into the mix. Depending upon your business needs and application requirements, you may be looking at a one-time "batch" scrub of your address list, or perhaps a realtime/live address validation service. There are a number of good providers (which vary in cost) that can easily solve this problem.
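A one-time batch scrub can be sketched as running every address through a verifier and splitting the list into confirmed and rejected entries. In this C# sketch, `IAddressVerifier`, `InMemoryVerifier`, and `BatchScrub` are hypothetical names; the in-memory dictionary only stands in for a provider's source of truth, and a real service would be called through its own API instead.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical abstraction over whichever verification provider you choose.
public interface IAddressVerifier
{
    // Returns the standardized form of the address, or null if it cannot be confirmed.
    string? Verify(string rawAddress);
}

// Illustration only: a fixed, case-insensitive lookup table plays the "source of truth".
public sealed class InMemoryVerifier : IAddressVerifier
{
    private readonly Dictionary<string, string> _known;

    public InMemoryVerifier(IDictionary<string, string> known) =>
        _known = new Dictionary<string, string>(known, StringComparer.OrdinalIgnoreCase);

    public string? Verify(string rawAddress) =>
        _known.TryGetValue(rawAddress.Trim(), out var standardized) ? standardized : null;
}

public static class BatchScrub
{
    // Runs every address through the verifier once and splits the list by the result.
    public static (List<string> Verified, List<string> Rejected) Run(
        IEnumerable<string> addresses, IAddressVerifier verifier)
    {
        var verified = new List<string>();
        var rejected = new List<string>();
        foreach (var address in addresses)
        {
            string? result = verifier.Verify(address);
            if (result != null) verified.Add(result);
            else rejected.Add(address);
        }
        return (verified, rejected);
    }
}
```

A realtime/live check would use the same kind of verifier, but call `Verify` once per form submission instead of over a whole list.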
I should mention that I'm the founder of SmartyStreets. We do CASS-certified address verification: we take your unformatted, raw addresses and turn them into addresses that have been cleaned, standardized, and verified. Depending on the size of your list, the cost is usually only a few dollars, and turnaround is nearly instant (usually a few minutes).
If you are looking for a simple address parser, try this:
http://usaddress.codeplex.com/
Good:
1. No database required
2. No internet lookup required
3. Pretty accurate

Bad:
1. Cannot confirm whether it is a real address
2. Only works for US addresses
3. Written in C#; requires .NET 3.5 or above
As @duffymo said, there is no trivial solution, so the next best thing might be to reconsider the design. If it's a user form, make a compromise and let the user fill in the individual fields. If you are retroactively parsing existing data, use a very strict regex to parse only the addresses that meet some criterion (e.g., the country is the US). Then make a second pass at the ones that are left over, and so on. I have taken this approach, and it's the only one I've found to be reliable.
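A minimal C# sketch of the multi-pass idea, assuming US-style input. The two patterns, the `ParsedAddress` record, and the method names are illustrative (the field names just mirror the Address object from the question), not taken from any library. Each pass only sees what the earlier passes rejected, and whatever is left at the end is set aside rather than guessed at.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical result type; field names mirror the Address object in the question.
public record ParsedAddress(string Street, string Locality, string Region, string PostalCode);

public static class MultiPassParser
{
    // Pass 1 (strict, US only): "<street>" on one line, then "<city>, <ST>[,] <5- or 9-digit ZIP>".
    public static readonly Regex StrictTwoLineUs = new Regex(
        @"^(?<street>[^\r\n]+)\r?\n(?<city>[A-Za-z .'-]+),\s*(?<state>[A-Z]{2}),?\s+(?<zip>\d{5}(?:-\d{4})?)$",
        RegexOptions.Compiled);

    // Pass 2 (a little looser, still US only): "<street>, <city>, <st> <ZIP>" on a single line.
    public static readonly Regex OneLineUs = new Regex(
        @"^(?<street>[^,\r\n]+),\s*(?<city>[^,\r\n]+),\s*(?<state>[A-Za-z]{2})\s+(?<zip>\d{5}(?:-\d{4})?)$",
        RegexOptions.Compiled);

    // Runs one pattern over the input; anything it can't match goes to leftovers for the next pass.
    public static List<ParsedAddress> RunPass(Regex pattern, IEnumerable<string> input, List<string> leftovers)
    {
        var parsed = new List<ParsedAddress>();
        foreach (var raw in input)
        {
            Match m = pattern.Match(raw.Trim());
            if (!m.Success) { leftovers.Add(raw); continue; }
            parsed.Add(new ParsedAddress(
                m.Groups["street"].Value.Trim(),
                m.Groups["city"].Value.Trim(),
                m.Groups["state"].Value.ToUpperInvariant(),
                m.Groups["zip"].Value));
        }
        return parsed;
    }

    // Applies each pass in order, feeding it only what the earlier passes rejected.
    public static List<ParsedAddress> ParseAll(
        IEnumerable<string> input, IReadOnlyList<Regex> passes, out List<string> unparsed)
    {
        var results = new List<ParsedAddress>();
        var remaining = new List<string>(input);
        foreach (Regex pattern in passes)
        {
            var leftovers = new List<string>();
            results.AddRange(RunPass(pattern, remaining, leftovers));
            remaining = leftovers;
        }
        unparsed = remaining;
        return results;
    }
}
```

Anything left in `unparsed` (for example "120 w 45 st nyc") is the pile to review by hand, write a further pass for, or send to a verification service. A match only means the text has the expected shape, not that the address exists.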
Another design problem with a generic regex approach is that it will produce false positives for bad addresses. If you are sending snail mail to these people, it will end up bouncing, and you'll have more work on your hands sorting out which pieces came back, or you'll keep sending mail to erroneous addresses.