While I know that matching a street address will never be perfect, I'm looking to create a couple of regex statements that will get close most of the time.
I'm trying to highlight an address. I suck at regex, and I've tried to get close, but could someone help me understand how I can make this better?
string:
6 am - 11 pM , Palma Sola Elementary, 6806 Fifth Ave NW, Bradenton, FL 34209 Come find just near the dsfsd sa fsa fasdf asfsds 5001 west your momma doesn't live here my 2005 ford ranger,
Regex 1:
/\s+(\d{2,5}\s+)(?![a|p]m\b)(([a-zA-Z|\s+]{1,5}){1,2})?([\s|\,|.]+)?(([a-zA-Z|\s+]{1,30}){1,4})(court|ct|street|st|drive|dr|lane|ln|road|rd|blvd)([\s|\,|.|\;]+)?(([a-zA-Z|\s+]{1,30}){1,2})([\s|\,|.]+)?\b(AK|AL|AR|AZ|CA|CO|CT|DC|DE|FL|GA|GU|HI|IA|ID|IL|IN|KS|KY|LA|MA|MD|ME|MI|MN|MO|MS|MT|NC|ND|NE|NH|NJ|NM|NV|NY|OH|OK|OR|PA|RI|SC|SD|TN|TX|UT|VA|VI|VT|WA|WI|WV|WY)([\s|\,|.]+)?(\s+\d{5})?([\s|\,|.]+)/i
(Sometimes there's just a street and city, but no state or zip)
Regex 2:
/\b(\d{2,5}\s+)(?![a|p]m\b)(NW|NE|SW|SE|north|south|west|east|n|e|s|w)?([\s|\,|.]+)?(([a-zA-Z|\s+]{1,30}){1,4})(court|ct|street|st|drive|dr|lane|ln|road|rd|blvd)/i
Fiddle with it: http://jsfiddle.net/isuelt/rMC6P/11/
US addresses are not a regular language, and cannot be matched reliably by using regular expressions. Regular expressions are helpful in some isolated cases, but in general they will fail you, especially for input like that.
I used to work at an address verification company. In answer to your question, to "highlight an address" in a string of text, I recommend you try an extraction utility. There are a few out there and I suggest you look around, but here is ours, using the input from your question. As you can see, it found the address and validated it:
The API endpoint returns JSON which contains the start and end positions of each address, as well as plenty of information about each one. (See the CSV output at the bottom of the picture above.)
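As a rough sketch of how start and end positions like these could drive the highlighting, the function below wraps each reported span in a `<mark>` tag. The field names `start` and `end`, the exclusive end offset, and the helper names `escapeHtml` and `highlightSpans` are assumptions made for illustration; they are not the actual response format of any particular API.

```javascript
// Escape text so it can be placed safely inside HTML.
function escapeHtml(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return s.replace(/[&<>"']/g, (c) => map[c]);
}

// spans: array of { start, end } character offsets (assumed shape, end exclusive).
function highlightSpans(text, spans) {
  const sorted = [...spans].sort((a, b) => a.start - b.start);
  let out = '';
  let cursor = 0;
  for (const { start, end } of sorted) {
    if (start < cursor) continue; // skip spans that overlap an earlier one
    out += escapeHtml(text.slice(cursor, start));
    out += '<mark>' + escapeHtml(text.slice(start, end)) + '</mark>';
    cursor = end;
  }
  return out + escapeHtml(text.slice(cursor));
}
```

Sorting first and tracking a cursor keeps the offsets valid, since every slice is taken from the original string rather than from the partially built output.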
I commend you for braving those regular expressions you tried! Hopefully this is helpful.
I needed to do something similar for addresses like
800 SE 20 AVENUE #603, DEERFIELD BEACH
9801 NW 3 STREET APT 5, PLANTATION
11909 GLENMORE DRIVE #4-1, CORAL SPRINGS
This is the regex that I used
\s*([0-9]*)\s((NW|SW|SE|NE|S|N|E|W))?(.*)((NW|SW|SE|NE|S|N|E|W))?((#|APT|BSMT|BLDG|DEPT|FL|FRNT|HNGR|KEY|LBBY|LOT|LOWR|OFC|PH|PIER|REAR|RM|SIDE|SLIP|SPC|STOP|STE|TRLR|UNIT|UPPR|\,)[^,]*)(\,)([\s\w]*)\n
It returns separate groups for each part of the address (I did not need to parse state name for my case).
Try it out here
https://regex101.com/r/OsvOxn/3
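For anyone wanting to use this from JavaScript, here is a minimal sketch of reading those groups. The function name `parseAddressLine` and the labels given to the groups are my own assumptions about what each group is for; the pattern itself is unchanged. Because the pattern ends in `\n`, the sketch feeds it one line at a time with a newline appended.

```javascript
const ADDRESS_RE = /\s*([0-9]*)\s((NW|SW|SE|NE|S|N|E|W))?(.*)((NW|SW|SE|NE|S|N|E|W))?((#|APT|BSMT|BLDG|DEPT|FL|FRNT|HNGR|KEY|LBBY|LOT|LOWR|OFC|PH|PIER|REAR|RM|SIDE|SLIP|SPC|STOP|STE|TRLR|UNIT|UPPR|\,)[^,]*)(\,)([\s\w]*)\n/;

// Parse a single address line; returns null when the pattern does not match.
function parseAddressLine(line) {
  const m = ADDRESS_RE.exec(line + '\n'); // the pattern requires a trailing newline
  if (!m) return null;
  return {
    number: m[1],            // house number
    direction: m[2] || '',   // leading directional, if any
    street: m[4].trim(),     // everything up to the unit designator
    unit: m[7].trim(),       // "#603", "APT 5", ...
    city: m[10].trim(),      // text after the comma
  };
}

console.log(parseAddressLine('9801 NW 3 STREET APT 5, PLANTATION'));
```

Matching line by line also stops the `[^,]*` part from running across line breaks when the input contains several addresses.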
Matt is right. Regex parsing is never going to be very accurate. You'll inevitably have a reasonable number of false positives and false negatives if you go down this dangerous road. However, if you're okay with that, I actually like to use a combination of two regexes: one for street-name-based schemes and one for city grid schemes:
Street Name System:
/\b\d{1,6} +.{2,25}\b(avenue|ave|court|ct|street|st|drive|dr|lane|ln|road|rd|blvd|plaza|parkway|pkwy)[.,]?(.{0,25} +\b\d{5}\b)?/ig
Grid System:
/(\b( +)?\d{1,6} +(north|east|south|west|n|e|s|w)[,.]?){2}(.{0,25} +\b\d{5}\b)?\b/ig
Also note that if the address doesn't have a state and zipcode, you can basically forget about extracting any text that goes after the street moniker.
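Here is a small sketch of running both patterns over a piece of text and collecting the candidate spans. The names `STREET_NAME_RE`, `GRID_RE`, and `findAddressCandidates` are just illustrative, and the grid-style sample address in the usage lines is made up for the example.

```javascript
const STREET_NAME_RE = /\b\d{1,6} +.{2,25}\b(avenue|ave|court|ct|street|st|drive|dr|lane|ln|road|rd|blvd|plaza|parkway|pkwy)[.,]?(.{0,25} +\b\d{5}\b)?/ig;
const GRID_RE = /(\b( +)?\d{1,6} +(north|east|south|west|n|e|s|w)[,.]?){2}(.{0,25} +\b\d{5}\b)?\b/ig;

// Run both patterns and return every candidate with its position in the text.
function findAddressCandidates(text) {
  const found = [];
  for (const re of [STREET_NAME_RE, GRID_RE]) {
    // matchAll works on a copy of the regex, so lastIndex is not left dirty
    for (const m of text.matchAll(re)) {
      found.push({ start: m.index, end: m.index + m[0].length, text: m[0] });
    }
  }
  return found.sort((a, b) => a.start - b.start);
}

const question = "6 am - 11 pM , Palma Sola Elementary, 6806 Fifth Ave NW, Bradenton, FL 34209 Come find just near the dsfsd sa fsa fasdf asfsds 5001 west your momma doesn't live here my 2005 ford ranger,";
console.log(findAddressCandidates(question).map((c) => c.text));
console.log(findAddressCandidates('450 S 200 E, Salt Lake City, UT 84111').map((c) => c.text));
```

On the string from the question this picks up the Fifth Ave address through the zip code and ignores "5001 west" and "2005 ford ranger", which is about as good as this approach gets.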
This works for me!
if (address.match(/^\s*\S+(?:\s+\S+){2}/)) {
    console.log('good address!');
}