I recently upgraded to MySQL 8 so that I can use the new Regular Expression functions (https://dev.mysql.com/doc/refman/8.0/en/regexp.html) to clean up a lot of bad addresses. However, I'm running into trouble using REGEXP_REPLACE.
I'm starting by running a SELECT query as an example (so I can see what's going on before I run an UPDATE query):
SELECT address1_raw, CONVERT(REGEXP_REPLACE (address1_raw, '^[0-9]+ ', '')
USING UTF8) as replaced,
CONVERT(REGEXP_SUBSTR(address1_raw, '^[0-9]+ ') USING UTF8) as captured
FROM members WHERE REGEXP_LIKE (address1_raw, '^[0-9]+ ');
The results I'm looking for are like this:
address1_raw | replaced | captured
54 Main St | Main St | 54
1089 High St | High St | 1089
67 Broadway | Broadway | 67
Instead, I'm getting this:
address1_raw | replaced | captured
54 Main St | Main St | 54
1089 High St | Main StHigh St | 1089
67 Broadway | Main StHigh StBroadway | 67
I'm totally stumped! Any idea what's going on here?