I am using Amazon Redshift.
I have a column in which a string is stored as comma-separated values, like "Private, Private, Private, Private, Private, Private, United Healthcare". I want to remove the duplicates from it using a query, so the result should be "Private, United Healthcare".
I found some solutions on Stack Overflow and learned that it is possible using regular expressions.
Hence, I have tried using:
SELECT regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\1') AS insurances;
And
SELECT regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\g') AS insurances;
I have also tried some other regular expressions, but none of them seem to work. Any solution?
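Why the pattern seems not to work can be reproduced outside Redshift. The sketch below uses Python's re module as a stand-in, which is an assumption: Redshift's REGEXP_REPLACE has its own regex dialect, so details may differ. It shows two things: the pattern ",\1" ignores the space after each comma, and even a space-aware pattern only collapses duplicates that sit next to each other. The values "Aetna" and "Private Plus" are hypothetical test inputs.

```python
import re

VALUE = "Private, Private, Private, Private, Private, Private, United Healthcare"

# Pattern from the question: an item followed by one or more ",<same item>".
# Items after the first are preceded by ", " (comma + space), so the first
# "Private" never matches ",Private". The engine instead captures " Private"
# (with its leading space) starting at the second item and collapses only those.
original = re.sub(r"([^,]+)(,\1)+", r"\1", VALUE)

# Space-aware variant: the repeated item must follow ", ".
SPACE_AWARE = r"([^,]+)(, \1)+"
space_aware = re.sub(SPACE_AWARE, r"\1", VALUE)

# Limitations of a backreference pattern like this:
# - duplicates that are not adjacent are left alone;
# - the backreference is not anchored to item boundaries, so it can match
#   the prefix of a different item and swallow part of it.
non_adjacent = re.sub(SPACE_AWARE, r"\1", "Private, Aetna, Private")
prefix_clash = re.sub(SPACE_AWARE, r"\1", "Private, Private Plus")

print(original)      # Private, Private, United Healthcare
print(space_aware)   # Private, United Healthcare
print(non_adjacent)  # Private, Aetna, Private
print(prefix_clash)  # Private Plus
```

This is why a split-and-compare approach tends to be more robust than a single regular expression for this kind of cleanup.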
Here is a User-Defined Function (UDF) for Amazon Redshift:
Testing it with:
Returns:
If the order of the returned values is important, then more specific code would be needed.
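The ordering caveat comes from how duplicates are typically removed. A minimal Python sketch, assuming the values are split on commas and trimmed: collecting them into a set drops duplicates but gives no guarantee about order, while a loop that tracks already-seen values keeps first-occurrence order. The function names are hypothetical.

```python
def split_items(value):
    """Split a comma-separated string and trim whitespace around each item."""
    return [item.strip() for item in value.split(",")]


def unique_unordered(value):
    # A set removes duplicates, but the order of the joined result is not guaranteed.
    return ", ".join(set(split_items(value)))


def unique_in_order(value):
    # Keep each item the first time it is seen, preserving the original order.
    seen = set()
    result = []
    for item in split_items(value):
        if item not in seen:
            seen.add(item)
            result.append(item)
    return ", ".join(result)


VALUE = "United Healthcare, Private, Private, United Healthcare"
print(unique_unordered(VALUE))  # same two items, in an unspecified order
print(unique_in_order(VALUE))   # United Healthcare, Private
```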
An alternative option is to try a Python UDF: a simple Python function dedupes the string and returns the correct version.
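A minimal sketch of what such a function could look like, written as plain Python so it can be tried locally. Assumptions: Redshift scalar Python UDFs have historically run on Python 2.7 (so the code avoids relying on dict ordering or Python 3-only syntax), and a SQL NULL arrives as None. The name dedupe_csv and the choice to drop empty items are illustrative, not taken from the original answer. In Redshift, a body like this would go inside a CREATE FUNCTION ... LANGUAGE plpythonu statement.

```python
def dedupe_csv(value, sep=","):
    """Remove duplicate items from a delimited string, keeping first-seen order.

    Returns None for None input, mirroring SQL NULL-in / NULL-out behaviour.
    """
    if value is None:
        return None
    seen = set()
    kept = []
    for raw in value.split(sep):
        item = raw.strip()             # "Private" and " Private" compare as equal
        if item and item not in seen:  # skip empty items, e.g. from "a,,b"
            seen.add(item)
            kept.append(item)
    return (sep + " ").join(kept)


print(dedupe_csv("Private, Private, Private, Private, Private, Private, United Healthcare"))
# Private, United Healthcare
```

Trimming before comparing is what makes the first "Private" (no leading space) equal to the later ones (with a leading space), the same mismatch that affects the regular expressions in the question.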
Try this way:
An alternative way:
Checking http://docs.aws.amazon.com/redshift/latest/dg/String_functions_header.html, both will fail with Redshift: none of those functions converts text to text[].