remove duplicates from comma separated string (Ama

2019-07-15 00:29发布

I am using Amazon Redshift.

I have a column in that string is stored as comma separated like Private, Private, Private, Private, Private, Private, United Healthcare. I want to remove the duplicates from it using query, so the result should be Private, United Healthcare. I found some solutions obviously from Stackoverflow and came to know it is possible using regular expressions.

Hence, I have tried using:

SELECT  regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\1') AS insurances; 

And

SELECT  regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\g') AS insurances; 

And also some other regular expressions but seems not working. Any solution?

3条回答
爷、活的狠高调
2楼-- · 2019-07-15 00:54

Here is a User-Defined Function (UDF) for Amazon Redshift:

CREATE FUNCTION f_uniquify (s text)
  RETURNS text
IMMUTABLE
AS $$
  -- Split string by comma-space, remove duplicates, convert back to comma-separated
  return ', '.join(set(s.split(', ')))
$$ LANGUAGE plpythonu;

Testing it with:

select f_uniquify('Private, Private, Private, Private, Private, Private, United Healthcare');

Returns:

United Healthcare, Private

If the order of return values is important, then it would need some more specific code.

查看更多
你好瞎i
3楼-- · 2019-07-15 00:58

Alternative Option is to try Python UDF. Simple Python function dedupes the string and return correct version.

查看更多
做自己的国王
4楼-- · 2019-07-15 01:05

Try this way,

SELECT  array_agg(DISTINCT insurances) 
FROM (SELECT  regexp_split_to_table('Private, Private, Private, Private, Private, Private, United Healthcare'
              , ',\s+') AS insurances) x;

Alternative way

SELECT DISTINCT UNNEST(regexp_split_to_array('Private, Private, Private, Private, Private, Private, United Healthcare', ',\s+')) AS insurances;

Checking http://docs.aws.amazon.com/redshift/latest/dg/String_functions_header.html both will fail with redshift, none of those converts text to text[]

查看更多
登录 后发表回答