remove duplicates from comma separated string (Ama

I am using Amazon Redshift.

I have a column in that string is stored as comma separated like Private, Private, Private, Private, Private, Private, United Healthcare. I want to remove the duplicates from it using query, so the result should be Private, United Healthcare. I found some solutions obviously from Stackoverflow and came to know it is possible using regular expressions.

Hence, I have tried using:

SELECT  regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\1') AS insurances;

And

SELECT  regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\g') AS insurances;

And also some other regular expressions but seems not working. Any solution?

标签： sql amazon-redshift

3条回答

爷、活的狠高调

2楼-- · 2019-07-15 00:54

Here is a User-Defined Function (UDF) for Amazon Redshift:

CREATE FUNCTION f_uniquify (s text)
  RETURNS text
IMMUTABLE
AS $$
  -- Split string by comma-space, remove duplicates, convert back to comma-separated
  return ', '.join(set(s.split(', ')))
$$ LANGUAGE plpythonu;

Testing it with:

select f_uniquify('Private, Private, Private, Private, Private, Private, United Healthcare');

Returns:

United Healthcare, Private

If the order of return values is important, then it would need some more specific code.

0人赞添加讨论(0) 举报

你好瞎i

3楼-- · 2019-07-15 00:58

Alternative Option is to try Python UDF. Simple Python function dedupes the string and return correct version.

0人赞添加讨论(0) 举报

做自己的国王

4楼-- · 2019-07-15 01:05

Try this way,

SELECT  array_agg(DISTINCT insurances) 
FROM (SELECT  regexp_split_to_table('Private, Private, Private, Private, Private, Private, United Healthcare'
              , ',\s+') AS insurances) x;

Alternative way

SELECT DISTINCT UNNEST(regexp_split_to_array('Private, Private, Private, Private, Private, Private, United Healthcare', ',\s+')) AS insurances;

Checking http://docs.aws.amazon.com/redshift/latest/dg/String_functions_header.html both will fail with redshift, none of those converts text to text[]

0人赞添加讨论(0) 举报

remove duplicates from comma separated string (Ama

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间