可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have a database where each object property is stored in a separate row. The attached query does not return distinct values in a redshift database but works as expected when testing in any mysql compatible database.

SELECT DISTINCT distinct_value 
FROM
( 
  SELECT
    uri,
    ( SELECT DISTINCT value_string 
      FROM `test_organization__app__testsegment` AS X 
      WHERE X.uri = parent.uri AND name = 'hasTestString' AND parent.value_string IS NOT NULL ) AS distinct_value 
  FROM `test_organization__app__testsegment` AS parent 
  WHERE     
    uri IN ( SELECT uri 
             FROM `test_organization__app__testsegment` 
             WHERE name = 'types' AND value_uri_multivalue = 'Document'
           )
) AS T 
WHERE distinct_value IS NOT NULL
ORDER BY distinct_value ASC
LIMIT 10000 OFFSET 0

回答1:

This is not a bug and behavior is intentional, though not straightforward.

In Redshift, you can declare constraints on the tables but Redshift doesn't enforce them, i.e. it allows duplicate values if you insert them. The only difference here is that when you run SELECT DISTINCT query against a column that doesn't have a primary key declared it will scan the whole column and get unique values, and if you run the same on a column that has primary key constraint it will just return the output without performing unique list filtering. This is how you can get duplicate entries if you insert them.

Why is this done? Redshift is optimized for large datasets and it's much faster to copy data if you don't need to check constraint validity for every row that you copy or insert. If you want you can declare a primary key constraint as a part of your data model but you will need to explicitly support it by removing duplicates or designing ETL in a way there are no such.

More information with specific examples in this Heap blog post Redshift Pitfalls And How To Avoid Them

回答2:

Perhaps You can solve this by using appropriate joins. for example i have duplicate values in table 1 and i want values of table 1 by joining it to table 2 and there is some logic behind joining two tables according to your conditions.

so i can do something like this!!

select distinct table1.col1 from table1 left outer join table2 on table1.col1 = table2.col1

this worked for me very well and i got unique values from table1 and could remove dublicates

redshift select distinct returns repeated values

问题:

回答1:

回答2:

收藏的人(0)

redshift select distinct returns repeated values

问题:

回答1:

回答2:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮