LISTAGG in Oracle to return distinct values

2019-01-02 15:34发布

I am trying to use the LISTAGG function in Oracle. I would like to get only the distinct values for that column. Is there a way in which I can get only the distinct values without creating a function or a procedure?

  col1  col2 Created_by
   1     2     Smith 
   1     2     John 
   1     3     Ajay 
   1     4     Ram 
   1     5     Jack 

I need to select col1 and the LISTAGG of col2 (column 3 is not considered). When I do that, I get something like this as the result of LISTAGG: [2,2,3,4,5]

I need to remove the duplicate '2' here; I need only the distinct values of col2 against col1.

23条回答
长期被迫恋爱
2楼-- · 2019-01-02 16:08

I overcame this issue by grouping on the values first, then do another aggregation with the listagg. Something like this:

select a,b,listagg(c,',') within group(order by c) c, avg(d)
from (select a,b,c,avg(d)
      from   table
      group by (a,b,c))
group by (a,b)

only one full table access, relatively easy to expand to more complex queries

查看更多
心情的温度
3楼-- · 2019-01-02 16:08

If you want distinct values across MULTIPLE columns, want control over sort order, don't want to use an undocumented function that may disappear, and do not want more than one full table scan, you may find this construct useful:

with test_data as 
(
      select 'A' as col1, 'T_a1' as col2, '123' as col3 from dual
union select 'A', 'T_a1', '456' from dual
union select 'A', 'T_a1', '789' from dual
union select 'A', 'T_a2', '123' from dual
union select 'A', 'T_a2', '456' from dual
union select 'A', 'T_a2', '111' from dual
union select 'A', 'T_a3', '999' from dual
union select 'B', 'T_a1', '123' from dual
union select 'B', 'T_b1', '740' from dual
union select 'B', 'T_b1', '846' from dual
)
select col1
     , (select listagg(column_value, ',') within group (order by column_value desc) from table(collect_col2)) as col2s
     , (select listagg(column_value, ',') within group (order by column_value desc) from table(collect_col3)) as col3s
from 
(
select col1
     , collect(distinct col2) as collect_col2
     , collect(distinct col3) as collect_col3
from test_data
group by col1
);
查看更多
其实,你不懂
4楼-- · 2019-01-02 16:08

Has anyone thought of using a PARTITION BY clause? It worked for me in this query to get a list of application services and the access.

SELECT DISTINCT T.APP_SVC_ID, 
       LISTAGG(RTRIM(T.ACCESS_MODE), ',') WITHIN GROUP(ORDER BY T.ACCESS_MODE) OVER(PARTITION BY T.APP_SVC_ID) AS ACCESS_MODE 
  FROM APP_SVC_ACCESS_CNTL T 
 GROUP BY T.ACCESS_MODE, T.APP_SVC_ID

I had to cut out my where clause for NDA, but you get the idea.

查看更多
听够珍惜
5楼-- · 2019-01-02 16:09

You can do it via RegEx replacement. Here is an example:

-- Citations Per Year - Cited Publications main query. Includes list of unique associated core project numbers, ordered by core project number.
SELECT ptc.pmid AS pmid, ptc.pmc_id, ptc.pub_title AS pubtitle, ptc.author_list AS authorlist,
  ptc.pub_date AS pubdate,
  REGEXP_REPLACE( LISTAGG ( ppcc.admin_phs_org_code || 
    TO_CHAR(ppcc.serial_num,'FM000000'), ',') WITHIN GROUP (ORDER BY ppcc.admin_phs_org_code || 
    TO_CHAR(ppcc.serial_num,'FM000000')),
    '(^|,)(.+)(,\2)+', '\1\2')
  AS projectNum
FROM publication_total_citations ptc
  JOIN proj_paper_citation_counts ppcc
    ON ptc.pmid = ppcc.pmid
   AND ppcc.citation_year = 2013
  JOIN user_appls ua
    ON ppcc.admin_phs_org_code = ua.admin_phs_org_code
   AND ppcc.serial_num = ua.serial_num
   AND ua.login_id = 'EVANSF'
GROUP BY ptc.pmid, ptc.pmc_id, ptc.pub_title, ptc.author_list, ptc.pub_date
ORDER BY pmid;

Also posted here: Oracle - unique Listagg values

查看更多
与风俱净
6楼-- · 2019-01-02 16:10

Here's how to solve your issue.

select  
      regexp_replace(
    '2,2,2.1,3,3,3,3,4,4' 
     ,'([^,]+)(,\1)*(,|$)', '\1\3')

from dual

returns

2,2.1,3,4

ANSWER (see notes below):

select col1, 

regexp_replace(
    listagg(
     col2 , ',') within group (order by col2)  -- sorted
    ,'([^,]+)(,\1)*(,|$)', '\1\3') )
   from tableX
where rn = 1
group by col1; 

Note: The above will work in most cases - list should be sorted , you may have to trim all trailing and leading space depending on your data.

If you have a alot of items in a group > 20 or big string sizes you might run into oracle string size limit 'result of string concatenation is too long' So put a max number on the members in each group. This will only work if its ok to list only the first members. If you have very long variable strings this may not work. you will have to experiment.

select col1,

case 
    when count(col2) < 100 then 
       regexp_replace(
        listagg(col2, ',') within group (order by col2)
        ,'([^,]+)(,\1)*(,|$)', '\1\3')

    else
    'Too many entries to list...'
end

from sometable
where rn = 1
group by col1;

Another solution (not so simple) to hopefully avoid oracle string size limit - string size is limited to 4000. Thanks to this post here by user3465996

select col1  ,
    dbms_xmlgen.convert(  -- HTML decode
    dbms_lob.substr( -- limit size to 4000 chars
    ltrim( -- remove leading commas
    REGEXP_REPLACE(REPLACE(
         REPLACE(
           XMLAGG(
             XMLELEMENT("A",col2 )
               ORDER BY col2).getClobVal(),
             '<A>',','),
             '</A>',''),'([^,]+)(,\1)*(,|$)', '\1\3'),
                  ','), -- remove leading XML commas ltrim
                      4000,1) -- limit to 4000 string size
                      , 1)  -- HTML.decode
                       as col2
 from sometable
where rn = 1
group by col1;

some test cases - FYI

regexp_replace('2,2,2.1,3,3,4,4','([^,]+)(,\1)+', '\1')
-> 2.1,3,4 Fail
regexp_replace('2 ,2 ,2.1,3 ,3 ,4 ,4 ','([^,]+)(,\1)+', '\1')
-> 2 ,2.1,3,4 Success  - fixed length items

items contained within items eg. 2,21

regexp_replace('2.1,1','([^,]+)(,\1)+', '\1')
-> 2.1 Fail
regexp_replace('2 ,2 ,2.1,1 ,3 ,4 ,4 ','(^|,)(.+)(,\2)+', '\1\2')
-> 2 ,2.1,1 ,3 ,4  -- success - NEW regex
 regexp_replace('a,b,b,b,b,c','(^|,)(.+)(,\2)+', '\1\2')
-> a,b,b,c fail!

v3 - regex thank Igor! works all cases.

select  
regexp_replace('2,2,2.1,3,3,4,4','([^,]+)(,\1)*(,|$)', '\1\3') ,
---> 2,2.1,3,4 works
regexp_replace('2.1,1','([^,]+)(,\1)*(,|$)', '\1\3'),
--> 2.1,1 works
regexp_replace('a,b,b,b,b,c','([^,]+)(,\1)*(,|$)', '\1\3')
---> a,b,c works

from dual
查看更多
倾城一夜雪
7楼-- · 2019-01-02 16:11

One annoying aspect with LISTAGG is that if the total length of concatenated string exceeds 4000 characters( limit for VARCHAR2 in SQL ), the below error is thrown, which is difficult to manage in Oracle versions upto 12.1

ORA-01489: result of string concatenation is too long

A new feature added in 12cR2 is the ON OVERFLOW clause of LISTAGG. The query including this clause would look like:

SELECT pid, LISTAGG(Desc, ' ' on overflow truncate) WITHIN GROUP (ORDER BY seq) AS desc
FROM B GROUP BY pid;

The above will restrict the output to 4000 characters but will not throw the ORA-01489 error.

These are some of the additional options of ON OVERFLOW clause:

  • ON OVERFLOW TRUNCATE 'Contd..' : This will display 'Contd..' at the end of string (Default is ... )
  • ON OVERFLOW TRUNCATE '' : This will display the 4000 characters without any terminating string.
  • ON OVERFLOW TRUNCATE WITH COUNT : This will display the total number of characters at the end after the terminating characters. Eg:- '...(5512)'
  • ON OVERFLOW ERROR : If you expect the LISTAGG to fail with the ORA-01489 error ( Which is default anyway ).
查看更多
登录 后发表回答