split string into several rows

2020-01-29 17:43发布

I have a table with a string which contains several delimited values, e.g. a;b;c.

I need to split this string and use its values in a query. For example I have following table:

str
a;b;c
b;c;d
a;c;d

I need to group by a single value from str column to get following result:

str count(*)
a   1
b   2
c   3
d   2

Is it possible to implement using single select query? I can not create temporary tables to extract values there and query against that temporary table.

3条回答
淡お忘
2楼-- · 2020-01-29 18:28

This is ugly, but seems to work. The problem with the CONNECT BY splitting is that it returns duplicate rows. I managed to get rid of them, but you'll have to test it:

WITH
  data AS (
    SELECT 'a;b;c' AS val FROM dual
    UNION ALL SELECT 'b;c;d' AS val FROM dual
    UNION ALL SELECT 'a;c;d' AS val FROM dual
  )
SELECT token, COUNT(1)
  FROM (
    SELECT DISTINCT token, lvl, val, p_val
      FROM (
        SELECT
            regexp_substr(val, '[^;]+', 1, level) AS token,
            level AS lvl,
            val,
            NVL(prior val, val) p_val
          FROM data
        CONNECT BY regexp_substr(val, '[^;]+', 1, level) IS NOT NULL
      )
    WHERE val = p_val
  )
GROUP BY token;
TOKEN                  COUNT(1)
-------------------- ----------
d                             2 
b                             2 
a                             2 
c                             3 
查看更多
Luminary・发光体
3楼-- · 2020-01-29 18:30
SELECT NAME,COUNT(NAME) FROM ( SELECT NAME  FROM ( (SELECT rownum as ID, REGEXP_SUBSTR('a;b;c', '[^;]+', 1, LEVEL )  NAME
       FROM dual  CONNECT BY REGEXP_SUBSTR('a;b;c', '[^;]+', 1, LEVEL) IS NOT NULL))
       UNION ALL  (SELECT NAME  FROM ( (SELECT rownum as ID, REGEXP_SUBSTR('b;c;d', '[^;]+', 1, LEVEL )  NAME
       FROM dual  CONNECT BY REGEXP_SUBSTR('b;c;d', '[^;]+', 1, LEVEL) IS NOT NULL)))
       UNION ALL 
         (SELECT NAME  FROM ( (SELECT rownum as ID, REGEXP_SUBSTR('a;c;d', '[^;]+', 1, LEVEL )  NAME
       FROM dual  CONNECT BY REGEXP_SUBSTR('a;c;d', '[^;]+', 1, LEVEL) IS NOT NULL)))) GROUP BY NAME

NAME  COUNT(NAME)
----- -----------
d               2
a               2
b               2
c               3
查看更多
▲ chillily
4楼-- · 2020-01-29 18:33

From your comment to @PrzemyslawKruglej answer

Main problem is with internal query with connect by, it generates astonishing amount of rows

The amount of rows generated can be reduced with the following approach:

/* test table populated with sample data from your question */
SQL> create table t1(str) as(
  2    select 'a;b;c'  from dual union all
  3    select 'b;c;d'  from dual union all
  4    select 'a;c;d'  from dual
  5  );
Table created

--  number of rows generated will solely depend on the most longest 
--  string. 
--  If (say) the longest string contains 3 words (wont count separator `;`)
--  and we have 100 rows in our table, then we will end up with 300 rows 
--  for further processing , no more.
with occurrence(ocr) as( 
  select level 
    from ( select max(regexp_count(str, '[^;]+')) as mx_t
             from t1 ) t
    connect by level <= mx_t 
)
select count(regexp_substr(t1.str, '[^;]+', 1, o.ocr)) as generated_for_3_rows
  from t1
 cross join occurrence o;

Result: For three rows where the longest one is made up of three words, we will generate 9 rows:

GENERATED_FOR_3_ROWS
--------------------
                  9

Final query:

with occurrence(ocr) as( 
  select level 
    from ( select max(regexp_count(str, '[^;]+')) as mx_t
             from t1 ) t
    connect by level <= mx_t 
)
select res
     , count(res) as cnt
  from (select regexp_substr(t1.str, '[^;]+', 1, o.ocr) as res
          from t1
         cross join occurrence o)
 where res is not null
 group by res
 order by res;

Result:

RES          CNT
----- ----------
a              2
b              2
c              3
d              2

SQLFIddle Demo

Find out more about regexp_count()(11g and up) and regexp_substr() regular expression functions.

Note: Regular expression functions relatively expensive to compute, and when it comes to processing a very large amount of data, it might be worth considering to switch to a plain PL/SQL. Here is an example.

查看更多
登录 后发表回答