How to count all combined occurrences in SQL?

2019-06-19 17:53发布

问题:

Is there any option to gain counted combinations of all elements in one SQL query, without using temp tables or procedures?

Consider these three tables:

  • products (id, product_name)

  • transactions (id, date)

  • transaction_has_product (id, product_id, transaction_id)

Sample data

  • products

    1   AAA
    2   BBB
    3   CCC
    
  • transactions

    1   some_date
    2   some_date
    
  • transaction_has_products

    1   1   1
    2   2   1
    3   3   1
    4   1   2
    5   2   2
    

Result should be:

AAA, BBB = 2   
AAA, CCC = 1   
BBB, CCC = 1   
AAA, BBB, CCC = 1

回答1:

Not easily because you have different numbers of matched products in the last row compared with the other rows. You might be able to do it with some sort of GROUP_CONCAT() operator (available in MySQL; implementable in other DBMS, such as Informix and probably PostgreSQL), but I'm not confident of that.

Pairwise Matching

SELECT p1.product_name AS name1, p2.product_name AS name2, COUNT(*)
  FROM (SELECT p.product_name, h.transaction_id
          FROM products AS p
          JOIN transactions_has_products AS h ON h.product_id = p.product_id
       ) AS p1
  JOIN (SELECT p.product_name, h.transaction_id
          FROM products AS p
          JOIN transactions_has_products AS h ON h.product_id = p.product_id
       ) AS p2
    ON p1.transaction_id = p2.transaction_id
   AND p1.product_name   < p2.product_name
 GROUP BY p1.name, p2.name;

Handling the triple match is non-trivial; extending it further than that is definitely rather difficult.



回答2:

If you know what all the products will be up front, you could do it by pivoting the data like this.

If you don't know what the products will be up front, you could build this query dynamically in a stored procedure. The practicality of either approach would break down if the number of products is large, but I think that would probably be true no matter how this requirement is accomplished.

select
    product_combination, 
    case product_combination
        when 'AAA, BBB' then aaa_bbb
        when 'AAA, CCC' then aaa_ccc
        when 'BBB, CCC' then bbb_ccc
        when 'AAA, BBB, CCC' then aaa_bbb_ccc
    end as number_of_transactions
from
(
    select 'AAA, BBB' as product_combination union all
    select 'AAA, CCC' union all
    select 'BBB, CCC' union all
    select 'AAA, BBB, CCC'
) as combination_list
cross join
(
    select
        sum(case when aaa = 1 and bbb = 1 then 1 else 0 end) as aaa_bbb,
        sum(case when aaa = 1 and ccc = 1 then 1 else 0 end) as aaa_ccc,
        sum(case when bbb = 1 and ccc = 1 then 1 else 0 end) as bbb_ccc,
        sum(case when aaa = 1 and bbb = 1 and ccc = 1 then 1 else 0 end) as aaa_bbb_ccc
    from
    (
        select
            count(case when a.product_name = 'AAA' then 1 else null end) as aaa,
            count(case when a.product_name = 'BBB' then 1 else null end) as bbb,
            count(case when a.product_name = 'CCC' then 1 else null end) as ccc,
            b.transaction_id
        from
            products a
        inner join
            transaction_has_products b
        on
            a.id = b.product_id
        group by
            b.transaction_id
    ) as product_matrix
) as combination_counts

Result:

product_combination  number_of_transactions
AAA, BBB             2
AAA, CCC             1
BBB, CCC             1
AAA, BBB, CCC        1



回答3:

Depending on how much control you have on the query you could do (this is TSQL might have to change for postgresql)

SELECT COUNT(*) FROM transactions t WHERE
(
     SELECT COUNT(DISTINCT tp.product) 
     FROM transaction_has_products tp 
     WHERE tp.[transaction_id] = t.id and tp.product IN (1, 2, 3)
) = 3

where (1,2,3) is the list of ID's you want to check for and the = 3 equals the amount of entries in the list.



回答4:

  1. generete all possible combinations. I supported myself with this: https://stackoverflow.com/a/9135162/2244766 (it's a little tricky, I don't fully understand the logic... but it works!)
  2. make a subquery where you aggregate products_in_transactions into arrays of products per transaction_id
  3. Join both of them using array inclusion operators

After above steps, you could get something like:

with all_combis as (
    with RECURSIVE y1 as (
            with x1 as (
                --select id from products
                select distinct product_id as a from transaction_has_products 
            )
            select array[a] as b ,a as c ,1 as d 
            from x1
            union all
            select b||a,a,d+1
            from x1
            join y1 on (a < c)
    )
    select *
    from y1
)
, grouped_transactions as (
  SELECT 
    array_agg(product_id) as products
  FROM transaction_has_products
  GROUP BY transaction_id
)
SELECT all_combis.b, count(*)
from all_combis
left JOIN grouped_transactions ON grouped_transactions.products @> all_combis.b 
--WHERE array_upper(b, 1) > 1 -- or whatever
GROUP BY all_combis.b
order by array_upper(b, 1) desc, count(*) desc

You can join your products table to replace product ids with their names - but I gues you'll get it from here. here's the fiddle (sqlfiddle is having a bad day today - so check this out on your db in case it throws some weird error like timeout or something like that)

GL, HF :D