How can I turn a bunch of rows into aggregated col

2019-06-12 07:01发布

Here is the scenario: I have a table that records the user_id, the module_id, and the date/time the module was viewed.

eg.

Table: Log
------------------------------
User_ID  Module_ID   Date
------------------------------
1       red         2001-01-01
1       green       2001-01-02
1       blue        2001-01-03
2       green       2001-01-04
2       blue        2001-01-05
1       red         2001-01-06
1       blue        2001-01-07
3       blue        2001-01-08
3       green       2001-01-09
3       red         2001-01-10
3       green       2001-01-11
4       white       2001-01-12

I need to get a result set that has the user_id as the 1st column, and then a column for each module. The row data is then the user_id and the count of the number of times that user viewed each module.

eg.

---------------------------------
User_ID  red green   blue    white
---------------------------------
1       2   1       2       0
2       0   1       1       0
3       1   2       1       0
4       0   0       0       1

I was initially thinking that I could do this with PIVOT, but no dice; the database is a converted SQL Server 2000 DB that is running in SQL Server 2005. I'm not able to change the compatibility level, so pivot is out.

The other catch is that the modules will vary, and it isn't feasible to re-write the query every time a module is added or removed. This means that I can't hard-code in the modules because I don't know in advance which will and will not be installed.

How can I accomplish this?

4条回答
闹够了就滚
2楼-- · 2019-06-12 07:48
SELECT User_ID, MAX(red) AS red, MAX(green) AS green, MAX(blue) AS blue,
  MAX(white) AS white FROM
((SELECT User_ID, COUNT(Module_ID) AS red, 0 AS green, 0 AS blue,
  0 AS white
FROM log
WHERE Module_ID = 'red'
GROUP BY User_ID)
UNION
(SELECT User_ID, 0 AS red, COUNT(Module_ID) AS green, 0 AS blue,
  0 AS white
FROM log
WHERE Module_ID = 'green'
GROUP BY User_ID)
UNION
(SELECT User_ID, 0 AS red, 0 AS green, COUNT(Module_ID) AS blue,
  0 AS white
FROM log
WHERE Module_ID = 'blue'
GROUP BY User_ID)
UNION
(SELECT User_ID, 0 AS red, 0 AS green, 0 AS blue,
  COUNT(Module_ID) AS white
FROM log
WHERE Module_ID = 'white'
GROUP BY User_ID))
GROUP BY User_ID
ORDER BY User_ID
查看更多
Anthone
3楼-- · 2019-06-12 07:54

PIVOT can be simulated with CASE and GROUP BY

select
    [user_id],
    sum(case when [Module_ID] = 'red' then 1 else 0 end) as red,
    sum(case when [Module_ID] = 'green' then 1 else 0 end) as green,
    sum(case when [Module_ID] = 'blue' then 1 else 0 end) as blue,
    sum(case when [Module_ID] = 'white' then 1 else 0 end) as white
from [log]
group by
    [user_id]

Of course this doesn't work if the modules vary (as stated in the question) but then, PIVOT has the same problem.

Dynamically generating some sql overcomes this problem but this solution smells a bit!

declare @sql nvarchar(max)

set @sql = '
select
    [user_id],'

select @sql = @sql + '
    sum(case when [Module_ID] = ''' + replace([Module_ID], '''','''''') + ''' then 1 else 0 end) as [' + replace([Module_ID], '''','') + '],'
from (select distinct [Module_ID] from [log]) as moduleids

set @sql = substring(@sql,1,len(@sql)-1) + '
from [log]
group by
    [user_id]
'
print @sql
exec sp_executesql @sql

Note that this may be vulnerable to sql-injection if the module id data can't be trusted.

查看更多
倾城 Initia
4楼-- · 2019-06-12 07:54

Using MySQL I did this:

  1. Copied your data into Log_Table.sql
  2. create table Log (User_ID mediumint, Module_ID CHAR(5), dte CHAR(10));
  3. load data infile 'Log_Table.sql' INTO TABLE Log FIELDS TERMINATED BY ',';
  4. Pivot:

    select User_ID AS 'USER',  sum(case
    Module_ID WHEN 'red'   then 1 else 0
    END) AS 'red', 
    
    sum(case Module_ID WHEN 'green' then 1
    else 0 END) AS 'green', 
    
    sum(case Module_ID WHEN 'blue'  then 1
    else 0 END) AS 'blue', 
    
    sum(case Module_ID WHEN 'white' then 1
    else 0 END) AS 'white'
    
    from Log 
    
    Group By User_ID;
    
    > +------+------+-------+------+-------+ 
    > | USER | red  | green | blue | white |
    > +------+------+-------+------+-------+ 
    > |    1 |    2 |     1 |    2 |     0 |
    > |    2 |    0 |     1 |    1 |     0 |
    > |    3 |    1 |     2 |    1 |     0 |
    > |    4 |    0 |     0 |    0 |     1 |
    > +------+------+-------+------+-------+ 
    > 4 rows in set (0.00 sec)
    

Hope this helps.

查看更多
孤傲高冷的网名
5楼-- · 2019-06-12 08:00

I believe characteristic functions are what you want.

查看更多
登录 后发表回答