I am trying to summarize counts based on all possible combinations of variables. Here is some example data:
Question:
Answer 1:
For this sort of query, using some of the built-in aggregate tools is quite straightforward.
First, set up some sample data based on your sample image:
declare @Table1 as table
([id] int, [a] int, [b] int, [c] int)
;
INSERT INTO @Table1
([id], [a], [b], [c])
VALUES
(10001, 1, 3, 3),
(10002, 0, 0, 0),
(10003, 3, 6, 0),
(10004, 7, 0, 0),
(10005, 0, 0, 0)
;
Since you want the count of IDs for each possible combination of non-zero attributes A, B, and C, the first step is to eliminate the zeros and convert the non-zero values to a single value we can summarize on; in this case I'll use the attribute's name. After that, it's a simple matter of performing the aggregate, using the CUBE clause in the GROUP BY to generate the combinations. Lastly, the HAVING clause prunes the unwanted summations. Mostly that just means ignoring NULL attribute values that are not a result of grouping, and optionally removing the grand summary (the count of all rows):
with t1 as (
    select case a when 0 then null else 'a' end a
         , case b when 0 then null else 'b' end b
         , case c when 0 then null else 'c' end c
         , id
    from @Table1
)
select a, b, c, count(id) cnt
from t1
group by cube(a,b,c)
having (a is not null or grouping(a) = 1) -- For each attribute
   and (b is not null or grouping(b) = 1) -- only allow nulls as
   and (c is not null or grouping(c) = 1) -- a result of grouping.
   and grouping_id(a,b,c) <> 7            -- exclude the grand total
order by grouping_id(a,b,c);
Here are the results:
   a     b     c     cnt
1  a     b     c     1
2  a     b     NULL  2
3  a     NULL  c     1
4  a     NULL  NULL  3
5  NULL  b     c     1
6  NULL  b     NULL  2
7  NULL  NULL  c     1
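As a cross-check on the result set above, the same counts can be reproduced outside the database. The following is a minimal Python sketch, not part of the original answer: the `rows` list simply mirrors the sample data, and `count_ids_per_combination` is a hypothetical helper name. For every non-empty subset of attributes it counts the ids whose values are all non-zero for that subset, which is what the CUBE query with its HAVING filter appears to compute.

```python
from itertools import combinations

# Sample data mirroring @Table1 above: (id, a, b, c).
rows = [
    (10001, 1, 3, 3),
    (10002, 0, 0, 0),
    (10003, 3, 6, 0),
    (10004, 7, 0, 0),
    (10005, 0, 0, 0),
]
attrs = ("a", "b", "c")


def count_ids_per_combination(rows, attrs):
    """Count ids whose attributes are all non-zero, per non-empty attribute subset."""
    counts = {}
    for size in range(1, len(attrs) + 1):
        for combo in combinations(range(len(attrs)), size):
            names = tuple(attrs[i] for i in combo)
            # Attribute i sits at position i + 1 because position 0 is the id.
            counts[names] = sum(
                1 for row in rows if all(row[i + 1] != 0 for i in combo)
            )
    return counts


counts = count_ids_per_combination(rows, attrs)
for names, cnt in counts.items():
    print(",".join(names), cnt)
```

The dictionary is keyed by the tuple of attribute names, so `counts[("a", "b")]` corresponds to the row `a b NULL` in the SQL output.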
And finally my original rextester link: http://rextester.com/YRJ10544
@lad2025 Here's a dynamic version (sorry, my SQL Server skills aren't as strong as my Oracle skills, but it works). Just set the correct values for @table and @col, and it should work as long as all other columns are numeric attributes:
declare @sql varchar(max), @table varchar(30), @col varchar(30);
set @table = 'Table1';
set @col = 'id';
with x(object_id, column_id, name, names, proj, pred, max_col, cnt)
as (
select object_id, column_id, name, cast(name as varchar(max))
, cast('case '+name+' when 0 then null else '''+name+''' end '+name as varchar(4000))
, cast('('+name+' is not null or grouping('+name+') = 1)' as varchar(4000))
, (select max(column_id) from sys.columns m where m.object_id = c.object_id and m.name <> @col)
, 1
from sys.columns c
where object_id = OBJECT_ID(@Table)
and column_id = (select min(column_id) from sys.columns m where m.object_id = c.object_id and m.name <> @col)
union all
select x.object_id, c.column_id, c.name, cast(x.names+', '+c.name as varchar(max))
, cast(proj+char(13)+char(10)+' , case '+c.name+' when 0 then null else '''+c.name+''' end '+c.name as varchar(4000))
, cast(pred+char(13)+char(10)+' and ('+c.name+' is not null or grouping('+c.name+') = 1)' as varchar(4000))
, max_col
, cnt+1
from x join sys.columns c on c.object_id = x.object_id and c.column_id = x.column_id+1
)
select @sql='with t1 as (
select '+proj+'
, '+@col+'
from '+@Table+'
)
select '+names+'
, count('+@col+') cnt
from t1
group by cube('+names+')
having '+pred+'
and grouping_id('+names+') <> '+cast(power(2,cnt)-1 as varchar(10))+'
order by grouping_id('+names+');'
from x where column_id = max_col;
select @sql sql;
exec (@sql);
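The generated query excludes the grand total with `grouping_id(...) <> power(2,cnt)-1`. GROUPING_ID packs one bit per listed column (1 when the column is aggregated away), with the first argument as the most significant bit, so the all-ones value 2^n - 1 is the row where every column is rolled up. Here is a small Python sketch of that arithmetic; `grouping_id` below is an illustrative helper written for this note, not a library function.

```python
def grouping_id(columns, grouped_by):
    """Mimic SQL Server's GROUPING_ID: one bit per column, first column is the
    most significant bit, and a bit is 1 when the column is NOT in the grouping."""
    value = 0
    for col in columns:
        value = (value << 1) | (0 if col in grouped_by else 1)
    return value


columns = ["a", "b", "c"]
print(grouping_id(columns, {"a", "b", "c"}))  # all three columns grouped
print(grouping_id(columns, {"a", "b"}))       # c rolled up
print(grouping_id(columns, set()))            # grand total: every column rolled up

# The value the dynamic SQL filters out for n columns.
grand_total = 2 ** len(columns) - 1
```

This also explains the row order of the static query: ordering by `grouping_id(a,b,c)` lists the most detailed combination first and the single-column `c` summary last.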
Answer 2:
Poshan:
As Robert stated, SUMMARY can be used to count combinations. A second SUMMARY can count the computed types. One difficulty is ignoring the combinations that involve a zero value. If the zeros can be converted to missing values, the processing is much cleaner. Presuming the zeros have been converted to missing, this code would count distinct combinations:
proc summary noprint data=have;
class v2-v4 s1;
output out=counts_eachCombo;
run;
proc summary noprint data=counts_eachCombo(rename=_type_=combo_type);
class combo_type;
output out=counts_eachClassType;
run;
You can see how the use of a CLASS variable in a combination determines the _TYPE_ value, and the class variables can be of mixed type (numeric, character).
A different 'home-grown' approach that does not use SUMMARY can use a DATA step with LEXCOMB to compute each combination, and SQL with INTO ... SEPARATED BY to generate a SQL statement that counts the distinct values of each combination.
Note: The following code contains macro varListEval for resolving a SAS variable list to individual variable names.
%macro makeHave(n=,m=,maxval=&m*4,prob0=0.25);
data have;
do id = 1 to &n;
array v v1-v&m;
do over v;
if ranuni(123) < &prob0 then v = 0; else v = ceil(&maxval*ranuni(123));
end;
s1 = byte(65+5*ranuni(123));
output;
end;
run;
%mend;
%makeHave (n=100,m=5,maxval=15)
%macro varListEval (data=, var=);
%* resolve a SAS variable list to individual variable names;
%local dsid dsid2 i name num;
%let dsid = %sysfunc(open(&data));
%if &dsid %then %do;
%let dsid2 = %sysfunc(open(&data(keep=&var)));
%if &dsid2 %then %do;
%do i = 1 %to %sysfunc(attrn(&dsid,nvar));
%let name = %sysfunc(varname(&dsid,&i));
%let num = %sysfunc(varnum(&dsid2,&name));
%if &num %then "&NAME";
%end;
%let dsid2 = %sysfunc(close(&dsid2));
%end;
%let dsid = %sysfunc(close(&dsid));
%end;
%else
%put %sysfunc(sysmsg());
%mend;
%macro combosUCounts(data=, var=);
%local vars n;
%let vars = %varListEval(data=&data, var=&var);
%let n = %eval(1 + %sysfunc(count(&vars,%str(" "))));
* compute combination selectors and criteria;
data combos;
array _names (&n) $32 (&vars);
array _combos (&n) $32;
array _comboCriterias (&n) $200;
length _selector $32000;
length _criteria $32000;
if 0 then set &data; %* prep PDV for vname;
do _k = 1 to &n;
do _j = 1 to comb(&n,_k);
_rc = lexcomb(_j,_k, of _names[*]);
do _p = 1 to _k;
_combos(_p) = _names(_p);
if vtypex(_names(_p)) = 'C'
then _comboCriterias(_p) = trim(_names(_p)) || " is not null and " || trim(_names(_p)) || " ne ''";
else _comboCriterias(_p) = trim(_names(_p)) || " is not null and " || trim(_names(_p)) || " ne 0";
end;
_selector = catx(",", of _combos:);
_criteria = catx(" and ", of _comboCriterias:);
output;
end;
end;
stop;
run;
%local union;
proc sql noprint;
* generate SQL statement that uses combination selectors and criteria;
select "select "
|| quote(trim(_selector))
|| " as combo"
|| ", "
|| "count(*) as uCount from (select distinct "
|| trim(_selector)
|| " from &data where "
|| trim(_criteria)
|| ")"
into :union separated by " UNION "
from combos
;
* perform the generated SQL statement;
create table comboCounts as
&union;
/* %put union=%superq(union); */
quit;
%mend;
options mprint nosymbolgen;
%combosUCounts(data=have, var=v2-v4);
%combosUCounts(data=have, var=v2-v4 s1);
%put NOTE: Done;
/*
data _null_;
put %varListEval(data=have, var=v2-v4) ;
run;
*/
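The idea behind `combosUCounts` (enumerate every combination of the variables, build one `select distinct` subquery per combination, and glue them together with UNION) is independent of SAS. Below is a rough Python sketch of the same text generation, assuming numeric variables only. `build_union_sql` is a hypothetical name, and the output is plain SQL text that mirrors, but is not guaranteed to be identical to, what the macro produces.

```python
from itertools import combinations


def build_union_sql(table, variables):
    """Build one 'count distinct combination' subquery per non-empty subset
    of variables and join them with UNION (numeric variables assumed)."""
    parts = []
    for size in range(1, len(variables) + 1):
        for combo in combinations(variables, size):
            selector = ",".join(combo)
            # Each variable in the combination must be present and non-zero.
            criteria = " and ".join(
                f"{v} is not null and {v} ne 0" for v in combo
            )
            parts.append(
                f"select '{selector}' as combo, count(*) as uCount "
                f"from (select distinct {selector} from {table} where {criteria})"
            )
    return " UNION ".join(parts)


sql = build_union_sql("have", ["v2", "v3", "v4"])
print(sql)
```

For three variables this yields 2^3 - 1 = 7 subqueries, one per non-empty combination, in the same size-then-lexicographic order that the nested `_k` / `_j` loops walk through.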
Answer 3:
Naive approach, SQL Server version (I've assumed that we always have 3 columns, so there will be 2^3-1 rows):
SELECT 'A' AS combination, COUNT(DISTINCT CASE WHEN a > 0 THEN a ELSE NULL END) AS cnt FROM t
UNION ALL
SELECT 'B', COUNT(DISTINCT CASE WHEN b > 0 THEN b ELSE NULL END) FROM t
UNION ALL
SELECT 'C', COUNT(DISTINCT CASE WHEN c > 0 THEN c ELSE NULL END) FROM t
UNION ALL
SELECT 'A,B', COUNT(DISTINCT CASE WHEN a > 0 THEN CAST(a AS VARCHAR(10)) ELSE NULL END
+ ',' + CASE WHEN b > 0 THEN CAST(b AS VARCHAR(10)) ELSE NULL END) FROM t
UNION ALL
SELECT 'A,C', COUNT(DISTINCT CASE WHEN a > 0 THEN CAST(a AS VARCHAR(10)) ELSE NULL END
+ ',' + CASE WHEN c > 0 THEN CAST(c AS VARCHAR(10)) ELSE NULL END) FROM t
UNION ALL
SELECT 'B,C', COUNT(DISTINCT CASE WHEN b > 0 THEN CAST(b AS VARCHAR(10)) ELSE NULL END
+ ',' + CASE WHEN c > 0 THEN CAST(c AS VARCHAR(10)) ELSE NULL END) FROM t
UNION ALL
SELECT 'A,B,C', COUNT(DISTINCT CASE WHEN a > 0 THEN CAST(a AS VARCHAR(10)) ELSE NULL END
+ ',' + CASE WHEN b > 0 THEN CAST(b AS VARCHAR(10)) ELSE NULL END
+ ',' + CASE WHEN c > 0 THEN CAST(c AS VARCHAR(10)) ELSE NULL END ) FROM t
ORDER BY combination
EDIT:
Same as above but more concise:
WITH cte AS (
SELECT ID
,CAST(NULLIF(a,0) AS VARCHAR(10)) a
,CAST(NULLIF(b,0) AS VARCHAR(10)) b
,CAST(NULLIF(c,0) AS VARCHAR(10)) c
FROM t
)
SELECT 'A' AS combination, COUNT(DISTINCT a) AS cnt FROM cte UNION ALL
SELECT 'B', COUNT(DISTINCT b) FROM cte UNION ALL
SELECT 'C', COUNT(DISTINCT c) FROM cte UNION ALL
SELECT 'A,B', COUNT(DISTINCT a + ',' + b) FROM cte UNION ALL
SELECT 'A,C', COUNT(DISTINCT a + ',' + c) FROM cte UNION ALL
SELECT 'B,C', COUNT(DISTINCT b + ',' + c) FROM cte UNION ALL
SELECT 'A,B,C', COUNT(DISTINCT a + ',' + b + ',' + c ) FROM cte ;
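Note that these queries count distinct value combinations, and they rely on string concatenation with NULL yielding NULL in SQL Server (the default behaviour of the `+` operator), so any row with a zero in the combination drops out of COUNT(DISTINCT ...). The Python sketch below imitates that behaviour; the `rows` values are the sample rows used elsewhere in this thread, and `distinct_counts` is an illustrative name, not something from the answer.

```python
from itertools import combinations

# Sample rows (id, a, b, c); the same values used elsewhere in this thread.
rows = [
    (10001, 1, 3, 3),
    (10002, 0, 0, 0),
    (10003, 3, 6, 0),
    (10004, 7, 0, 0),
    (10005, 0, 0, 0),
]
cols = {"A": 1, "B": 2, "C": 3}  # column name -> position in each row


def distinct_counts(rows, cols):
    """For each column subset, count distinct value tuples with no zero member.

    A zero plays the role of NULLIF(x, 0): it nulls the whole concatenation,
    so the row is skipped, just as COUNT(DISTINCT ...) skips NULLs.
    """
    result = {}
    for size in range(1, len(cols) + 1):
        for combo in combinations(cols, size):
            seen = {
                tuple(row[cols[c]] for c in combo)
                for row in rows
                if all(row[cols[c]] != 0 for c in combo)
            }
            result[",".join(combo)] = len(seen)
    return result


result = distinct_counts(rows, cols)
print(result)
```

Tuples are used instead of concatenated strings so that values such as (1, 23) and (12, 3) cannot collide; the SQL version avoids the same problem with the ',' separator.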
EDIT 2:
Using UNPIVOT:
WITH cte AS (SELECT ID
,CAST(IIF(a!=0,1,NULL) AS VARCHAR(10)) a
,CAST(IIF(b!=0,1,NULL) AS VARCHAR(10)) b
,CAST(IIF(c!=0,1,NULL) AS VARCHAR(10)) c
FROM t)
SELECT combination, [count]
FROM (SELECT a=COUNT(a), b=COUNT(b), c=COUNT(c)
, ab=COUNT(a+b), ac=COUNT(a+c), bc=COUNT(b+c), abc=COUNT(a+b+c)
FROM cte) s
UNPIVOT ([count] FOR combination IN (a,b,c,ab,ac,bc,abc))AS unpvt
EDIT: FINAL APPROACH
I appreciate your approach. I have more than 3 variables in my actual dataset; do you think we can generate all possible combinations programmatically rather than hard-coding them? Maybe your second approach will cover that:
SQL is a bit clumsy for this kind of operation, but I want to show that it is possible.
CREATE TABLE t(id INT, a INT, b INT, c INT);
INSERT INTO t
SELECT 10001,1,3,3 UNION
SELECT 10002,0,0,0 UNION
SELECT 10003,3,6,0 UNION
SELECT 10004,7,0,0 UNION
SELECT 10005,0,0,0;
DECLARE @Sample AS TABLE
(
item_id tinyint IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,
item nvarchar(500) NOT NULL,
bit_value AS CONVERT ( integer, POWER(2, item_id - 1) )
PERSISTED UNIQUE CLUSTERED
);
INSERT INTO @Sample
SELECT name
FROM sys.columns
WHERE object_id = OBJECT_ID('t')
AND name != 'id';
DECLARE @max integer = POWER(2, ( SELECT COUNT(*) FROM @Sample AS s)) - 1;
DECLARE @cols NVARCHAR(MAX);
DECLARE @cols_casted NVARCHAR(MAX);
DECLARE @cols_count NVARCHAR(MAX);
;WITH
Pass0 as (select 1 as C union all select 1), --2 rows
Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
Tally as (select row_number() over(order by C) as n from Pass4)
, cte AS (SELECT
combination =
STUFF
(
(
SELECT ',' + s.item
FROM @Sample AS s
WHERE
n.n & s.bit_value = s.bit_value
ORDER BY
s.bit_value
FOR XML
PATH (''),
TYPE
).value('(./text())[1]', 'varchar(8000)'), 1, 1, ''
)
FROM Tally AS N
WHERE N.n BETWEEN 1 AND @max
)
SELECT @cols = STRING_AGG(QUOTENAME(combination),',')
,@cols_count = STRING_AGG(FORMATMESSAGE('[%s]=COUNT(DISTINCT %s)'
,combination,REPLACE(combination, ',', ' + '','' +') ),',')
FROM cte;
SELECT
@cols_casted = STRING_AGG(FORMATMESSAGE('CAST(NULLIF(%s,0) AS VARCHAR(10)) %s'
,name, name), ',')
FROM sys.columns
WHERE object_id = OBJECT_ID('t')
AND name != 'id';
DECLARE @sql NVARCHAR(MAX);
SET @sql =
'SELECT combination, [count]
FROM (SELECT <cols_count>
FROM (SELECT ID, <cols_casted> FROM t )cte) s
UNPIVOT ([count] FOR combination IN (<cols>))AS unpvt';
SET @sql = REPLACE(@sql, '<cols_casted>', @cols_casted);
SET @sql = REPLACE(@sql, '<cols_count>', @cols_count);
SET @sql = REPLACE(@sql, '<cols>', @cols);
SELECT @sql;
EXEC (@sql);
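The core trick in the script above is the filter `n.n & s.bit_value = s.bit_value`: each column gets a power-of-two `bit_value`, and every integer from 1 to 2^k - 1 then selects the columns whose bits are set. A minimal Python sketch of that enumeration follows. `combinations_by_bitmask` is an illustrative name, and the item order is assumed to match the order in which the columns were inserted into `@Sample`.

```python
def combinations_by_bitmask(items):
    """Enumerate all non-empty subsets of items using integer bitmasks."""
    # Item i gets bit value 2**i, like bit_value = POWER(2, item_id - 1).
    bit_values = [(item, 1 << i) for i, item in enumerate(items)]
    max_mask = (1 << len(items)) - 1  # same as POWER(2, count) - 1
    combos = []
    for n in range(1, max_mask + 1):
        # Keep the items whose bit is set in n, ordered by bit value.
        chosen = [item for item, bit in bit_values if (n & bit) == bit]
        combos.append(",".join(chosen))
    return combos


combos = combinations_by_bitmask(["a", "b", "c"])
print(combos)
```

Because the masks run over a plain integer range, the number of combinations doubles with every extra column, which is why the T-SQL version needs a tally table large enough to cover 2^k - 1 rows.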