i want to create a frequency distribution matrix
1.Create a matrix.**Is it possible to get this in separate columns**
customer1 p1 p2 p3
customer 2 p2 p3
customer 3 p2 p3 p1
customer 4 p2 p1
2. Then I have to count the number of products that come together the most
For eg
p2 and p3 comes together 3 times
p1 p3 comes 2 times
p1 p2 comes 2 times
I want to recommend products to customers ,frequency of products that comes together
select customerId,product,count(*) from sales group by customerId,product
Can anyone please help me for a solution to this
If you want pairs of products that customers purchase, then you can use a self join:
select s1.product, s2.product, count(*) as cnt
from sales s1 join
sales s2
on s1.customerId = s2.customerId
where s1.product < s2.product
group by s1.product, s2.product
order by cnt desc;
You can extend this to more than two products by using more joins.
1.
Array
select customerId
,sort_array(collect_set (product)) as products
from sales
group by customerId
;
Multiple columns
select customerId
,products[0] as p0
,products[1] as p1
,products[2] as p2
,products[3] as p3
,products[4] as p4
,products[5] as p5
,products[6] as p6
,products[7] as p7
,products[8] as p8
,products[9] as p9
from (select customerId
,sort_array(collect_set (product)) as products
from sales
group by customerId
) s
;
2.
This returns the frequency of the whole products' combinations.
In your example (p1,p2,p3)
is the most frequent (appears twice).
(p1,p2)
appears once and so is (p2,p3)
.
For frequency of tuples, see @GordonLinoff answer.
select s.products
,count(*) as frequency
from (select customerId
,sort_array(collect_set (product)) as products
from sales
group by customerId
) s
group by s.products
order by frequency desc
I know cursors are not the flavor of the month anymore - but I still find them very useful from time to time
Below code creates a table variable and then loops through the sales data to find out which combination of products are sold together most frequently
--Results will be loaded into this table
DECLARE @OutputCombination TABLE (CombinationName VARCHAR(MAX), CombinationsFound INT)
--Just for demo - create table and load with data
declare @demodata table (ClientID int, ProductBought varchar(50))
insert into @demodata (Clientid, ProductBought) values
(1,'Coke'),
(1,'Chips'),
(1,'Chips'),
(2,'Coke'),
(2,'Coke'),
(2,'Sweets'),
(3,'Coke'),
(4,'Chips'),
(5,'Coke'),
(5,'Chips'),
(6,'Coke'),
(7,'Coke')
DECLARE clients CURSOR
READ_ONLY
FOR SELECT DISTINCT clientID from @demodata
DECLARE @clientID INT
OPEN clients
FETCH NEXT FROM clients INTO @clientID
WHILE (@@fetch_status <> -1)
BEGIN
DECLARE @ThisClientCombination VARCHAR(MAX) --This is going to be used to find/update combinations of products sold from the data - pipe-delimiting them
SET @ThisClientCombination = '' --better to actually wipe variables in cursor loops, as cursors sometimes funny about things like that
;WITH thisClientsDistinctCoverages AS --CTE used because of the SQL-funnies caused when using DISTINCT and ORDER BY
(
SELECT DISTINCT TOP 100 PERCENT CoverageItems = ProductBought + '|' FROM @demodata WHERE ClientID = @clientID ORDER BY ProductBought + '|' --order alphabetically and do DISTINCT so that 2x cokes are changed into just 1 coke - for the sake of combos
)
SELECT @ThisClientCombination = @ThisClientCombination + CoverageItems FROM thisClientsDistinctCoverages
SET @ThisClientCombination = LEFT(@ThisClientCombination,LEN(@ThisClientCombination)-1) --Drop trailing Pipe '|'
IF EXISTS(SELECT * FROM @OutputCombination WHERE CombinationName = @ThisClientCombination)
UPDATE @OutputCombination SET CombinationsFound = CombinationsFound + 1 WHERE CombinationName = @ThisClientCombination --found before, increase count by 1
ELSE
INSERT INTO @OutputCombination (CombinationName, CombinationsFound) VALUES (@ThisClientCombination, 1)--first time found, make entry 1
FETCH NEXT FROM clients INTO @clientID
END
CLOSE clients
DEALLOCATE clients
--Show the results
SELECT * FROM @OutputCombination ORDER BY CombinationsFound DESC, CombinationName
This will yield the results:
CombinationName---------CombinationsFound
Coke--------------------------3
Chips|Coke------------------2
Chips--------------------------1
Coke|Sweets----------------1
Interpretation of above results:
Most often, 3 times, a customer only bought coke(s)
Next up, twice a customer bought chips and a coke
Once a customer only bought chips
And once a customer bought a coke & sweets