Need help figuring out how to do a cross-tabulated report within one query. There are 3-4 tables involved but the users table may not need to be included in the query since we just need a count.
I have put together a screenshot of the table schema and data as an example which can be seen below:
What I need it to return is a query result that looks like:
So I can make a report that looks like:
I've tried to do cursor loops as it's the only way I can do it with my basic knowledge, but it's way too slow.
One particular report I'm trying to generate contains 32 rows and 64 columns with about 70,000 answers, so it's all about the performance of getting it down to one query and fast as possible.
I understand this may depend on indexes and so on but if someone could help me figure out how I could get this done in 1 query (with multiple joins?), that would be awesome!
Thanks!
SELECT MIN(ro.OptionText) RowOptionText, MIN(co.OptionText) RowOptionText, COUNT(ca.AnswerID) AnswerCount
FROM tblQuestions rq
CROSS JOIN tblQuestions cq
JOIN tblOptions ro ON rq.QuestionID = ro.QuestionID
JOIN tblOptions co ON cq.QuestionID = co.QuestionID
LEFT JOIN tblAnswers ra ON ra.OptionID = ro.OptionID
LEFT JOIN tblAnswers ca ON ca.OptionID = co.OptionID AND ca.UserID = ra.UserID
WHERE rq.questionText = 'Gender'
AND cq.questionText = 'How happy are you?'
GROUP BY ro.OptionID, co.OptionID
ORDER BY ro.OptionID, co.OptionID
This should be at least close to what you asked for. Turning this into a pivot will require dynamic SQL as SQL Server requires you to specify the actual value that will be pivoted into a column.
We cross join the questions and limit the results from each of those question references to the single question for the row values and column values respectively. Then we join the option values to the respective question reference. We use LEFT JOIN for the answers in case the user didn't respond to all of the questions. And we join the answers by UserID so that we match the row question and column question for each user. The MIN on the option text is because we grouped and ordered by OptionID to match your sequencing shown.
EDIT: Here's a SQLFiddle
For what it's worth, your query is complicated because you are using the Entity-Attribute-Value design pattern. Quite a few SQL Server experts consider that pattern to be problematic and to be avoided if possible. For instance see https://www.simple-talk.com/sql/t-sql-programming/avoiding-the-eav-of-destruction/.
EDIT 2: Since you accepted my answer, here's the dynamic SQL pivot solution :) SQLFiddle
DECLARE @SqlCmd NVARCHAR(MAX)
SELECT @SqlCmd = N'SELECT RowOptionText, ' + STUFF(
(SELECT ', ' + QUOTENAME(o.OptionID) + ' AS ' + QUOTENAME(o.OptionText)
FROM tblOptions o
WHERE o.QuestionID = cq.QuestionID
FOR XML PATH ('')), 1, 2, '') + ', RowTotal AS [Row Total]
FROM (
SELECT ro.OptionID RowOptionID, ro.OptionText RowOptionText, co.OptionID ColOptionID,
ca.UserID, COUNT(ca.UserID) OVER (PARTITION BY ra.OptionID) AS RowTotal
FROM tblOptions ro
JOIN tblOptions co ON ro.QuestionID = ' + CAST(rq.QuestionID AS VARCHAR(10)) +
' AND co.QuestionID = ' + CAST(cq.QuestionID AS VARCHAR(10)) + '
LEFT JOIN tblAnswers ra ON ra.OptionID = ro.OptionID
LEFT JOIN tblAnswers ca ON ca.OptionID = co.OptionID AND ca.UserID = ra.UserID
UNION ALL
SELECT 999999, ''Column Total'' RowOptionText, co.OptionID ColOptionID,
ca.UserID, COUNT(ca.UserID) OVER () AS RowTotal
FROM tblOptions ro
JOIN tblOptions co ON ro.QuestionID = ' + CAST(rq.QuestionID AS VARCHAR(10)) +
' AND co.QuestionID = ' + CAST(cq.QuestionID AS VARCHAR(10)) + '
LEFT JOIN tblAnswers ra ON ra.OptionID = ro.OptionID
LEFT JOIN tblAnswers ca ON ca.OptionID = co.OptionID AND ca.UserID = ra.UserID
) t
PIVOT (COUNT(UserID) FOR ColOptionID IN (' + STUFF(
(SELECT ', ' + QUOTENAME(o.OptionID)
FROM tblOptions o
WHERE o.QuestionID = cq.QuestionID
FOR XML PATH ('')), 1, 2, '') + ')) p
ORDER BY RowOptionID'
FROM tblQuestions rq
CROSS JOIN tblQuestions cq
WHERE rq.questionText = 'Gender'
AND cq.questionText = 'How happy are you?'
EXEC sp_executesql @SqlCmd
I think I see the problem. I know you can't modify the schema, but you need a conceptual table for the crosstab information such as which questionID is the rowHeader and which is the colHeader. You can create it in an external data source and join with the existing source or simply hard-code the table values in your sql.
you need to have 2 instances of the question/option/answer relations, one for each rowHeader and colHeader for each crosstab. Those 2 relations are joined by the userID.
this version has your outer joins:
sqlFiddle
this version doesn't have the crossTab table, just the row and col questionIDs hard-coded:
sqlFiddleNoTbl
The following piece of mess works with no hard-coded values but fails to show the rows where the count is 0.
This might however still work for your report.
;with stepone as(
SELECT
RANK() OVER(PARTITION BY a.UserId ORDER BY o.QuestionID) AS [temprank]
, o.QuestionID AS [QID1]
, o.OptionID AS [OID1]
, same.QuestionID
, same.OptionID
, a.UserId AS [IDUser]
, same.UserId
FROM
tblAnswers a
INNER JOIN
tblOptions o
ON a.OptionID = o.OptionID
INNER JOIN
tblQuestions q
ON o.QuestionID = q.QuestionID
INNER JOIN
(
SELECT
a.AnswerID
, a.OptionID
, a.UserId
, o.QuestionID
FROM
tblAnswers a
INNER JOIN
tblOptions o
ON a.OptionID = o.OptionID
) same
ON a.UserId = same.UserId AND a.AnswerID <> same.AnswerID
)
, stepthree AS(
SELECT
t.QID1, t.OID1, t.QuestionID, t.OptionID
, COUNT(UserId) AS myCount
FROM
stepone t
WHERE t.temprank = 1
GROUP BY
t.QID1, t.OID1, t.QuestionID, t.OptionID
)
SELECT
o1.OptionText AS [RowTest]
, o2.OptionText AS [ColumnText]
, t.myCount AS [Count]
FROM
stepthree t
INNER JOIN tblOptions o1
ON t.OID1 = o1.OptionID
INNER JOIN tblOptions o2
ON t.OptionID = o2.OptionID
ORDER BY t.OID1
Hope it helps, I enjoyed trying to do it.