SQL Cross Tab Query

2019-05-20 07:18发布

问题:

Need help figuring out how to do a cross-tabulated report within one query. There are 3-4 tables involved but the users table may not need to be included in the query since we just need a count.

I have put together a screenshot of the table schema and data as an example which can be seen below:

What I need it to return is a query result that looks like:

So I can make a report that looks like:

I've tried to do cursor loops as it's the only way I can do it with my basic knowledge, but it's way too slow.

One particular report I'm trying to generate contains 32 rows and 64 columns with about 70,000 answers, so it's all about the performance of getting it down to one query and fast as possible.

I understand this may depend on indexes and so on but if someone could help me figure out how I could get this done in 1 query (with multiple joins?), that would be awesome!

Thanks!

回答1:

SELECT MIN(ro.OptionText) RowOptionText, MIN(co.OptionText) RowOptionText, COUNT(ca.AnswerID) AnswerCount
FROM tblQuestions rq 
CROSS JOIN tblQuestions cq 
JOIN tblOptions ro ON rq.QuestionID = ro.QuestionID
JOIN tblOptions co ON cq.QuestionID = co.QuestionID
LEFT JOIN tblAnswers ra ON ra.OptionID = ro.OptionID
LEFT JOIN tblAnswers ca ON ca.OptionID = co.OptionID AND ca.UserID = ra.UserID
WHERE rq.questionText = 'Gender'
AND cq.questionText = 'How happy are you?'
GROUP BY ro.OptionID, co.OptionID
ORDER BY ro.OptionID, co.OptionID

This should be at least close to what you asked for. Turning this into a pivot will require dynamic SQL as SQL Server requires you to specify the actual value that will be pivoted into a column.

We cross join the questions and limit the results from each of those question references to the single question for the row values and column values respectively. Then we join the option values to the respective question reference. We use LEFT JOIN for the answers in case the user didn't respond to all of the questions. And we join the answers by UserID so that we match the row question and column question for each user. The MIN on the option text is because we grouped and ordered by OptionID to match your sequencing shown.

EDIT: Here's a SQLFiddle

For what it's worth, your query is complicated because you are using the Entity-Attribute-Value design pattern. Quite a few SQL Server experts consider that pattern to be problematic and to be avoided if possible. For instance see https://www.simple-talk.com/sql/t-sql-programming/avoiding-the-eav-of-destruction/.

EDIT 2: Since you accepted my answer, here's the dynamic SQL pivot solution :) SQLFiddle

DECLARE @SqlCmd NVARCHAR(MAX)

SELECT @SqlCmd = N'SELECT RowOptionText, ' + STUFF(
    (SELECT ', ' + QUOTENAME(o.OptionID) + ' AS ' + QUOTENAME(o.OptionText)
    FROM tblOptions o 
    WHERE o.QuestionID = cq.QuestionID
    FOR XML PATH ('')), 1, 2, '') + ', RowTotal AS [Row Total]
FROM (
    SELECT ro.OptionID RowOptionID, ro.OptionText RowOptionText, co.OptionID ColOptionID,
       ca.UserID, COUNT(ca.UserID) OVER (PARTITION BY ra.OptionID) AS RowTotal
    FROM tblOptions ro
    JOIN tblOptions co ON ro.QuestionID = ' + CAST(rq.QuestionID AS VARCHAR(10)) + 
    ' AND co.QuestionID = ' + CAST(cq.QuestionID AS VARCHAR(10)) + '
    LEFT JOIN tblAnswers ra ON ra.OptionID = ro.OptionID
    LEFT JOIN tblAnswers ca ON ca.OptionID = co.OptionID AND ca.UserID = ra.UserID
    UNION ALL 
    SELECT 999999, ''Column Total'' RowOptionText, co.OptionID ColOptionID,
       ca.UserID, COUNT(ca.UserID) OVER () AS RowTotal
    FROM tblOptions ro
    JOIN tblOptions co ON ro.QuestionID = ' + CAST(rq.QuestionID AS VARCHAR(10)) + 
    ' AND co.QuestionID = ' + CAST(cq.QuestionID AS VARCHAR(10)) + '
    LEFT JOIN tblAnswers ra ON ra.OptionID = ro.OptionID
    LEFT JOIN tblAnswers ca ON ca.OptionID = co.OptionID AND ca.UserID = ra.UserID
) t
PIVOT (COUNT(UserID) FOR ColOptionID IN (' + STUFF(
    (SELECT ', ' + QUOTENAME(o.OptionID) 
    FROM tblOptions o 
    WHERE o.QuestionID = cq.QuestionID
    FOR XML PATH ('')), 1, 2, '') + ')) p
ORDER BY RowOptionID'
FROM tblQuestions rq 
CROSS JOIN tblQuestions cq 
WHERE rq.questionText = 'Gender' 
AND cq.questionText = 'How happy are you?'

EXEC sp_executesql @SqlCmd


回答2:

I think I see the problem. I know you can't modify the schema, but you need a conceptual table for the crosstab information such as which questionID is the rowHeader and which is the colHeader. You can create it in an external data source and join with the existing source or simply hard-code the table values in your sql.

you need to have 2 instances of the question/option/answer relations, one for each rowHeader and colHeader for each crosstab. Those 2 relations are joined by the userID.

this version has your outer joins: sqlFiddle

this version doesn't have the crossTab table, just the row and col questionIDs hard-coded: sqlFiddleNoTbl



回答3:

The following piece of mess works with no hard-coded values but fails to show the rows where the count is 0. This might however still work for your report.

;with stepone as(

SELECT
    RANK() OVER(PARTITION BY a.UserId ORDER BY o.QuestionID) AS [temprank]
,   o.QuestionID AS [QID1]
,   o.OptionID AS [OID1]
,   same.QuestionID
,   same.OptionID
,   a.UserId AS [IDUser]
,   same.UserId
FROM
    tblAnswers a
    INNER JOIN
    tblOptions o
        ON a.OptionID = o.OptionID
    INNER JOIN
    tblQuestions q
        ON o.QuestionID = q.QuestionID
    INNER JOIN
    (
    SELECT
        a.AnswerID
    ,   a.OptionID
    ,   a.UserId
    ,   o.QuestionID    
    FROM
        tblAnswers a
        INNER JOIN
        tblOptions o
            ON a.OptionID = o.OptionID
    ) same
        ON a.UserId = same.UserId AND a.AnswerID <> same.AnswerID

)

, stepthree AS(
SELECT
    t.QID1, t.OID1, t.QuestionID, t.OptionID
,   COUNT(UserId) AS myCount
FROM 
    stepone t
WHERE t.temprank = 1
GROUP BY
    t.QID1, t.OID1, t.QuestionID, t.OptionID
)

SELECT
    o1.OptionText AS [RowTest]
,   o2.OptionText AS [ColumnText]
,   t.myCount AS [Count]

FROM
    stepthree t
    INNER JOIN tblOptions o1
        ON t.OID1 = o1.OptionID
    INNER JOIN tblOptions o2
        ON t.OptionID = o2.OptionID
ORDER BY t.OID1

Hope it helps, I enjoyed trying to do it.