So I'm currently running this query. However, when run outside phpmyadmin it causes a 504 timeout error. I'm thinking it has to do with how efficient the number of rows is returned or accessed by the query. I'm not extremely experienced with mySQL and so this was the best I could do:
SELECT
s.surveyId,
q.cat,
SUM((sac.answer_id*q.weight))/SUM(q.weight) AS score,
user.division_id,
user.unit_id,
user.department_id,
user.team_id,
division.division_name,
unit.unit_name,
dpt.department_name,
team.team_name
FROM survey_answers_cache sac
JOIN surveys s ON s.surveyId = sac.surveyid
JOIN subcluster sc ON s.subcluster_id = sc.subcluster_id
JOIN cluster c ON sc.cluster_id = c.cluster_id
JOIN user ON user.user_id = sac.user_id
JOIN questions q ON q.question_id = sac.question_id
JOIN division ON division.division_id = user.division_id
LEFT JOIN unit ON unit.unit_id = user.unit_id
LEFT JOIN department dpt ON dpt.department_id = user.department_id
LEFT JOIN team ON team.team_id = user.team_id
WHERE c.cluster_id=? AND sc.subcluster_id=? AND s.active=0 AND s.prepare=0
GROUP BY user.team_id, s.surveyId, q.cat
ORDER BY s.surveyId, user.team_id, q.cat ASC
The problem I get with this query is that when I get a correct result returned it runs quickly (let's say +-500ms) but when the result has twice as much rows, it takes more than 5 minutes and then causes a 504 timeout. The other problem is that I didn't create this database myself, so I didn't set the indices myself. I'm thinking of improving these and therefore I used the explain command:
I see a lot of primary keys and a couple double indices, but I'm not sure if this would affect the performance this greatly.
EDIT: This piece of code takes up all the execution time:
$start_time = microtime(true);
$stmt = $conn->query($query); //query is simply the query above.
while ($row = $stmt->fetch_assoc()){
$resultSurveys["scores"][] = $row;
}
$stmt->close();
$end_time = microtime(true);
$duration = $end_time - $start_time; //value typically the execution time #reallyHigh...
So my question: Is it possible to (greatly?) improve the performance of the query by altering the database keys or should I divide my query into multiple smaller queries?
Any help is very much appreciated!
P.S. If you think my question is missing something, please don't just downvote but write a comment below or suggest an edit and I will try my best to add the information needed!
The EXPLAIN result is showing signs of problem
Using temporary;using filesort: the ORDER BY needs to create temporary tables to do the sorting.
On 3rd row for user table
type
is ALL,key
andref
are NULL: means that it needs to scan the whole table each time to retrieve results.Suggestions:
changein
database (cross database query).PS: query optimization is an art that requires patience and hard work. No silver bullet for that. Welcome to the fine art of optimizing MySQL!
You can try something like this ( although its not practical for me to test this )
So I hope I didn't mess anything up.
Anyway, the idea is in the inner query you select only the rows you need based on your where condition. This will create a smaller tmp table as it only pulls 2 fields both ints.
Then in the outer query you join to the tables that you actually pull the rest of the data from, order and group. This way you are sorting and grouping on a smaller dataset. And your where clause can run in the most optimal way.
You may even be able to omit some of these tables as your only pulling data from a few of them, but without seeing the full schema and how it's related that's hard to say.
But just generally speaking this part (The sub-query)
Is what is directly affected by your WHERE clause. See so we can optimize this part then use it to join the rest of the data you need.
An example of removing tables can be easily deduced from the above, consider this
The
c
tablecluster
is never used to pull data from, only for the where. So is notThe same as or equivalent to
And therefore we can eliminate that join completely.
i think the problem happends when you add this:
the extra condition sac.user_id = user.user_id can be easily not consistent.
Can you try do a second join with user table?
pd. can you add a "SHOW CREATE TABLE"