SQL - How to find optimal performance numbers for

2019-08-19 22:50发布

问题:

First time here so forgive me for any faux pas. I have a question about the limitation of SQL as I am new to the code, and what I need I believe to be rather complex.

Is it possible to automate finding the optimal data for a specific query. For example, say I have the following columns:

1) Vehicle type (Text) e.g. car,bike,bus

2) Number of passengers (Numeric) e.g. 0-7

3) Was in an accident (Boolean) e.g. t or f

From here, I would like to get percentages. So if I were to select only cars with 3 passengers, what percentage of the total accidents does that account for.

I understand how to get this as a one off or mathematically calculate it, however my question relates how to automate this process to get the optimum number.

So, keeping with this example, say I look at just cars, what number of passengers covers the highest percentage of accidents?

At the moment, I am currently going through and testing number by number, is there a way to 'find' the optimal number? It is easy when it is just 0-7 like in the example, but I would naturally like to deal with a larger range and even multiple ranges. For example, say we add another variable titled:

4) Number of doors (numeric) e-g- 0-3

Would there be a way of finding the best combination of numbers from these two variables that cover the highest percentage of accidents?

So say we took: Car, >2 passengers, <3 doors on the vehicle. Out of the accidents variable 50% were true

But if we change that to:Car, >4 passengers, <3 doors. Out of the accidents variable 80% were true.

I hope I have explained this well. I understand that this is most likely not possible with SQL, however is there another way to find these optimum numbers?

Thanks in advance

回答1:

Here's an example that will give you an answer for all possibilities. You could add a limit clause to show only the top answer, or add to the where clause to limit to specific terms.

SELECT
    `vehicle_type`,
    `num_passengers`,
    sum(if(`in_accident`,1,0)) as `num_accidents`,
    count(*) as `num_in_group`,
    sum(if(`in_accident`,1,0)) / count(*) as `percent_accidents`
FROM `accidents`
GROUP BY `vehicle_type`,
    `num_passengers`
ORDER BY sum(if(`in_accident`,1,0)) / count(*)