What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x)
for finding the mean, but I'm having a hard time finding a simple way of calculating the median. For now, I'm returning all the rows to PHP, doing a sort, and then picking the middle row, but surely there must be some simple way of doing it in a single MySQL query.
Example data:
id | val
--------
1 4
2 7
3 2
4 2
5 9
6 8
7 3
Sorting on val
gives 2 2 3 4 7 8 9
, so the median should be 4
, versus SELECT AVG(val)
which == 5
.
Another riff on Velcrow's answer, but uses a single intermediate table and takes advantage of the variable used for row numbering to get the count, rather than performing an extra query to calculate it. Also starts the count so that the first row is row 0 to allow simply using Floor and Ceil to select the median row(s).
I have a database containing about 1 billion rows that we require to determine the median age in the set. Sorting a billion rows is hard, but if you aggregate the distinct values that can be found (ages range from 0 to 100), you can sort THIS list, and use some arithmetic magic to find any percentile you want as follows:
This query depends on your db supporting window functions (including ROWS UNBOUNDED PRECEDING) but if you do not have that it is a simple matter to join aggData CTE with itself and aggregate all prior totals into the 'accumulated' column which is used to determine which value contains the specified precentile. The above sample calcuates p10, p25, p50 (median), p75, and p90.
-Chris
I used a two query approach:
These are wrapped in a function defn, so all values can be returned from one call.
If your ranges are static and your data does not change often, it might be more efficient to precompute/store these values and use the stored values instead of querying from scratch every time.
Medians grouped by dimension:
Knowing exact row count you can use this query:
Where
<half> = ceiling(<size> / 2.0) - 1
After reading all previous ones they didn't match with my actual requirement so I implemented my own one which doesn't need any procedure or complicate statements, just I
GROUP_CONCAT
all values from the column I wanted to obtain the MEDIAN and applying a COUNT DIV BY 2 I extract the value in from the middle of the list like the following query does :(POS is the name of the column I want to get its median)
I hope this could be useful for someone in the way many of other comments were for me from this website.