Is there a way to specify bin sizes in MySQL? Right now, I am trying the following SQL query:
select total, count(total) from faults GROUP BY total;
The data that is being generated is good enough but there are just too many rows. What I need is a way to group the data into predefined bins. I can do this from a scripting language, but is there a way to do it directly in SQL?
Example:
+-------+--------------+
| total | count(total) |
+-------+--------------+
| 30 | 1 |
| 31 | 2 |
| 33 | 1 |
| 34 | 3 |
| 35 | 2 |
| 36 | 6 |
| 37 | 3 |
| 38 | 2 |
| 41 | 1 |
| 42 | 5 |
| 43 | 1 |
| 44 | 7 |
| 45 | 4 |
| 46 | 3 |
| 47 | 2 |
| 49 | 3 |
| 50 | 2 |
| 51 | 3 |
| 52 | 4 |
| 53 | 2 |
| 54 | 1 |
| 55 | 3 |
| 56 | 4 |
| 57 | 4 |
| 58 | 2 |
| 59 | 2 |
| 60 | 4 |
| 61 | 1 |
| 63 | 2 |
| 64 | 5 |
| 65 | 2 |
| 66 | 3 |
| 67 | 5 |
| 68 | 5 |
------------------------
What I am looking for:
+------------+---------------+
| total | count(total) |
+------------+---------------+
| 30 - 40 | 23 |
| 40 - 50 | 15 |
| 50 - 60 | 51 |
| 60 - 70 | 45 |
------------------------------
I guess this cannot be achieved in a straight forward manner but a reference to any related stored procedure would be fine as well.
That should work. Not so elegant but still:
via Mike DelGaudio
Equal width binning into a given count of bins:
Note that the 0.0000001 is there to make sure that the records with the value equal to max(col) do not make it's own bin just by itself. Also, the additive constant is there to make sure the query does not fail on division by zero when all the values in the column are identical.
Also note that the count of bins (10 in the example) should be written with a decimal mark to avoid integer division (the unadjusted bin_width can be decimal).
The table bins contains columns min_value and max_value which define the bins. note that the operator "join... on x BETWEEN y and z" is inclusive.
table1 is the name of the data table
In addition to great answer https://stackoverflow.com/a/10363145/916682, you can use phpmyadmin chart tool for a nice result:
I made a procedure that can be used to automatically generate a temporary table for bins according to a specified number or size, for later use with Ofri Raviv's solution.
This will generate the histogram count only for the bins that are populated. David West ought to be right in his correction, but for some reason, unpopulated bins do not appear in the result for me (despite the use of a LEFT JOIN — I do not understand why).