Getting data for histogram plot

2019-01-07 02:36发布

Is there a way to specify bin sizes in MySQL? Right now, I am trying the following SQL query:

select total, count(total) from faults GROUP BY total;

The data that is being generated is good enough but there are just too many rows. What I need is a way to group the data into predefined bins. I can do this from a scripting language, but is there a way to do it directly in SQL?

Example:

+-------+--------------+
| total | count(total) |
+-------+--------------+
|    30 |            1 | 
|    31 |            2 | 
|    33 |            1 | 
|    34 |            3 | 
|    35 |            2 | 
|    36 |            6 | 
|    37 |            3 | 
|    38 |            2 | 
|    41 |            1 | 
|    42 |            5 | 
|    43 |            1 | 
|    44 |            7 | 
|    45 |            4 | 
|    46 |            3 | 
|    47 |            2 | 
|    49 |            3 | 
|    50 |            2 | 
|    51 |            3 | 
|    52 |            4 | 
|    53 |            2 | 
|    54 |            1 | 
|    55 |            3 | 
|    56 |            4 | 
|    57 |            4 | 
|    58 |            2 | 
|    59 |            2 | 
|    60 |            4 | 
|    61 |            1 | 
|    63 |            2 | 
|    64 |            5 | 
|    65 |            2 | 
|    66 |            3 | 
|    67 |            5 | 
|    68 |            5 | 
------------------------

What I am looking for:

+------------+---------------+
| total      | count(total)  |
+------------+---------------+
|    30 - 40 |            23 | 
|    40 - 50 |            15 | 
|    50 - 60 |            51 | 
|    60 - 70 |            45 | 
------------------------------

I guess this cannot be achieved in a straight forward manner but a reference to any related stored procedure would be fine as well.

10条回答
做自己的国王
2楼-- · 2019-01-07 02:52

That should work. Not so elegant but still:

select count(mycol - (mycol mod 10)) as freq, mycol - (mycol mod 10) as label
from mytable
group by mycol - (mycol mod 10)
order by mycol - (mycol mod 10) ASC

via Mike DelGaudio

查看更多
仙女界的扛把子
3楼-- · 2019-01-07 02:54

Equal width binning into a given count of bins:

WITH bins AS(
   SELECT min(col) AS min_value
        , ((max(col)-min(col)) / 10.0) + 0.0000001 AS bin_width
   FROM cars
)
SELECT tab.*,
   floor((col-bins.min_value) / bins.bin_width ) AS bin
FROM tab, bins;

Note that the 0.0000001 is there to make sure that the records with the value equal to max(col) do not make it's own bin just by itself. Also, the additive constant is there to make sure the query does not fail on division by zero when all the values in the column are identical.

Also note that the count of bins (10 in the example) should be written with a decimal mark to avoid integer division (the unadjusted bin_width can be decimal).

查看更多
太酷不给撩
4楼-- · 2019-01-07 02:56
SELECT b.*,count(*) as total FROM bins b 
left outer join table1 a on a.value between b.min_value and b.max_value 
group by b.min_value

The table bins contains columns min_value and max_value which define the bins. note that the operator "join... on x BETWEEN y and z" is inclusive.

table1 is the name of the data table

查看更多
我只想做你的唯一
5楼-- · 2019-01-07 03:00
select case when total >= 30 and total <= 40 THEN "30-40"       
       else when total >= 40 and total <= 50 then "40-50" 
       else "50-60" END as Total , count(total) 
group by Total 
查看更多
Explosion°爆炸
6楼-- · 2019-01-07 03:02

In addition to great answer https://stackoverflow.com/a/10363145/916682, you can use phpmyadmin chart tool for a nice result:

enter image description here

enter image description here

查看更多
爷的心禁止访问
7楼-- · 2019-01-07 03:06

I made a procedure that can be used to automatically generate a temporary table for bins according to a specified number or size, for later use with Ofri Raviv's solution.

CREATE PROCEDURE makebins(numbins INT, binsize FLOAT) # binsize may be NULL for auto-size
BEGIN
 SELECT FLOOR(MIN(colval)) INTO @binmin FROM yourtable;
 SELECT CEIL(MAX(colval)) INTO @binmax FROM yourtable;
 IF binsize IS NULL 
  THEN SET binsize = CEIL((@binmax-@binmin)/numbins); # CEIL here may prevent the potential creation a very small extra bin due to rounding errors, but no good where floats are needed.
 END IF;
 SET @currlim = @binmin;
 WHILE @currlim + binsize < @binmax DO
  INSERT INTO bins VALUES (@currlim, @currlim+binsize);
  SET @currlim = @currlim + binsize;
 END WHILE;
 INSERT INTO bins VALUES (@currlim, @maxbin);
END;

DROP TABLE IF EXISTS bins; # be careful if you have a bins table of your own.
CREATE TEMPORARY TABLE bins (
minval INT, maxval INT, # or FLOAT, if needed
KEY (minval), KEY (maxval) );# keys could perhaps help if using a lot of bins; normally negligible

CALL makebins(20, NULL);  # Using 20 bins of automatic size here. 

SELECT bins.*, count(*) AS total FROM bins
LEFT JOIN yourtable ON yourtable.value BETWEEN bins.minval AND bins.maxval
GROUP BY bins.minval

This will generate the histogram count only for the bins that are populated. David West ought to be right in his correction, but for some reason, unpopulated bins do not appear in the result for me (despite the use of a LEFT JOIN — I do not understand why).

查看更多
登录 后发表回答