R generate 2D histogram from raw data

Posted 2020-05-19 01:58

Question:

I have some raw 2D data (x, y), shown below, and I want to generate a 2D histogram from it: divide the x and y values into bins of size 0.5 and count the number of occurrences in each bin (for x and y jointly). Is there a way to do that?

> df
            x         y
1   4.2179611 5.7588577
2   5.3901279 5.8219784
3   4.1933089 6.4317645
4   5.8076411 5.8999598
5   5.5781166 5.9382342
6   4.5569735 6.7833469
7   4.4024492 5.8019719
8   4.1734975 6.0896355
9   5.1707871 5.5640962
10  5.6380258 6.9112775
11  4.6405353 5.2251746
12  4.1809004 6.1127144
13  4.2764079 5.4598799
14  5.4466446 6.0130047
15  5.2443804 5.5421851
16  5.7521515 5.4115965
17  4.9667564 5.3519795
18  4.5007141 6.8669231
19  5.0268273 5.7681888
20  4.4738948 6.4241168
21  4.4116357 5.9819519
22  4.5741988 6.4595129
23  4.0839075 6.8105259
24  4.7154364 6.5054761
25  4.8986785 5.5511226
26  5.6262397 6.8996480
27  4.9034275 5.6716375
28  4.1872928 5.8387641
29  4.0444855 5.2554446
30  4.8911393 5.8449165
31  5.7268887 6.7100432
32  5.9136374 6.5059128
33  4.9481286 6.4679917
34  4.6198987 5.7462047
35  5.7306916 6.0613158
36  5.5818586 6.4533566
37  5.9240267 6.7748290
38  4.8160926 6.4942865
39  5.5456258 5.7911897
40  4.3075173 6.8165520
41  4.9654533 5.8904734
42  5.9581820 5.7692468
43  4.2417172 5.7990554
44  5.3670112 5.8252479
45  5.2932098 5.3983672
46  5.7456521 6.2563828
47  4.9398795 5.2879065
48  4.8526884 6.9827555
49  5.6135753 6.5219431
50  4.0727956 5.2647714
51  6.9418969 5.2584325
52  5.4189039 5.9936456
53  3.9193741 6.7099562
54  5.5885252 5.9680734
55  5.9581279 5.1843804
56  4.5724421 6.6774004
57  4.7700303 6.6083613
58  5.5490254 6.2431170
59  4.1668548 5.1017475
60  5.8948947 6.7646917
61  6.5501872 5.2803433
62  5.6011444 4.2733087
63  5.1337226 6.5225780
64  5.3153358 6.6164809
65  3.3815056 6.4077659
66  3.8405670 5.3677008
67  6.7036350 4.3090214
68  3.2446588 4.0965275
69  4.6563593 7.6868628
70  5.2382914 7.0020874
71  6.0771605 6.6232541
72  3.5672511 6.9333691
73  5.0865233 4.0778233
74  5.6743559 5.5177734
75  4.5759146 7.2210012
76  5.8203140 4.9787148
77  3.1106176 6.3937707
78  4.6310679 4.4731806
79  6.8237641 6.2679791
80  3.7653803 5.9188107
81  5.6139040 5.8586176
82  6.2016662 5.3514293
83  3.9362048 5.3217560
84  6.8005236 7.9247371
85  5.8030101 7.7492432
86  6.0143418 6.0709249
87  6.5734089 7.6112815
88  4.0569383 5.8440535
89  4.6825752 7.7926235
90  4.8204027 6.3106798
91  3.5001675 6.3156079
92  3.6521280 7.5155810
93  5.0945236 4.8206873
94  3.8732946 5.6771599
95  6.4812309 5.6082170
96  5.0308355 7.6877289
97  5.2193389 7.7133717
98  6.2239631 5.5387684
99  4.6501488 7.8559335
100 3.5389389 5.4594034
101 5.7139486 4.5008182
102 3.5425132 7.3562487
103 6.9950663 6.1036549
104 5.3801845 5.8903123
105 4.7629191 5.3394552
106 4.4102815 7.2312852
107 5.8723641 4.1410996
108 3.4691208 4.6383708
109 4.6479362 5.8562699
110 3.0315732 6.8614265
111 5.9456145 4.7497545
112 4.8461189 4.4730002
113 4.9606723 5.1099093
114 4.7802659 7.8147864
115 5.0189229 6.9308301
116 6.4738074 5.0539666
117 5.3725075 5.3282273
118 6.5374505 7.0508875
119 4.0907139 5.0855075
120 5.0557532 5.6449829
121 6.5483249 7.5800015
122 3.1083616 7.3697234
123 3.6119548 7.7639486
124 6.5157691 7.7152933
125 4.0305622 7.0521419
126 3.2197769 6.5881246
127 4.7570419 6.4564400
128 4.0063007 6.3981942
129 4.4412649 7.6576221
130 5.7348769 6.7601804
131 3.1312551 5.6295996
132 3.8627964 7.5817083
133 5.2008281 5.1082509
134 6.4229161 6.2816475
135 2.5241894 6.0802138
136 7.3759753 5.1090478
137 3.7284166 5.2045976
138 3.4404286 6.9708127
139 6.4237399 5.1363851
140 4.1829368 5.1612791
141 5.9500285 5.4765621
142 3.3555182 6.2627360
143 7.7691356 5.1877095
144 4.0684189 7.1663495
145 7.3929140 7.3819058
146 2.1659981 7.9796005
147 4.8539955 7.3108966
148 5.3932658 4.7116979
149 3.5610560 4.6096759
150 5.1883331 6.8068501
151 6.4233558 7.2955388
152 7.3308739 6.1761356
153 3.0710449 4.5296235
154 7.5400128 5.1559900
155 3.5776389 5.2057676
156 4.0402288 7.1487121
157 2.3107258 6.9816127
158 7.2065591 7.7307439
159 5.7577620 5.6652052
160 2.0595554 7.4373547
161 7.5994468 4.6216856
162 4.8053745 3.9113634
163 7.5769460 7.6019067
164 5.5362034 8.9270974
165 3.6713241 3.9060205
166 6.0612046 7.3862080
167 6.9205755 7.0792392
168 6.0892821 6.3248315
169 2.0532905 4.1545875
170 3.4086310 3.5510909
171 5.2148895 5.3266145
172 4.7638780 7.9240988
173 6.4717329 5.1350172
174 7.8287022 4.3457324
175 6.0299681 3.0952274
176 3.2760103 5.2730464
177 2.5729991 7.6594251
178 3.9403251 7.8928014
179 6.0021556 7.5313493
180 7.8561727 4.5092728
181 3.5818174 4.1140876
182 7.4972295 5.5313987
183 6.0138287 6.9369784
184 3.9257191 7.6395296
185 3.0462106 3.1347680
186 6.0630447 4.1847229
187 7.4878528 5.1004141
188 4.5145570 4.6389011
189 6.2777996 4.2647980
190 3.0166336 7.5755042
191 2.8791041 6.4471746
192 7.1029767 7.0061048
193 2.4526181 6.3373793
194 5.8762775 7.0746223
195 7.0609100 8.1256569
196 4.7252400 8.4829780
197 3.3695501 8.8786640
198 3.8505741 6.8260398
199 5.3573846 6.3864944
200 3.7039072 8.9951078
201 4.6216933 6.7890198
202 7.0390643 5.9458624
203 5.7172605 6.9083246
204 2.3814644 8.3856125
205 2.4432566 3.2618192
206 4.3881965 6.7022219
207 5.2583749 7.2432485
208 5.8540367 8.5154705
209 6.4267791 4.9593757
210 5.0668461 3.1358129
211 2.6845736 8.9880143
212 7.3094761 5.4049133
213 4.2176252 5.5062193
214 5.2025716 4.0798478
215 6.5592571 8.1852765
216 2.0417939 7.0843906
217 7.6045374 7.4870940
218 6.5971789 8.8641329
219 5.3541694 7.2176914
220 2.8314803 6.4831720
221 2.4252467 4.0918736
222 6.6804732 6.3624739
223 6.0325285 6.2057468
224 2.2751047 5.1275412
225 5.5397481 5.9890834
226 4.6420585 4.6013327
227 7.6385642 5.1722194
228 6.7378078 5.8246169
229 5.0647686 7.9219705
230 2.8672731 6.6371082
231 7.5487359 4.5727898
232 1.0837662 7.1788146
233 5.4483746 6.8955122
234 9.3085746 4.8330044
235 3.8484225 6.0133789
236 2.8034987 3.0023096
237 2.8952626 8.2623788
238 5.7666136 3.2158710
239 6.4978214 5.7866574
240 1.5184268 5.9791716
241 2.3836147 8.2897188
242 4.7318649 6.1174515
243 5.8544588 7.5056688
244 9.6776416 6.5151695
245 0.4319531 4.2470331
246 0.9810053 8.6452087
247 7.0819634 3.2488110
248 1.9084265 6.1122130
249 7.5096342 3.3495096
250 8.9564496 3.4960564
251 5.7603943 6.9091760
252 0.8801204 7.2744429
253 1.2183581 6.4264214
254 1.7761613 7.1199729
255 3.2490662 7.9935963
256 3.5420375 8.4801333
257 8.7709382 3.8011487
258 8.4770868 3.4749692
259 0.9965042 6.7509705
260 7.5049457 5.4313474
261 9.7261151 6.5909553
262 5.3893371 4.0194548
263 9.6154510 7.3117416
264 1.0327841 6.2376586
265 4.0064715 3.7333634
266 6.6941050 3.9452152
267 4.1317951 9.3322756
268 9.6481471 7.5330023
269 7.3474233 1.0310166
270 3.7343864 4.9808341
271 9.1412231 2.6655861
272 5.8414100 0.1329439
273 2.4837309 7.4956203
274 2.7983337 1.3563719
275 0.6335727 7.9273816
276 7.5566740 0.4321263
277 8.6182079 0.6038505
278 0.8928523 8.0131172
279 5.7375090 8.5275545
280 0.7864533 3.3954255
281 8.7808839 1.7059789
282 9.6621659 0.9215045
283 8.4894688 8.7667948
284 1.0358920 7.2505891
285 0.7378660 0.1173287
286 9.5485481 3.3186128
287 6.8987508 9.5480887
288 7.4105831 5.8809522
289 6.6984457 5.9509037
290 1.7878216 9.1932955
291 0.8443295 5.1662902
292 0.4498266 8.9636923
293 2.5068754 5.3692908
294 9.2509052 2.4204235
295 4.1333742 6.2581851
296 6.5510938 7.2923688
297 4.3412873 3.5514825
298 4.2349765 9.3207514
299 2.8730785 7.2752405
300 2.0425362 6.6513146
301 6.4498432 7.2949259
302 5.7453188 6.3263712
303 7.0501276 8.2238207
304 4.1915008 1.5325379
305 8.1307954 7.7681944
306 7.3156552 6.3031412
307 4.0302052 0.3039900
308 3.3740358 2.1386235
309 8.2055657 2.9112215
310 1.8817856 7.0503046
311 7.0820523 6.8739097
312 5.0725238 6.9951556
313 1.6246224 5.4126084
314 3.8865553 7.6398192
315 6.6727672 8.9677947
316 9.6048687 7.6757966
317 2.2006018 9.6385351
318 9.6403802 7.6438900
319 0.1267512 0.9048408
320 1.8160829 7.3193066
321 9.9318386 9.6068456
322 2.1275892 7.8034724
323 1.2232242 1.0695030
324 3.0198057 3.8964732
325 3.3265773 8.5865587
326 5.1519605 7.5068253
327 0.4137485 5.9223826
328 1.6896445 0.6071874
329 1.8534083 2.3554291
330 1.7182264 9.3488597
331 6.4165456 9.8670765
332 7.6270001 2.1839607
333 8.9867227 5.9565743
334 6.9185079 0.2440980
335 6.7359209 7.1072908
336 3.8034763 5.8466404
337 3.4583027 6.9041502
338 1.7983897 1.7108336
339 6.9184406 6.3632716
340 1.3538600 6.8484462
341 3.6731748 4.9846946
342 5.6139620 8.0637827
343 9.0991782 2.3051189
344 1.1220448 8.9624365
345 2.5925265 8.3673795
346 9.9977377 8.5423564
347 5.1761187 5.1240824
348 5.9330451 9.4141322
349 6.3337224 6.8055697
350 2.7287418 5.7100024
351 6.1022411 2.9733360
352 2.7331869 3.7135612
353 6.7394034 8.2721572
354 2.1757932 9.0574057
355 5.5011486 6.0124142
356 4.5301911 2.5865048
357 5.3137001 0.7062267
358 0.6959286 3.2395043
359 5.3494169 6.5742589
360 7.1472046 6.3821916
361 0.1749855 0.3954287
362 6.7709760 6.5212015
363 7.2983482 3.0086604
364 0.6147726 9.3336870
365 7.4417342 2.6836695
366 1.2769881 4.0591093
367 9.5342317 5.3443613
368 0.9368862 1.1391497
369 8.4271193 8.6641296
370 6.2000851 8.2987486
371 2.1768279 6.0684896
372 5.2021222 6.9222675
373 0.6095874 8.4759464
374 2.0217473 9.5844241
375 4.8080163 6.5052801
376 3.6099334 0.3272768
377 6.0132712 7.9920535
378 4.0495344 8.8153621
379 6.9646704 7.0375214
380 3.9211171 2.5994333
381 4.4749268 1.0517360
382 1.1683429 3.8710614
383 1.7618115 0.3513996
384 1.1257639 5.7446745
385 3.7351688 8.7376011
386 4.9234662 7.1975462
387 7.4899861 7.3846309
388 7.4170082 2.2885060
389 0.8526702 3.8160722
390 4.5907512 8.9315418
391 7.6996179 9.8409051
392 0.2340987 4.2906009
393 2.2502736 1.7819172
394 3.5679969 1.7419479
395 5.4214908 5.6001803
396 3.9965213 9.2021549
397 3.8610336 2.0462740
398 5.9490575 4.4422382
399 9.8897791 5.6402915
400 6.1153192 4.1236797
401 5.8906384 2.6153750
402 8.0582664 2.7137804
403 7.2969209 2.9362187
404 3.8673527 1.0837191
405 3.5647339 6.2338014
406 9.6490210 0.8373270
407 0.8133243 6.3393130
408 2.8760565 9.9462423
409 3.3836457 7.4451869
410 4.7772609 2.9141127
411 8.6635971 5.7812494
412 5.6192160 1.4764255
413 9.1334625 8.9822399
414 0.4662385 6.6440937
415 3.4503559 4.2064800
416 0.6704780 2.8508758
417 0.5211872 4.3109175
418 7.5615411 9.2851454
419 7.5081906 4.0019450
420 8.8851669 9.7323717
421 7.3856288 8.6152906
422 9.5926351 0.3993818
423 1.4478981 1.4845263
424 5.0425560 1.3501638
425 0.8952120 7.9407680
426 6.4732584 7.1493210
427 9.6595225 5.2377876
428 7.2204625 2.0300222
429 3.5410601 7.3117738
430 6.7991771 3.6368291

Just to clarify, I want something like the plot below. (The plot has nothing to do with my raw data; I am only showing it to explain the problem more clearly. Using hist(df$x) shows the distribution of x only.)

Answer 1:

ggplot is elegant, fast and pretty, as usual. But if you want to use base graphics (image, contour, persp) and display the actual frequencies (instead of a smoothed 2D kernel estimate), you first have to do the binning yourself and build a matrix of frequencies. Here's some code (not necessarily elegant, but pretty robust) that does 2D binning and generates plots somewhat similar to the ones above:

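    library(mvtnorm)
    # Simulated example data: 1000 draws from a correlated bivariate normal
    xy <- rmvnorm(1000, c(5,10), sigma=rbind(c(3,-2), c(-2,3)))

    nbins <- 20
    x.bin <- seq(floor(min(xy[,1])), ceiling(max(xy[,1])), length.out=nbins)
    y.bin <- seq(floor(min(xy[,2])), ceiling(max(xy[,2])), length.out=nbins)

    # Count the points falling into each (x, y) pair of bin indices
    freq <- as.data.frame(table(findInterval(xy[,1], x.bin), findInterval(xy[,2], y.bin)))
    # Convert via as.character() so the bin indices, not the factor level codes, are recovered
    freq[,1] <- as.numeric(as.character(freq[,1]))
    freq[,2] <- as.numeric(as.character(freq[,2]))

    # Fill an nbins x nbins matrix of counts; bins that never occur stay 0
    freq2D <- matrix(0, nbins, nbins)
    freq2D[cbind(freq[,1], freq[,2])] <- freq[,3]

    par(mfrow=c(1,2))
    image(x.bin, y.bin, freq2D, col=topo.colors(max(freq2D)))
    contour(x.bin, y.bin, freq2D, add=TRUE, col=rgb(1,1,1,.7))

    # Colour each persp facet by the mean count of its four corners
    palette(rainbow(max(freq2D)))
    cols <- (freq2D[-1,-1] + freq2D[-1,-nbins] + freq2D[-nbins,-nbins] + freq2D[-nbins,-1])/4
    persp(freq2D, col=cols)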

For a really fun time, try making an interactive, zoomable, 3D surface:

    library(rgl)
    surface3d(x.bin, y.bin, freq2D/10, col="red")
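
For the fixed 0.5-wide bins the question asks for, the counting step can also be sketched in base R with cut() and table(). This is only a minimal sketch, not part of the original answer: the simulated x and y stand in for the columns of df, and the 0 to 10 range and the names breaks and counts2D are assumptions chosen for illustration.

```r
# Simulated stand-in for the question's data (assumption: values lie in [0, 10])
set.seed(1)
x <- runif(430, 0, 10)
y <- runif(430, 0, 10)

bin_size <- 0.5
# Shared bin boundaries 0, 0.5, ..., 10 for both axes
breaks <- seq(0, 10, by = bin_size)

# cut() assigns each value to a bin; every bin is a factor level,
# so table() keeps empty bins as zero counts
x.cut <- cut(x, breaks = breaks, include.lowest = TRUE)
y.cut <- cut(y, breaks = breaks, include.lowest = TRUE)
counts <- table(x.cut, y.cut)

# Plain numeric matrix of counts (rows: x bins, columns: y bins)
counts2D <- matrix(as.vector(counts), nrow = nrow(counts), ncol = ncol(counts))

# image() accepts the bin boundaries (one more than the matrix dimension)
image(breaks, breaks, counts2D, xlab = "x", ylab = "y")
```

Because the bins are factor levels, no index conversion is needed and empty bins cannot shift the positions of the others.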



Answer 2:

Bivariate density estimates can be computed with MASS::kde2d or KernSmooth::bkde2D (both are recommended packages that ship with the standard R distribution). The latter uses an algorithm based on the fast Fourier transform over a grid of points, and is very fast. The result can be plotted with contour or persp, or with similar functions in other graphics packages.

Using your data:

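    library(KernSmooth)
    # bkde2D expects a two-column numeric matrix; 0.5 is the bandwidth used for both axes
    z <- bkde2D(as.matrix(df), 0.5)
    persp(z$x1, z$x2, z$fhat)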
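
MASS::kde2d, the other function named above, can be used in much the same way. The following is a hedged sketch only: the data are simulated stand-ins for df, the 50-point grid is an arbitrary choice, and the bandwidth is left at the function's default. Like bkde2D, it returns a smoothed density estimate rather than bin counts.

```r
library(MASS)

# Simulated stand-in for the question's data (illustrative assumption)
set.seed(1)
x <- rnorm(430, mean = 5, sd = 2)
y <- rnorm(430, mean = 5, sd = 2)

# n: number of grid points per axis; the bandwidth defaults to a
# normal reference rule when h is not supplied
dens <- kde2d(x, y, n = 50)

# dens$x and dens$y hold the grid coordinates, dens$z the density estimates
contour(dens$x, dens$y, dens$z, xlab = "x", ylab = "y")
```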



Answer 3:

If you want a 2D binned plot, you can also use the ggplot2 package. Some example code is shown in the question "gradient breaks in a ggplot stat_bin2d plot". Adjusted slightly:

    library(ggplot2)
    x <- rnorm(10000)+5
    y <- rnorm(10000)+5
    df <- data.frame(x,y)
    p <- ggplot(df, aes(x, y))
    p <- p + stat_bin2d(bins = 20)
    p

[Figure: output of the code above]
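
To bin by width rather than by number of bins, and so match the 0.5 bin size from the question, the width can be passed directly. This is a minimal sketch with simulated data: binwidth is an argument of stat_bin_2d (the current spelling of stat_bin2d), while the data and the names sim, p05 and bin_counts are illustrative assumptions.

```r
library(ggplot2)

# Simulated stand-in data (illustrative assumption)
set.seed(1)
sim <- data.frame(x = rnorm(1000) + 5, y = rnorm(1000) + 5)

# binwidth takes one width per axis; here 0.5 in both x and y
p05 <- ggplot(sim, aes(x, y)) +
  stat_bin_2d(binwidth = c(0.5, 0.5))
p05

# The per-bin counts computed by the stat can be read back from the layer
bin_counts <- layer_data(p05)$count
```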



Answer 4:

For completeness, you can also use the hist2d function from the gplots package. It seems to be the most straightforward option for a 2D plot:

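    library(gplots)

    # data is in variable df

    # number of bins per axis for a bin size of about 0.5, rounded up to whole numbers
    bin_size <- 0.5
    xbins <- ceiling((max(df$x) - min(df$x))/bin_size)
    ybins <- ceiling((max(df$y) - min(df$y))/bin_size)

    # create plot (same.scale=TRUE cuts both axes over their common range,
    # so the resulting bin size is only approximately bin_size)
    hist2d(df, same.scale=TRUE, nbins=c(xbins, ybins))

    # if you want to retrieve the data for other purposes
    df.hist2d <- hist2d(df, same.scale=TRUE, nbins=c(xbins, ybins), show=FALSE)
    df.hist2d$counts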


Answer 5:

I came to this page from http://www.r-bloggers.com/5-ways-to-do-2d-histograms-in-r/ which lists one of the answers above. It provides code samples for five methods in total:

hist2d from the gplots package
hexbin and hexbinplot from the hexbin package
stat_bin2d from the ggplot2 package
kde2d from the MASS package
the "hard way" solution listed above.


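Answer 6:

    freq <- as.data.frame(table(findInterval(xy[,1], x.bin), findInterval(xy[,2], y.bin)))
    freq[,1] <- as.numeric(freq[,1])
    freq[,2] <- as.numeric(freq[,2])

These lines from the first answer, as originally posted, are probably wrong: calling as.numeric() directly on a factor returns its internal level codes rather than the bin indices stored in the labels, so the original indices are destroyed whenever some bin index does not occur in the data. Converting with as.numeric(as.character(...)) preserves them.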
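
A small illustration of the problem, using made-up bin indices (the vector idx is purely hypothetical): when indices 1 and 3 never occur, the factor level codes no longer match the labels.

```r
# Made-up bin indices in which bins 1 and 3 are empty
idx <- c(2, 2, 4, 5, 5, 5)
freq <- as.data.frame(table(idx))

# Level codes: positions within the sorted set of observed values
codes <- as.numeric(freq$idx)                  # 1 2 3

# Labels converted back to numbers: the actual bin indices
labels <- as.numeric(as.character(freq$idx))   # 2 4 5
```

Indexing a frequency matrix with codes would put every count in the wrong cell here, whereas labels places them in bins 2, 4 and 5 as intended.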