scatterplotMatrix with group histograms

Posted 2020-06-30 00:50

It's pretty easy to build a nice huge scatterplot matrix with histograms down the diagonal for multivariate data as follows:

library(car)  # provides scatterplotMatrix(...)
scatterplotMatrix(somedata[1:points.count,],groups=somedata[1:points.count,class],
                by.groups=TRUE,diagonal="histogram")

According to the documentation though, it doesn't seem possible to divide up the histogram by the group labels as is done in this question. How would you do that using scatterplotMatrix or a similar function?

Tags: r plot histogram
2 answers
老娘就宠你
#2 · 2020-06-30 01:46

For later reference, the GGally way to do it is as follows:

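library(GGally)      # provides ggpairs(...); there is no package named ggpairs
library(data.table)
tmp <- data.table(a = runif(30), b = runif(30), c = runif(30)+1,
                  d = as.factor(sample(0:1, size=30, replace=TRUE)))

# current GGally takes the grouping aesthetic via mapping= (older releases
# accepted colour="d") and names the diagonal density type "densityDiag"
ggpairs(data=tmp, mapping=ggplot2::aes(colour=d), columns=1:3,
        diag=list(continuous="densityDiag"), axisLabels="show")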

[Figure: pairwise scatterplot matrix with group densities on the diagonal]

This intrepid asker figured out that you have to enable axisLabels, which is somewhat silly given the aesthetic emphasis of ggplot and friends.

Now I want to know how to parallelize this, because it's a monster with high numbers of variables.

Summer. ? 凉城
#3 · 2020-06-30 01:47

Is this what you had in mind?

Using the iris dataset:

library(ggplot2)
library(data.table)  # also provides melt(...) for data.tables

xx <- with(iris, data.table(id=1:nrow(iris), group=Species, 
           Sepal.Length, Sepal.Width, Petal.Length, Petal.Width))
# reshape to long format for facetting with ggplot
yy <- melt(xx, id.vars=1:2, variable.name="H", value.name="xval")
setkey(yy, id, group)
# pair every variable (H) with every variable (V) for the same observation
ww <- yy[, list(V=H, yval=xval), keyby="id,group"]
zz <- yy[ww, allow.cartesian=TRUE]
setkey(zz, H, V, group)
zz <- zz[, list(id, group, xval, yval, min.x=min(xval), min.y=min(yval),
                range.x=diff(range(xval)), range.y=diff(range(yval))), by="H,V"]
# density curves for each variable by group, rescaled to the panel's y range
d  <- zz[H==V, list(x=density(xval)$x,
          y=mean(min.y)+mean(range.y)*density(xval)$y/max(density(xval)$y)),
          by="H,V,group"]
ggp <- ggplot(zz)
# points colored by group (=species) in the off-diagonal panels; ggplot2 layers
# no longer accept subset=, so the data is filtered before it is passed in
ggp <- ggp + geom_point(data=zz[H!=V],
                        aes(x=xval, y=yval, color=factor(group)),
                        size=3, alpha=0.5)
# d only contains the diagonal (H==V) panels
ggp <- ggp + geom_line(data=d, aes(x=x, y=y, color=factor(group)))
ggp <- ggp + facet_grid(V~H, scales="free")
ggp <- ggp + scale_color_discrete(name="Species")
ggp <- ggp + labs(x="", y="")
ggp
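
The diagonal curves above depend on rescaling each density so that its peak spans the variable's own range, which lets it share a y axis with the scatterplots in the same row. Here is a minimal sketch of that rescaling for one variable and one group, in base R only. The helper name rescale_density is hypothetical and is not part of any package.

```r
# Hypothetical helper: map a kernel density onto [lo, lo + span]
# so that the curve's peak touches the top of the panel.
rescale_density <- function(x, lo, span) {
  dens <- density(x)
  data.frame(x = dens$x, y = lo + span * dens$y / max(dens$y))
}

sl <- iris$Sepal.Length
# density of Sepal.Length for setosa, scaled to the range of all Sepal.Length values
curve <- rescale_density(sl[iris$Species == "setosa"],
                         lo = min(sl), span = diff(range(sl)))
range(curve$y)
```

The data.table version does the same thing per H, V and group, taking lo and span from the min.y and range.y columns.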

I keep hearing that the same thing is possible using ggpairs(...) in package GGally. I would love to see an actual example of it. The documentation is inscrutable. Also, ggpairs(...) is extremely slow (in my hands), especially with large datasets.
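
For comparison, here is a rough sketch of the ggpairs(...) route on the same iris data. It assumes a GGally release (1.0 or later) in which aesthetics are passed through mapping= and the diagonal density type is called "densityDiag"; older releases used different argument names, so treat this as a starting point rather than the definitive call.

```r
library(ggplot2)
library(GGally)

# columns 1:4 are the numeric measurements; Species only drives the color
p <- ggpairs(iris, columns = 1:4,
             mapping = aes(colour = Species),
             diag  = list(continuous = "densityDiag"),  # per-group densities
             upper = list(continuous = "points"),
             lower = list(continuous = "points"),
             axisLabels = "show")
print(p)
```

Using plain points in both triangles mirrors the facet_grid version above.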
