Heatmap or plot for a correlation matrix [duplicate]

Posted 2019-04-06 17:51

I tried to plot a correlation matrix with the lattice library, using three colours to represent the correlation coefficients.

library(lattice)

levelplot(cor)

I obtain the following plot:

[Image: plot of correlation matrix]

That plot covers only a subset of my data. When I use the whole dataset (400 x 400), the plot becomes unclear: the colouring is not rendered properly and the cells show up as dots. Is it possible to get the same tile-style plot for a large matrix?

I tried the pheatmap function, but I do not want my values to be clustered. I just want a clear representation of high and low values in tile form.

Tags: r lattice
2 answers
Explosion°爆炸
Reply #2 · 2019-04-06 18:15

If you want a correlation plot, use the corrplot package: it gives you a lot of flexibility for creating heatmap-like figures of correlations.

library(corrplot)
# create data with some correlation structure
jnk <- runif(1000)
jnk <- (jnk * 100) + c(1:500, 500:1)
jnk <- matrix(jnk, nrow = 100, ncol = 10)
jnk <- as.data.frame(jnk)
names(jnk) <- paste0("var", 1:10)

# create correlation matrix
cor_jnk <- cor(jnk, use = "complete.obs")
# plot correlation matrix; p.mat = 1 - |r| with sig.level = 0.50 blanks cells with |r| < 0.5
corrplot(cor_jnk, order = "AOE", method = "circle", tl.pos = "lt", type = "upper",
         tl.col = "black", tl.cex = 0.6, tl.srt = 45,
         addCoef.col = "black", addCoefasPercent = TRUE,
         p.mat = 1 - abs(cor_jnk), sig.level = 0.50, insig = "blank")

The code above only colours the correlations whose absolute value is greater than 0.5, but you can easily change that threshold. There are also many ways to configure the look of the plot (colour gradient, how the correlations are displayed, full versus half matrix, etc.). The order argument is particularly useful: "AOE" orders the variables by the angular order of the eigenvectors of the correlation matrix (a PCA-based ordering), so variables with similar correlation patterns end up next to each other.
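To make the thresholding trick explicit: as far as I understand corrplot, it treats p.mat as a matrix of p-values and, with insig = "blank", leaves empty every cell whose entry exceeds sig.level. Passing 1 - abs(cor_jnk) with sig.level = 0.50 therefore hides the cells where |r| < 0.5. Here is a small base-R sketch of that arithmetic, using a made-up 3 x 3 matrix m (an illustrative assumption, not the data above):

```r
# Made-up correlation matrix, for illustration only
m <- matrix(c( 1.0,  0.8, -0.2,
               0.8,  1.0, -0.6,
              -0.2, -0.6,  1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# The "pseudo p-value" handed to corrplot: small when |r| is large
pseudo_p <- 1 - abs(m)

# Cells that would be blanked with sig.level = 0.50
blanked <- pseudo_p > 0.50
print(blanked)
```

Only the a/c pair (r = -0.2) is flagged here; raising or lowering the 0.50 cut-off changes which cells disappear.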

For squares, for instance (similar to your original plot), just change method to "square".

EDIT: @Carson, you can still use this method for reasonably large correlation matrices, for instance one with 100 variables. Beyond that, I fail to see the use of a graphical representation of a correlation matrix with so many variables unless you subset it, because it will be very hard to interpret.
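A rough sketch of what a 100-variable plot could look like in tile form. The simulated data (big, trend) is purely an assumption for illustration. method = "color" draws filled tiles, tl.pos = "n" drops the text labels, and order = "original" keeps the variables in their given order, so nothing is clustered:

```r
set.seed(42)

# Hypothetical data: 100 variables that share a common trend to varying degrees
n_obs <- 200
n_var <- 100
trend <- rnorm(n_obs)
big <- sapply(seq_len(n_var), function(j) trend * (j / n_var) + rnorm(n_obs))
colnames(big) <- paste0("var", seq_len(n_var))

cor_big <- cor(big)

# Plot only if corrplot is installed
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_big, method = "color", order = "original", tl.pos = "n")
}
```

Without labels, the figure shows only the overall pattern of high and low correlations, not individual pairs.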

萌系小妹纸
Reply #3 · 2019-04-06 18:18

@Lucas provides good advice here, as corrplot is quite useful for visualizing correlation matrices. However, it doesn't address the original issue of plotting a large correlation matrix. In fact, corrplot will also fail when you try to visualize a correlation matrix this large. For a simple solution, consider reducing the number of variables: look at the correlations among a subset of variables that you know are important for your problem. Understanding the correlation structure of that many variables will be a difficult task, even if you can visualize it!
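A minimal base-R sketch of that subsetting idea. The data frame dat, its column names and the keep vector are hypothetical stand-ins for your own data and for the variables you consider important. The image() call draws plain tiles in three colour bands; the cut points at -0.5 and 0.5 are arbitrary:

```r
set.seed(1)

# Hypothetical data: 200 observations of 400 variables
dat <- as.data.frame(matrix(rnorm(200 * 400), nrow = 200))
names(dat) <- paste0("var", seq_len(ncol(dat)))

# Variables assumed to matter for the problem (hypothetical choice)
keep <- c("var1", "var5", "var20", "var100", "var250")

# Correlation matrix of the subset only: 5 x 5 instead of 400 x 400
cor_sub <- cor(dat[, keep], use = "complete.obs")

# Tile plot with three colour bands: low, middle, high
n <- ncol(cor_sub)
image(seq_len(n), seq_len(n), cor_sub,
      breaks = c(-1, -0.5, 0.5, 1),
      col = c("blue", "white", "red"),
      axes = FALSE, xlab = "", ylab = "")
axis(1, at = seq_len(n), labels = keep, las = 2)
axis(2, at = seq_len(n), labels = keep, las = 2)
```

No clustering or reordering happens here: the tiles appear in the order given by keep.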
