I have an issue where the trendline for the second grid looks negative, whereas the Spearman correlation is weakly positive (0.1). I would appreciate it if someone could clarify whether the difference in direction is due to an incorrect formula or to the weak correlation.
I also noticed that a similar issue occurs at rho = -0.3, where the trendline is positive.
Thanks.
sc_df
     OTU_166911  Body weight  EXPT  Group
68    41.132985         36.5  ABX2  S T2 HFHS+amp
69    15.589949         34.8  ABX2  S T2 HFHS+amp
70    15.504802         30.5  ABX2  S T2 HFHS+amp
71     5.339616         35.8  ABX2  S T2 HFHS+amp
72    40.697005         33.9  ABX2  S T2 HFHS+amp
188    2.893428         33.4  ABX3  S T2 HFHS+amp
189   20.891697         37.6  ABX3  S T2 HFHS+amp
190    3.195469         40.5  ABX3  S T2 HFHS+amp
191    2.689137         34.2  ABX3  S T2 HFHS+amp
192   13.997269         30.0  ABX3  S T2 HFHS+amp
df4
   Group          EXPT  value
1  S T2 HFHS+amp  ABX2   0.30
2  S T2 HFHS+amp  ABX3   0.10
ggplot(sc_df, aes(x = sc_df[, partner1], y = sc_df[, partner2])) +
  geom_point(shape = 1, color = "blue", size = 3) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~EXPT, scales = "free") +
  geom_text(data = df4, aes(x = Inf, y = Inf, hjust = 2, vjust = 2, label = paste("rho==", value, sep = "")), parse = TRUE, family = "Arial", size = 4) +
  xlab(partner1) +
  ylab(partner2) +
  theme(plot.title = element_text(hjust = 0.5), text = element_text(family = "Arial", size = 10)) +
  ggtitle(g)
The discrepancy arises because the label shows Spearman's rho, while the trendline comes from a linear model, whose slope always has the same sign as Pearson's r.
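As a quick check of that claim, here is a minimal sketch using the second grid (ABX3) from sc_df. The names otu and weight are my own, and I'm assuming the OTU abundance is on the x-axis; the sign argument holds either way.

```r
# Second grid (EXPT == "ABX3") from sc_df; otu and weight are illustrative names
otu    <- c(2.893428, 20.891697, 3.195469, 2.689137, 13.997269)
weight <- c(33.4, 37.6, 40.5, 34.2, 30.0)

fit   <- lm(weight ~ otu)            # the model geom_smooth(method = "lm") fits
slope <- unname(coef(fit)["otu"])
r     <- cor(otu, weight)            # Pearson's r is the default method

# Least-squares slope = r * sd(y) / sd(x), so slope and r always share a sign
all.equal(slope, r * sd(weight) / sd(otu))
sign(slope) == sign(r)
```

Because both standard deviations are positive, a negative trendline simply means Pearson's r is negative, whatever rho says.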
See the Details section of ?cor, which describes Spearman's rho as a rank-based measure of association.
I renamed your variables for simplicity:
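Something along these lines, rebuilt from the sc_df printout in the question. The short names x, y and g are my assumption about the renaming (x for OTU_166911, y for Body weight, g for EXPT).

```r
# Data copied from the sc_df printout; x, y, g are assumed short names
d <- data.frame(
  x = c(41.132985, 15.589949, 15.504802, 5.339616, 40.697005,
        2.893428, 20.891697, 3.195469, 2.689137, 13.997269),
  y = c(36.5, 34.8, 30.5, 35.8, 33.9,
        33.4, 37.6, 40.5, 34.2, 30.0),
  g = rep(c("ABX2", "ABX3"), each = 5)
)
d
```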
First, we'll check that Spearman's rho is simply Pearson's r computed on the ranks of the data, and that it differs from Pearson's r on the raw values.
Notice that the two values for rho agree with each other, but that they can differ in sign and magnitude from r, as they do for the second group.
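A sketch of that comparison, computed per group. It rebuilds the data so it runs on its own; d, x, y and g are assumed names, not something from your code.

```r
# Data copied from the sc_df printout; x, y, g are assumed short names
d <- data.frame(
  x = c(41.132985, 15.589949, 15.504802, 5.339616, 40.697005,
        2.893428, 20.891697, 3.195469, 2.689137, 13.997269),
  y = c(36.5, 34.8, 30.5, 35.8, 33.9,
        33.4, 37.6, 40.5, 34.2, 30.0),
  g = rep(c("ABX2", "ABX3"), each = 5)
)

by_group <- sapply(split(d, d$g), function(s) c(
  rho_builtin = cor(s$x, s$y, method = "spearman"),  # built-in Spearman
  rho_by_hand = cor(rank(s$x), rank(s$y)),           # Pearson on the ranks
  r           = cor(s$x, s$y)                        # Pearson on raw values
))
round(by_group, 2)
```

The two rho rows come out at 0.3 for ABX2 and 0.1 for ABX3, matching the labels in df4.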
Yes, the weak correlation is part of the reason, but so is the fact that ranking discards any information about how far apart the observations are. Even if two observations are infinitesimally close together, they'll still be 1 rank apart. Likewise, two observations could be hugely different, but if there are none between them, they'll only be 1 rank apart.
Have a look:
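One way to draw this is to plot the raw values per group and label each point with its (x rank, y rank) pair. This is a sketch with ggplot2; the rx and ry columns and the label format are my own choices.

```r
library(ggplot2)

# Data copied from the sc_df printout; x, y, g are assumed short names
d <- data.frame(
  x = c(41.132985, 15.589949, 15.504802, 5.339616, 40.697005,
        2.893428, 20.891697, 3.195469, 2.689137, 13.997269),
  y = c(36.5, 34.8, 30.5, 35.8, 33.9,
        33.4, 37.6, 40.5, 34.2, 30.0),
  g = rep(c("ABX2", "ABX3"), each = 5)
)

# Rank within each group, which is what Spearman's rho works with
d$rx <- ave(d$x, d$g, FUN = rank)
d$ry <- ave(d$y, d$g, FUN = rank)

p <- ggplot(d, aes(x, y)) +
  geom_point(shape = 1, color = "blue", size = 3) +
  geom_text(aes(label = paste0("(", rx, ", ", ry, ")")), vjust = -1, size = 3) +
  facet_wrap(~g, scales = "free")
p
```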
Notice the two left-most points in the right panel. Even though they're very close together, their ranks are still a full unit apart in each direction. As far as rho is concerned, they carry as much information in the y-direction as the top two points, which are much farther apart.
To illustrate how much this can change the values, let's rescale the ranks to the scale of the original values. The original computation of rank gives you 1 to 5; let's instead make those evenly spaced across, for example, 5.3 to 41.1 for the first group in the x direction.
Visually, this looks like:
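A sketch of the rescaling and the plot, again with ggplot2. The helper rescale_ranks and the xr and yr columns are hypothetical names; blue circles are the original points, red dots are the evenly spaced rank positions, and the grey segments show how far each point moves.

```r
library(ggplot2)

# Data copied from the sc_df printout; x, y, g are assumed short names
d <- data.frame(
  x = c(41.132985, 15.589949, 15.504802, 5.339616, 40.697005,
        2.893428, 20.891697, 3.195469, 2.689137, 13.997269),
  y = c(36.5, 34.8, 30.5, 35.8, 33.9,
        33.4, 37.6, 40.5, 34.2, 30.0),
  g = rep(c("ABX2", "ABX3"), each = 5)
)

# Spread the ranks evenly between the observed minimum and maximum
rescale_ranks <- function(v) {
  min(v) + (rank(v) - 1) / (length(v) - 1) * (max(v) - min(v))
}
d$xr <- ave(d$x, d$g, FUN = rescale_ranks)
d$yr <- ave(d$y, d$g, FUN = rescale_ranks)

p <- ggplot(d) +
  geom_segment(aes(x = x, y = y, xend = xr, yend = yr), color = "grey40") +
  geom_point(aes(x, y), shape = 1, color = "blue", size = 3) +
  geom_point(aes(xr, yr), color = "red", size = 2) +
  facet_wrap(~g, scales = "free")
p

# Pearson's r on the evenly spaced values reproduces Spearman's rho
sapply(split(d, d$g), function(s) cor(s$xr, s$yr))
```

Since the rescaling is a linear map of the ranks, a straight line fitted to the red points would have the same sign as rho.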
You can see that some points barely move, while some move a lot. Those discrepancies are enough to change the magnitude, and sometimes sign, of the correlation coefficient.