How to visualise the difference between probability distribution functions? [closed]

Posted 2019-09-26 13:55

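I am trying to visualise the difference between two histograms of distribution functions, such as the difference between the following two curves:

When the difference is large, you can simply plot the two curves on top of each other and fill the area between them, as shown above. When the difference becomes very small, however, this gets cumbersome. Another way to show it is to plot the difference itself, as follows:

However, such a plot seems hard to read for anyone seeing it for the first time, so I am wondering: is there any other way to visualise the difference between two distribution functions?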

Answer 1:

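I thought one option might be to simply combine your two propositions, while scaling up the difference to make it visible.

Below is an attempt to do this with ggplot2. It turned out to be quite a bit more involved than I initially thought, and I am definitely not a hundred percent satisfied with the result, but maybe it helps nevertheless. Comments and improvements are very welcome.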

```r
library(ggplot2)
library(dplyr)

## function that replicates default ggplot2 colors
## taken from [1]
gg_color_hue <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}

## Set up sample data
set.seed(1)
n <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)
df <- bind_rows(data.frame(sample = 1, x = x1), data.frame(sample = 2, x = x2)) %>%
  mutate(sample = as.factor(sample))

## Calculate density estimates
g1 <- ggplot(df, aes(x = x, group = sample, colour = sample)) +
  geom_density() + xlim(0, 10)
gg1 <- ggplot_build(g1)

## Use these estimates (available at the same x coordinates!) for
## calculating the differences.
## Inspired by [2]
dens <- gg1$data[[1]]
x  <- dens$x[dens$group == 1]
y1 <- dens$y[dens$group == 1]
y2 <- dens$y[dens$group == 2]
df2 <- data.frame(x = x, ymin = pmin(y1, y2), ymax = pmax(y1, y2),
                  side = (y1 < y2), ydiff = y2 - y1)

## alpha is a constant, so it is set outside aes()
g2 <- ggplot(df2) +
  geom_ribbon(aes(x = x, ymin = ymin, ymax = ymax, fill = side), alpha = 0.5) +
  geom_line(aes(x = x, y = 5 * abs(ydiff), colour = side)) +
  geom_area(aes(x = x, y = 5 * abs(ydiff), fill = side), alpha = 0.4)

## linewidth needs ggplot2 >= 3.4.0 (use size on older versions);
## guides() takes "none" rather than FALSE since ggplot2 3.3.4
g3 <- g2 +
  geom_density(data = df, linewidth = 1, aes(x = x, group = sample, colour = sample)) +
  xlim(0, 10) +
  guides(alpha = "none", colour = "none") +
  ylab("Curves: density\n Shaded area: 5 * difference of densities") +
  scale_fill_manual(name = "samples", labels = 1:2, values = gg_color_hue(2)) +
  scale_colour_manual(limits = c("1", "2", "FALSE", "TRUE"),
                      values = rep(gg_color_hue(2), 2))

print(g3)
```

Sources: SO answer [1], SO answer [2]
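
The answer reads the density estimates back out of `ggplot_build()`. As a hedged alternative, the same kind of shared-grid estimate can be sketched with base R's `density()` by passing identical `from`, `to` and `n` arguments to both calls. The grid limits (0 to 10) and `n = 512` are assumptions chosen to mirror the `xlim(0, 10)` used above, and `d1` and `d2` are illustrative names. The numbers need not match those from `ggplot_build()`: `xlim()` drops out-of-range observations before ggplot2 estimates the density, whereas `density()` here uses every observation.

```r
## Sketch (assumption: base R density() instead of ggplot_build())
set.seed(1)
n  <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)

## Identical from/to/n puts both estimates on the same x coordinates.
## Each call still selects its own bandwidth.
d1 <- density(x1, from = 0, to = 10, n = 512)
d2 <- density(x2, from = 0, to = 10, n = 512)
stopifnot(isTRUE(all.equal(d1$x, d2$x)))

## Same columns as df2 in the answer
df2 <- data.frame(x = d1$x,
                  ymin = pmin(d1$y, d2$y),
                  ymax = pmax(d1$y, d2$y),
                  side = d1$y < d2$y,
                  ydiff = d2$y - d1$y)
head(df2)
```

A data frame built this way can be passed to the plotting code above in place of the `df2` derived from `gg1`.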


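As suggested by @Gregor in the comments, here is a version that draws two separate plots below each other, sharing the same x-axis scaling. At the very least, the legends would obviously need to be adjusted.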

```r
library(ggplot2)
library(dplyr)
library(grid)

## function that replicates default ggplot2 colors
## taken from [1]
gg_color_hue <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}

## Set up sample data
set.seed(1)
n <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)
df <- bind_rows(data.frame(sample = 1, x = x1), data.frame(sample = 2, x = x2)) %>%
  mutate(sample = as.factor(sample))

## Calculate density estimates
g1 <- ggplot(df, aes(x = x, group = sample, colour = sample)) +
  geom_density() + xlim(0, 10)
gg1 <- ggplot_build(g1)

## Use these estimates (available at the same x coordinates!) for
## calculating the differences.
## Inspired by [2]
dens <- gg1$data[[1]]
x  <- dens$x[dens$group == 1]
y1 <- dens$y[dens$group == 1]
y2 <- dens$y[dens$group == 2]
df2 <- data.frame(x = x, ymin = pmin(y1, y2), ymax = pmax(y1, y2),
                  side = (y1 < y2), ydiff = y2 - y1)

## Upper plot: the two densities with the area between them filled.
## linewidth needs ggplot2 >= 3.4.0 (use size on older versions);
## guides() takes "none" rather than FALSE since ggplot2 3.3.4
g2 <- ggplot(df2) +
  geom_ribbon(aes(x = x, ymin = ymin, ymax = ymax, fill = side), alpha = 0.5) +
  geom_density(data = df, linewidth = 1, aes(x = x, group = sample, colour = sample)) +
  xlim(0, 10) +
  guides(alpha = "none", fill = "none")

## Lower plot: the absolute difference of the densities
g3 <- ggplot(df2) +
  geom_line(aes(x = x, y = abs(ydiff), colour = side)) +
  geom_area(aes(x = x, y = abs(ydiff), fill = side), alpha = 0.4) +
  guides(alpha = "none", fill = "none")

## Stack the two plots with aligned panels, see [3]
grid.draw(rbind(ggplotGrob(g2), ggplotGrob(g3), size = "last"))
```

... or with abs(ydiff) replaced by ydiff in the construction of the second plot:

Source: SO answer [3]
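
The signed variant mentioned above (ydiff instead of abs(ydiff)) could look roughly like the following self-contained sketch. To keep it short it makes a few assumptions that differ from the answer: the densities come from base R's `density()` on a shared grid rather than from `ggplot_build()`, the positive and negative parts are drawn as two `geom_ribbon()` layers instead of one `geom_area()` filled by `side`, and the names `df_signed` and `g_signed` and the two fill colours are purely illustrative.

```r
library(ggplot2)

set.seed(1)
n  <- 2000
x1 <- rlnorm(n, 0, 1)
x2 <- rlnorm(n, 0, 1.1)

## Assumption: shared-grid estimates via density(), not ggplot_build()
d1 <- density(x1, from = 0, to = 10, n = 512)
d2 <- density(x2, from = 0, to = 10, n = 512)
df_signed <- data.frame(x = d1$x, ydiff = d2$y - d1$y)

## Positive part: sample 2 is denser; negative part: sample 1 is denser.
## Each ribbon has zero height wherever the difference has the other sign.
g_signed <- ggplot(df_signed, aes(x = x)) +
  geom_hline(yintercept = 0, colour = "grey50") +
  geom_ribbon(aes(ymin = 0, ymax = pmax(ydiff, 0)), fill = "steelblue", alpha = 0.4) +
  geom_ribbon(aes(ymin = pmin(ydiff, 0), ymax = 0), fill = "tomato", alpha = 0.4) +
  geom_line(aes(y = ydiff)) +
  ylab("Difference of densities (sample 2 - sample 1)")

print(g_signed)
```

Keeping the sign shows directly which sample dominates at each x, at the cost of a y-axis that no longer starts at zero.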



Article source: How to visualise the difference between probability distribution functions? [closed]