Make step function adopt 90 degree transitions

2019-08-12 17:36发布

问题:

I suspect this may be an extremely stupid question, but here goes! (also, apologies if this is better suited to CrossValidated, I'm not sure at this point whether this is a programming issue, or needs approaching more statistically...

I have created a step-function with cumSeg (a.k.a staircase function, a.k.a peicewise constant function) and fit it to some discontinuous (x axis) data, as in the code/figure below.

That's all fine, I'm pretty happy with it, but I'm wondering if I can make the step function (red line) have a vertical transition (i.e so that both 'shoulders' of the function are 90 degrees). In order to do that the value on the x-axis would have to be between the current 2 jump points. Is this possible?

If so, it brings me to another question, how might one represent the st. deviation on that line on the chart if it has these 90 degree transitions and a vertical descent?

# Plotting step-functions on to GC-operon data.

require(ggplot2)
library("ggplot2")
require(reshape2)
library("reshape2")
require(scales)
library(RColorBrewer)
library(cumSeg)

df <- structure(list(PVC1 = 0.4019026, PVC2 = 0.4479259, PVC3 = 0.4494118, PVC4 = 0.4729437,
                 PVC5 = 0.4800556, PVC6 = 0.449229, PVC7 = 0.4905295, PVC8 = 0.4457566,
                 PVC9 = 0.4271259, PVC10 = 0.4850341, PVC11 = 0.4369965, PVC12 = 0.4064052,
                 PVC13 = 0.3743776, PVC14 = 0.3603853, PVC15 = 0.3965469, PVC16 = 0.365461),
                .Names = c("PVC1","PVC2","PVC3","PVC4","PVC5","PVC6","PVC7","PVC8",
                           "PVC9","PVC10","PVC11","PVC12","PVC13","PVC14","PVC15","PVC16"),
                class = "data.frame",
                row.names = c(NA, -1L)
            )

melted_df <- melt(df, variable.name = "Locus", value.name = "GC")
st_dev <- c(0.023031363,    0.024919217,    0.017371129,
        0.019008759,    0.026650605,    0.026904926,
        0.024227542,    0.017767553,    0.026152478,
        0.039770898,    0.023929714,    0.028845442,
        0.015572219,    0.024967336,    0.014955416,    0.024569096)

operon_gc <- 0.408891366
opgc_stdev <- 0.015712091
genome_gc <- 0.425031611
gengc_stdev <- 0.007587437

stepfunc <- jumpoints(y=melted_df$GC*100, k=1, output="1")

gc_chart <- ggplot(melted_df, aes(Locus, GC*100, fill=Locus,)) +
                geom_bar(width=0.6, stat = "identity")
gc_chart <- gc_chart + ylab("GC Content (%)")
gc_chart <- gc_chart + theme(axis.text.x = element_text(angle=45, hjust=1))
gc_chart <- gc_chart + geom_abline(intercept=operon_gc*100,
                               slope=0,
                               colour="gray",
                               linetype=3,
                               show.legend =TRUE)
gc_chart <- gc_chart + geom_text(aes(15.7, 41.7, label="Operon GC"),
                             size=5,
                             color="gray")
gc_chart <- gc_chart + geom_abline(intercept=genome_gc*100,
                               slope=0,
                               colour="black",
                               linetype=3,
                               show.legend = TRUE)
 gc_chart <- gc_chart + geom_text(aes(15.7, 43.3, label="Genome GC"),
                             size=5,
                             color="black")
 gc_chart <- gc_chart + coord_cartesian(ylim=c(30,55))
 gc_chart <- gc_chart + geom_errorbar(width=.2, size=0.4, color="azure4", aes(Locus, 
                ymin = (GC - cbind(melted_df, st_dev)$st_dev)*100, 
                ymax = (GC + cbind(melted_df, st_dev)$st_dev)*100))
gc_chart <- gc_chart + geom_line(linetype=2,aes(x=as.numeric(Locus), y=stepfunc$fitted.values, colour="red", group=1))

gc_chart

EDIT: @Gregor, geom_step has achieved the desired effect thank you (literally substituting line for step in my code generates this:

However, but that graph, the breakpoint would be at PVC12. Yet, when extracting the breakpoint value from the function...

> stepfunc$psi
 V 
11 

In which case the graph becomes misleading and perhaps I'm better off with the previous version which just demonstrates that there is a break 'between 11 and 12'.

回答1:

ggplot is great at plotting the data you give it. Your x values are 1:12, and your y values are 45's then 37's. If you don't like having your y values (and changes in y values) defined at integer values of x, then change your values!

step_df = data.frame(x = c(1, 11.5, 16), y = c(45.24568, 37.5, 37.5))

gc_chart + geom_step(data = stepdf, linetype=2,
                     aes(x = x, y = y, colour="red", group=1),
                     inherit.aes = F)

I'll leave programmatically defining an appropriate step_df to you.



标签: r ggplot2