If I have a series of observations with two variables X and Y, how can I get the average value of Y based on ranges of variable X?
So for example, with some data like:
df = data.frame(x=runif(50,1,100),y=runif(50,300,700))
How could I get the answer to "When X is 1-10 the average of y is 332.4, when X is 11-20 the average of y is 632.3, etc."?
One way is to use cut() to create a factor from the x variable, specifying breaks every ten units. Given that factor, you can then use by() or aggregate() or ... to summarise the data frame, or rather just column y:
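A minimal sketch of the cut() plus aggregate() approach, assuming the data frame from the question. The set.seed() call and the column name bin are illustrative additions, not part of the original post.

```r
# Example data as in the question; set.seed() is added only for reproducibility
set.seed(42)
df <- data.frame(x = runif(50, 1, 100), y = runif(50, 300, 700))

# Bin x into intervals of width ten: (0,10], (10,20], ..., (90,100]
df$bin <- cut(df$x, breaks = seq(0, 100, by = 10))

# Mean of y within each bin of x (bins with no observations are dropped)
res <- aggregate(y ~ bin, data = df, FUN = mean)
print(res)
```

by(df$y, df$bin, mean) should give the same means, printed as one block per bin rather than as a data frame.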
Or using ddply():
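A sketch of how the ddply() variant might look, assuming the plyr package is installed and x has already been binned with cut(). The names bin and mean_y are made up for illustration.

```r
library(plyr)  # assumes install.packages("plyr") has been run

set.seed(42)  # added only for reproducibility
df <- data.frame(x = runif(50, 1, 100), y = runif(50, 300, 700))
df$bin <- cut(df$x, breaks = seq(0, 100, by = 10))

# One row per bin, holding the mean of y in that bin
res <- ddply(df, "bin", summarise, mean_y = mean(y))
print(res)
```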
Use cut to form groups and tapply to summarise over them.
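A minimal sketch of the cut and tapply combination, assuming breaks every ten units as asked in the question. The seed and the variable name groups are illustrative.

```r
set.seed(42)  # added only for reproducibility
df <- data.frame(x = runif(50, 1, 100), y = runif(50, 300, 700))

# cut() forms the groups; tapply() applies mean() to y within each group
groups <- cut(df$x, breaks = seq(0, 100, by = 10))
res <- tapply(df$y, groups, mean)
print(res)
```

Unlike the data frame based approaches, tapply keeps empty groups and reports NA for them.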
If you are a plyr fan you may prefer:
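A sketch of what a plyr version might look like, assuming plyr is installed. The use of ddply() here is a guess at the intended function, and the output columns mean_y and n are made-up names.

```r
library(plyr)  # assumes the plyr package is installed

set.seed(42)  # added only for reproducibility
df <- data.frame(x = runif(50, 1, 100), y = runif(50, 300, 700))
df$bin <- cut(df$x, breaks = seq(0, 100, by = 10))

# Mean of y and number of observations per bin
res <- ddply(df, .(bin), summarise, mean_y = mean(y), n = length(y))
print(res)
```

Reporting n next to the mean makes it easy to spot bins that rest on only a handful of observations.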
For completeness, the aggregate version is:
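A sketch of an aggregate call for the same task, here using the by = list(...) interface rather than a formula. The seed and the names groups and bin are illustrative assumptions.

```r
set.seed(42)  # added only for reproducibility
df <- data.frame(x = runif(50, 1, 100), y = runif(50, 300, 700))

# Grouping factor built from x, with breaks every ten units
groups <- cut(df$x, breaks = seq(0, 100, by = 10))

# Mean of column y within each group; the result has columns bin and y
res <- aggregate(df["y"], by = list(bin = groups), FUN = mean)
print(res)
```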
I think your question is causing your answers to be too narrow. You ought to be thinking of regression methods to summarize the joint relationships of continuous variables. Plotting with scatterplots and fitting regression splines is going to do less violence to the underlying relationships than the piecewise analysis that you specified.
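As a rough illustration of the regression spline idea, the sketch below fits a natural cubic spline with lm() and ns() from the splines package that ships with R, then overlays the fitted curve on a scatterplot. The choice of df = 4 and the 200-point prediction grid are arbitrary assumptions.

```r
library(splines)  # part of the standard R distribution

set.seed(42)  # added only for reproducibility
df <- data.frame(x = runif(50, 1, 100), y = runif(50, 300, 700))

# Natural cubic spline regression of y on x; df = 4 is an arbitrary choice
fit <- lm(y ~ ns(x, df = 4), data = df)

# Predict on an even grid of x values so the curve can be drawn smoothly
grid <- data.frame(x = seq(min(df$x), max(df$x), length.out = 200))
grid$fit <- predict(fit, newdata = grid)

# Scatterplot with the fitted curve overlaid
plot(df$x, df$y, xlab = "x", ylab = "y")
lines(grid$x, grid$fit)
```

Because the example x and y are generated independently, no real trend is expected here; with real data the curve shows how y changes with x without imposing arbitrary bin edges.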
You can use tapply with pretty to make the breakpoints for cut:
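A minimal sketch of the tapply and pretty combination. The argument n = 10 and the use of include.lowest = TRUE are assumptions chosen so that every observation lands in a bin.

```r
set.seed(42)  # added only for reproducibility
df <- data.frame(x = runif(50, 1, 100), y = runif(50, 300, 700))

# pretty() picks round breakpoints that cover the range of x
breaks <- pretty(df$x, n = 10)

# Mean of y within each interval defined by those breakpoints
res <- tapply(df$y, cut(df$x, breaks = breaks, include.lowest = TRUE), mean)
print(res)
```

The advantage over a hard-coded seq() is that the breakpoints adapt to whatever range x happens to have.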
aggregate can also be used:
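A sketch of the aggregate variant with breakpoints chosen by pretty, under the same assumptions (n = 10, include.lowest = TRUE, and an illustrative column name bin).

```r
set.seed(42)  # added only for reproducibility
df <- data.frame(x = runif(50, 1, 100), y = runif(50, 300, 700))

# Bin x using round breakpoints chosen by pretty()
df$bin <- cut(df$x, breaks = pretty(df$x, n = 10), include.lowest = TRUE)

# Mean of y per bin, returned as a data frame
res <- aggregate(y ~ bin, data = df, FUN = mean)
print(res)
```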
Here is the data.table solution:
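A sketch of how a data.table version might look, assuming the data.table package is installed. The binning is done inside the grouping expression, and bin and mean_y are made-up names.

```r
library(data.table)  # assumes install.packages("data.table") has been run

set.seed(42)  # added only for reproducibility
dt <- data.table(x = runif(50, 1, 100), y = runif(50, 300, 700))

# Group by the binned x and compute the mean of y in each group;
# keyby sorts the result by bin
res <- dt[, .(mean_y = mean(y)),
          keyby = .(bin = cut(x, breaks = seq(0, 100, by = 10)))]
print(res)
```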
Cut your x using cut and then use ddply in package plyr:
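A minimal sketch of this two-step approach, assuming plyr is installed. The column name x_bin and the anonymous summary function are illustrative choices.

```r
library(plyr)  # assumes the plyr package is installed

set.seed(42)  # added only for reproducibility
df <- data.frame(x = runif(50, 1, 100), y = runif(50, 300, 700))

# Step 1: cut x into bins of width ten
df$x_bin <- cut(df$x, breaks = seq(0, 100, by = 10))

# Step 2: split by bin and compute the mean of y in each piece
res <- ddply(df, .(x_bin), function(d) data.frame(mean_y = mean(d$y)))
print(res)
```

Passing a function instead of summarise makes it straightforward to return several statistics per bin from the same call.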