I'm a power user of Excel pivot tables who is forcing myself to learn R. I know exactly how to do this analysis in Excel, but I can't figure out the right way to code it in R.
I'm trying to group user data by two different variables, binning each of those variables into ranges, and then summarize other variables within each group.
Here is what the data looks like:
    userid  visits  posts  revenue
    1       25      0      25
    2       2       2      0
    3       86      7      8
    4       128     24     94
    5       30      5      18
    …       …       …      …
    280000  80      10     100
    280001  42      4      25
    280002  31      8      17
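To make this reproducible, the sample rows above can be typed into R as a data frame. This covers only the rows actually listed (the elided "…" rows are left out), and the name `users` is just a placeholder, not the real table name.

```r
# Only the sample rows listed above; the elided middle rows are omitted.
users <- data.frame(
  userid  = c(1, 2, 3, 4, 5, 280000, 280001, 280002),
  visits  = c(25, 2, 86, 128, 30, 80, 42, 31),
  posts   = c(0, 2, 7, 24, 5, 10, 4, 8),
  revenue = c(25, 0, 8, 94, 18, 100, 25, 17)
)
str(users)
```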
Here is what I want the output to look like:
    VisitRange  PostRange  # of Users  Total Revenue  Average Revenue
    0           0          X           Y              Z
    1-10        0          X           Y              Z
    11-20       0          X           Y              Z
    21-30       0          X           Y              Z
    31-40       0          X           Y              Z
    41-50       0          X           Y              Z
    > 50        0          X           Y              Z
    0           1-10       X           Y              Z
    1-10        1-10       X           Y              Z
    11-20       1-10       X           Y              Z
    21-30       1-10       X           Y              Z
    31-40       1-10       X           Y              Z
    41-50       1-10       X           Y              Z
    > 50        1-10       X           Y              Z
I want to bin both visits and posts in steps of 10 up to 50 (with 0 as its own bin), and then group anything higher than 50 as '> 50'.
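One way these bins might be expressed in R is with base `cut()`. This is a sketch assuming visits and posts are non-negative whole numbers, so that 0 gets its own bin and the interval `(0, 10]` covers 1-10; the helper name `bin_counts` is hypothetical.

```r
# Hypothetical helper: label non-negative integer counts with the ranges
# used in the desired output (0, 1-10, ..., 41-50, > 50).
bin_counts <- function(x) {
  cut(x,
      breaks = c(-Inf, 0, 10, 20, 30, 40, 50, Inf),
      labels = c("0", "1-10", "11-20", "21-30", "31-40", "41-50", "> 50"),
      right  = TRUE)  # intervals closed on the right, e.g. (0, 10] -> "1-10"
}

bin_counts(c(0, 1, 10, 11, 50, 51, 128))
```

Because the result is a factor, all seven range labels are kept as levels, in the order given, even when some bins are empty.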
I've looked at tapply and ddply as ways to accomplish this, but I don't think they will work the way I expect, though I could be wrong.
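For comparison, a minimal `tapply` sketch, assuming the ranges are built with `cut()` as in the hypothetical `bin_counts` helper and using only the sample rows. Passing two factors in a list to `tapply` yields a pivot-table-like matrix (visit ranges as rows, post ranges as columns) with `NA` where no users fall, but it computes only one statistic per call (total revenue here).

```r
# Sample rows from the question only (elided rows omitted).
users <- data.frame(
  visits  = c(25, 2, 86, 128, 30, 80, 42, 31),
  posts   = c(0, 2, 7, 24, 5, 10, 4, 8),
  revenue = c(25, 0, 8, 94, 18, 100, 25, 17)
)

# Hypothetical helper: label counts as 0, 1-10, ..., 41-50, > 50.
bin_counts <- function(x) {
  cut(x,
      breaks = c(-Inf, 0, 10, 20, 30, 40, 50, Inf),
      labels = c("0", "1-10", "11-20", "21-30", "31-40", "41-50", "> 50"))
}

# Rows: visit ranges; columns: post ranges; cells: total revenue (NA = no users).
rev_by_bin <- tapply(users$revenue,
                     list(VisitRange = bin_counts(users$visits),
                          PostRange  = bin_counts(users$posts)),
                     sum)
print(rev_by_bin)
```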
Lastly, I know I could do this in SQL by using an if/then statement to label the range of visits and the range of posts (for example, if visits is between 1 and 10, then '1-10'), and then grouping by visit range and post range. But my goal here is to force myself to use R. Maybe R isn't the right tool here, but I think it is…
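The SQL approach maps fairly directly onto base R: `cut()` plays the role of the if/then range labels, and `aggregate()` plays the role of GROUP BY. This is a sketch using only the sample rows, with hypothetical names (`users`, `bin_counts`, `summary_tbl`). When the summary function returns several values, `aggregate()` stores them as one matrix column, which `do.call(data.frame, ...)` flattens into ordinary columns.

```r
# Sample rows from the question only (elided rows omitted).
users <- data.frame(
  visits  = c(25, 2, 86, 128, 30, 80, 42, 31),
  posts   = c(0, 2, 7, 24, 5, 10, 4, 8),
  revenue = c(25, 0, 8, 94, 18, 100, 25, 17)
)

# Hypothetical helper: label counts as 0, 1-10, ..., 41-50, > 50.
bin_counts <- function(x) {
  cut(x,
      breaks = c(-Inf, 0, 10, 20, 30, 40, 50, Inf),
      labels = c("0", "1-10", "11-20", "21-30", "31-40", "41-50", "> 50"))
}

# Equivalent of the SQL if/then range labels.
users$VisitRange <- bin_counts(users$visits)
users$PostRange  <- bin_counts(users$posts)

# Equivalent of GROUP BY VisitRange, PostRange with three summaries.
summary_tbl <- aggregate(
  revenue ~ VisitRange + PostRange, data = users,
  FUN = function(r) c(users = length(r), total = sum(r), average = mean(r))
)
# Flatten the matrix column produced by aggregate() into separate columns.
summary_tbl <- do.call(data.frame, summary_tbl)
names(summary_tbl) <- c("VisitRange", "PostRange", "Users",
                        "TotalRevenue", "AverageRevenue")
print(summary_tbl)
```

Range combinations with no users are simply absent from this result, whereas a pivot table might show them as blank rows.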
All help would be appreciated. Thanks in advance.