Note: I changed the example from when I first posted. My first example was too simplified to capture the real problem.
I have two data frames which are sorted differently in one column. I want to match one column and then merge in the value from the second column. The second column needs to stay in the same order.
So I have this:
state<-c("IA","IA","IA","IL","IL","IL")
value1<-c(1,2,3,4,5,6)
s1<-data.frame(state,value1)
state<-c("IL","IL","IL","IA","IA","IA")
value2<-c(3,4,5,6,7,8)
s2<-data.frame(state,value2)
s1
s2
which returns this:
> s1
state value1
1 IA 1
2 IA 2
3 IA 3
4 IL 4
5 IL 5
6 IL 6
> s2
state value2
1 IL 3
2 IL 4
3 IL 5
4 IA 6
5 IA 7
6 IA 8
and I want this:
state value1 value2
1 IA 1 6
2 IA 2 7
3 IA 3 8
4 IL 4 3
5 IL 5 4
6 IL 6 5
I'm about to drive myself silly trying to solve this. Seems like it should be a simple subscript problem.
NOTE: Check the 5th comment on the answer above. Solution should be
Tested and working.
There are several ways to do this (it is R, after all) but I think the most clear is creating an index. We need a function that creates a sequential index (starting at one and ending with the number of observations).
But we need to calculate this index within each grouping variable (state). For this we can use R's
ave
function. It takes a numeric as the first argument, then the grouping factors, and finally the function to be applied in each group.(Note the use of
with
, which tells R to search for the variables within the environment/dataframe. This is better practice than using s1$value1, s2$value2, etc.)Now we can simply merge (join) the two data frames (by the variables present in the both data frames: state and index).
which gives
For this to work, there should be the same number of observations by state in each of the data frames.
[Edit: commented the code for clarity.] [Edit: Used seq_len instead of creating a new function as suggested by hadley.]