Let's say, I have scores for 5 countries over a period of 10 years such as:
mydata<-1:3
mydata<-expand.grid(
country=c('A', 'B', 'C', 'D', 'E'),
year=c('1980','1981','1982','1983','1984','1985','1986','1987','1988','1989'))
mydata$score=sapply(runif(50,0,2), function(x) {round(x,4)})
library(reshape)
mydata<-reshape(mydata, v.names="score", idvar="year", timevar="country", direction="wide")
> head(mydata)
year score.A score.B score.C score.D score.E
1 1980 1.0538 1.6921 1.3165 1.7434 1.9687
6 1981 1.4773 1.6479 0.3135 0.6172 0.7704
11 1982 0.8748 1.3704 0.2788 1.6306 1.7237
16 1983 1.1224 1.1340 1.7684 1.3352 0.4317
21 1984 1.5496 1.8706 1.4641 0.5313 0.8590
26 1985 1.7715 1.8953 0.6230 0.3580 1.6313
Now, I would like to create a new variable "period" that is 1 if the score of the subsequent year is +/- 0.5 different from the score of the previous year and that is 0 if this is not true. I would like to do so for all 5 countries. And it would be great if it were possible to identify the country-years for which period = 1 and display this information in a table.
> head(mydata)
year score.A score.B score.C score.D score.E period.A period.B ...
1 1980 1.0538 1.6921 1.3165 1.7434 1.9687 NA NA
6 1981 1.4773 1.6479 0.3135 0.6172 0.7704 0 ....
11 1982 0.8748 1.3704 0.2788 1.6306 1.7237 1
16 1983 1.1224 1.1340 1.7684 1.3352 0.4317 0
21 1984 1.5496 1.8706 1.4641 0.5313 0.8590 0
26 1985 1.7715 1.8953 0.6230 0.3580 1.6313 0
I very much hope that this is not too much to ask. I tried it with dist
in the library(proxy)
but I do not know how to restrict the function to pairs of observation rather than the full row. Thanks a million!!
This one uses
diff
andlapply
:Edit: It becomes more apparent (with your other question posted this morning) that maybe you are not working with the right data structure. Instead, I would recommend you do your work on the raw (before it is reshaped) data:
Only then you would reshape
mydata
to get it in its final form.You can do it with just one line with
dplyr
:Then you can reshape it with
dcast
Results:
First, create the data, using
set.seed()
to make it reproducible:Next, use dplyr to break into countries and compare each value to the previous:
After this you could coerce
big_diff()
to a number, and use reshape to movecountry
to the columns, but I probably wouldn't because it will be harder to work with in future steps. See tidy data for more details.