I have data that looks like this:
ID X1 X2 X3
1 1.4 2 two
2 7.6 30 thirty
2 7.6 50 fifty
2 7.6 40 forty
3 5.6 40 forty
3 5.6 50 fifty
4 3.5 NA NA
5 NA 2 two
ID indicates individuals, X1 is a measurement taken once per individual, and X2 and X3 are character and string representations of the same treatment. E.g., for individual 2, X1= 7.6, and was subjected to treatment X2 = 30, 50, and 40 (which has associated other info, X3 = thirty, fifty, and forty)
I want it to end up looking like this, with only one row per individual, and values from multiple observations on the same individual added as new columns:
ID X1 X2a X3a X2b X3b X2c X3c
1 1.4 2 two NA NA NA NA
2 7.6 30 thirty 50 fifty 40 forty
3 5.6 40 forty 50 fifty NA NA
4 3.5 NA NA NA NA NA NA
5 NA 2 two NA NA NA NA
I want to keep rows with X1=NA (like subject 5), because those rows have measurements of interest in other columns that are mostly excluded for the simplicity of my example. My actual dataset is pretty big (10,000s of rows, and maybe 50 columns), so a method that can efficiently handle a fairly big dataset would be a plus.
I've been playing with melt() and cast() from the reshape package, which I think can do what I want. But, I am getting an error requiring me to aggregate (Error: Aggregation requires fun.aggregate: length used as default), which I don't want to do. Suggestions for getting melt and cast to cooperate? or a better way to do this?
I've found a number of similar questions (for example, this one), but none seem to address my problem as they don't start with a similar data structure or they don't care which instances of the data end up matched with which...