I have a table in which each row has a different number of string elements:
File1 A B C
File2 A B D
File3 E F
I want to convert it into the following format:
      A B C D E F
File1 1 1 1 0 0 0
File2 1 1 0 1 0 0
File3 0 0 0 0 1 1
I tried to do it using reshape2 but was not successful.
Sample data:
mydata <- structure(list(V1 = c("File1", "File2", "File3"),
V2 = c("A", "A", "E"), V3 = c("B", "B", "F"),
V4 = c("C", "D", "")),
.Names = c("V1", "V2", "V3", "V4"),
class = "data.frame", row.names = c(NA, -3L))
A reasonably efficient approach is to use the (presently) non-exported `charMat` function from my "splitstackshape" package. Since it's not exported, you will have to use `:::` to access it.
Under the hood, `charMat` makes use of matrix indexing to process everything pretty efficiently. Step-by-step, this is what `charMat` does.
That looks like a mouthful, but it is actually quite a fast operation, made faster by using the `charMat` function :-)
Update: Benchmarks
The following benchmarks compare Henrik's answer with my `charMat` answer, and also adapt Henrik's answer to use "data.table" instead, for better efficiency.
Two tests were run. The first is on a similar dataset with 90K rows, and the second on one with 900K rows.
Here's the sample data:
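As a rough illustration only, data of the sizes mentioned could be built by repeating the three example rows. The helper `make_big` and the object names `big90k` and `big900k` are hypothetical, and this is an assumption about how a "similar dataset" might be constructed, not the data actually used in the benchmark.

```r
# Rebuild the small example from the question.
mydata <- data.frame(
  V1 = c("File1", "File2", "File3"),
  V2 = c("A", "A", "E"),
  V3 = c("B", "B", "F"),
  V4 = c("C", "D", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: repeat the rows `times` times and give
# every row its own file name so the identifiers stay unique.
make_big <- function(df, times) {
  big <- df[rep(seq_len(nrow(df)), times), ]
  big$V1 <- paste0("File", seq_len(nrow(big)))
  rownames(big) <- NULL
  big
}

big90k  <- make_big(mydata, 30000)    # 3 * 30000  = 90,000 rows
big900k <- make_big(mydata, 300000)   # 3 * 300000 = 900,000 rows
dim(big90k)
```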
Here are the functions to benchmark:
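To give a feel for the kind of comparison involved, here is a base-R sketch of two candidate functions. Both `indicator_by_indexing` and `indicator_by_rows` are hypothetical stand-ins written for illustration; they are not the functions that were actually benchmarked, and the first only imitates the matrix-indexing idea described above rather than reproducing `charMat` itself. Empty strings are assumed to be padding.

```r
mydata <- data.frame(
  V1 = c("File1", "File2", "File3"),
  V2 = c("A", "A", "E"),
  V3 = c("B", "B", "F"),
  V4 = c("C", "D", ""),
  stringsAsFactors = FALSE
)

# Candidate 1 (hypothetical): matrix indexing.
# Build (row, column) pairs and fill all cells in one assignment.
indicator_by_indexing <- function(df) {
  vals  <- as.matrix(df[-1])              # character matrix of items
  keep  <- !is.na(vals) & vals != ""      # ignore padding cells
  items <- sort(unique(vals[keep]))
  out <- matrix(0L, nrow = nrow(df), ncol = length(items),
                dimnames = list(df[[1]], items))
  # row() gives each cell's row number; match() gives its target column.
  out[cbind(row(vals)[keep], match(vals[keep], items))] <- 1L
  out
}

# Candidate 2 (hypothetical): test membership one row at a time.
indicator_by_rows <- function(df) {
  vals  <- as.matrix(df[-1])
  items <- sort(unique(vals[!is.na(vals) & vals != ""]))
  out <- t(apply(vals, 1, function(r) as.integer(items %in% r)))
  dimnames(out) <- list(df[[1]], items)
  out
}

# Both candidates should agree on the small example.
stopifnot(all(indicator_by_indexing(mydata) == indicator_by_rows(mydata)))

# Rough timing on a larger input; the numbers depend on the machine.
big <- mydata[rep(seq_len(nrow(mydata)), 3000), ]
system.time(indicator_by_indexing(big))
system.time(indicator_by_rows(big))
```

The indexing version avoids calling an R function once per row, which is why that style tends to scale better as the number of rows grows.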
And the results of the benchmarking.
One possibility:
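A minimal base-R sketch of one such possibility, assuming that empty strings are padding: put the items into long format, drop the padding, and cross-tabulate with `table()`. The names `long`, `tab` and `result` are illustrative only, and this is not necessarily the code the answer originally showed.

```r
mydata <- data.frame(
  V1 = c("File1", "File2", "File3"),
  V2 = c("A", "A", "E"),
  V3 = c("B", "B", "F"),
  V4 = c("C", "D", ""),
  stringsAsFactors = FALSE
)

# Long format: one (file, item) pair per cell of V2..V4.
# unlist() walks the columns one after another, so the file
# names are repeated once per item column to line up with it.
long <- data.frame(
  file = rep(mydata$V1, times = ncol(mydata) - 1),
  item = unlist(mydata[-1], use.names = FALSE),
  stringsAsFactors = FALSE
)
long <- long[long$item != "", ]   # drop the padding cells

# Cross-tabulate files against items, then convert to a data frame.
tab <- table(long$file, long$item)
result <- as.data.frame.matrix(tab)
result
#       A B C D E F
# File1 1 1 1 0 0 0
# File2 1 1 0 1 0 0
# File3 0 0 0 0 1 1
```

If a file could list the same item more than once, the cells would hold counts rather than 0/1; `(tab > 0) + 0L` would turn them back into indicators.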