I am trying to copy sets of rows into columns using dplyr. Following is my data frame.
df <- data.frame(
hid=c(1,1,1,1,2,2,2,2,2,3,3,3,3),
mid=c(1,2,3,4,1,2,3,4,5,1,2,3,4),
tmid=c("010","01010","010","01020",
"010","0120","010","010","020",
"010","01010","010","01020"),
thid=c("010","02020","010","02020",
"000","0120","010","010","010",
"010","02020","010","02020"),
)
It is printed in the following format:
> df
hid mid tmid thid
1 1 1 010 010
2 1 2 01010 02020
3 1 3 010 010
4 1 4 01020 02020
5 2 1 010 000
6 2 2 0120 0120
7 2 3 010 010
8 2 4 010 010
9 2 5 020 010
10 3 1 010 010
11 3 2 01010 02020
12 3 3 010 010
13 3 4 01020 02020
My desired output is show below:
hid mid tmid thid tmid1 tmid2 tmid3 tmid4 tmid5 thid1 thid2 thid3 thid4 thid5
* <dbl> <dbl> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr>
1 1 1 010 010 010 01010 010 01020 0 010 02020 010 02020 0
2 1 2 01010 02020 010 01010 010 01020 0 010 02020 010 02020 0
3 1 3 010 010 010 01010 010 01020 0 010 02020 010 02020 0
4 1 4 01020 02020 010 01010 010 01020 0 010 02020 010 02020 0
5 2 1 010 000 010 0120 010 010 020 000 0120 010 010 010
6 2 2 0120 0120 010 0120 010 010 020 000 0120 010 010 010
7 2 3 010 010 010 0120 010 010 020 000 0120 010 010 010
8 2 4 010 010 010 0120 010 010 020 000 0120 010 010 010
9 2 5 020 010 010 0120 010 010 020 000 0120 010 010 010
10 3 1 010 010 010 01010 010 01020 0 010 02020 010 02020 0
11 3 2 01010 02020 010 01010 010 01020 0 010 02020 010 02020 0
12 3 3 010 010 010 01010 010 01020 0 010 02020 010 02020 0
13 3 4 01020 02020 010 01010 010 01020 0 010 02020 010 02020 0
- Converting
thid
andtmid
into column - Suffix in
thid_x
andtmid_x
is defined bymid
; however, maximum number ofmid
is not scalable (it spreads from 1 to perhaps 8 in actual large data set) - Same values of
thid_x
andtmid_x
are set by groups ofhid
- If value does not exist, it should be padded by
0
Idea of this manipulation is shown in the following figure.
I am currently trying to use spread
but it returns specific pairs of mid
and thid
or tmid
. I need to fill remaining <NA>s
by a value which remains in the output grouped by hid
.
> df %>% mutate(id1=str_c("tmid",mid)) %>% group_by(hid) %>% spread(key=id1,value=tmid)
# A tibble: 13 x 8
# Groups: hid [3]
hid mid thid tmid1 tmid2 tmid3 tmid4 tmid5
* <dbl> <dbl> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr>
1 1 1 010 010 <NA> <NA> <NA> <NA>
2 1 2 02020 <NA> 01010 <NA> <NA> <NA>
3 1 3 010 <NA> <NA> 010 <NA> <NA>
4 1 4 02020 <NA> <NA> <NA> 01020 <NA>
5 2 1 000 010 <NA> <NA> <NA> <NA>
6 2 2 0120 <NA> 0120 <NA> <NA> <NA>
7 2 3 010 <NA> <NA> 010 <NA> <NA>
8 2 4 010 <NA> <NA> <NA> 010 <NA>
9 2 5 010 <NA> <NA> <NA> <NA> 020
10 3 1 010 010 <NA> <NA> <NA> <NA>
11 3 2 02020 <NA> 01010 <NA> <NA> <NA>
12 3 3 010 <NA> <NA> 010 <NA> <NA>
13 3 4 02020 <NA> <NA> <NA> 01020 <NA>
Any suggestions?