How to use the dcast function to transform my dataset

Posted 2019-09-10 11:27

Question:

I have a very large dataset with more than 10 million records, which makes it difficult to apply any algorithm to it directly, so I am trying to restructure it. The dataset contains many records per customer, and I want to convert it to one record per customer.

Here is my mock-up sample data.

d1<-structure(
        list(userid  = c(64455670203, 64455670203, 64455670203, 64455670203, 64455670203, 64455670204, 64455670204, 64455670204, 64455670204, 64455670204),
             day     = c(1L, 1L, 2L, 3L, 3L, 2L, 2L, 3L, 4L, 4L),
             channel = structure(
                          c(1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L),
                          .Label = c("dsp", "osr"),
                          class = "factor"
                       )
        ),
        .Names    =  c("userid", "day", "channel"),
        class     = "data.frame",
        row.names = c(NA, -10L)
)

I want to convert the data above into the following:

d2<-structure(
    list(csm_id = c(64455670203, 64455670204),
         dsp1   = c(2L, 0L),
         dsp2   = c(1L, 1L),
         dsp3   = c(1L, 0L),
         dsp4   = 0:1,
         osr1   = c(0L, 0L),
         osr2   = 0:1,
         osr3   = c(1L, 1L),
         osr4   = 0:1
    ),
    .Names    = c("csm_id", "dsp1", "dsp2", "dsp3", "dsp4", "osr1", "osr2", "osr3", "osr4"),
    class     = "data.frame",
    row.names = c(NA, -2L)
)

What I am doing is this: first I find the distinct channels and the distinct days in my dataset. Then I concatenate each channel with each day and use the results as the column names of the new dataset. Each cell holds the number of records for that customer, channel, and day.

I wrote some simple R code to do this, but it is very slow. Can anyone help me with this task?

How can I do the same operation in Python as well?

Thanks in advance.

Answer 1:

Try

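 library(reshape2)  # provides dcast()
 # count the rows per userid for each channel/day pair; drop=FALSE keeps pairs with no rows
 dcast(d1, userid~channel+day, value.var='day', fun.aggregate=length, drop=FALSE)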
 #        userid dsp_1 dsp_2 dsp_3 dsp_4 osr_1 osr_2 osr_3 osr_4
 #1 64455670203     2     1     1     0     0     0     1     0
 #2 64455670204     0     1     0     1     0     1     1     1
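
For the Python part of the question, here is a minimal sketch using only the standard library. It assumes the records are available as (userid, day, channel) tuples; the names `records` and `pivot_counts` are made up for illustration and the mock data is copied from d1. It counts each (userid, day, channel) combination and then lays the counts out with one row per user and one column per channel/day pair, including pairs with no records, similar to drop=FALSE above.

```python
from collections import Counter

# Mock data mirroring d1: one (userid, day, channel) tuple per record.
records = [
    (64455670203, 1, "dsp"), (64455670203, 1, "dsp"), (64455670203, 2, "dsp"),
    (64455670203, 3, "dsp"), (64455670203, 3, "osr"), (64455670204, 2, "osr"),
    (64455670204, 2, "dsp"), (64455670204, 3, "osr"), (64455670204, 4, "dsp"),
    (64455670204, 4, "osr"),
]


def pivot_counts(records):
    """Return (column names, {userid: [count per column]})."""
    counts = Counter(records)  # occurrences of each (userid, day, channel)
    users = sorted({u for u, _, _ in records})
    days = sorted({d for _, d, _ in records})
    channels = sorted({c for _, _, c in records})
    # Every channel/day pair becomes a column, even if no record has it.
    columns = [(c, d) for c in channels for d in days]
    names = [f"{c}_{d}" for c, d in columns]
    # A Counter returns 0 for combinations that never occur.
    wide = {u: [counts[(u, d, c)] for c, d in columns] for u in users}
    return names, wide


names, wide = pivot_counts(records)
print(["userid"] + names)
for user, row in wide.items():
    print([user] + row)
```

This makes a single pass over the data to build the counts, so it should scale roughly linearly with the number of records. If pandas is available, `pandas.crosstab` is probably the more usual tool for this kind of count table.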