How to use the dcast function to transform my dataset?

Posted 2019-09-10 11:27


I have a very large dataset with more than 10 million records. At that size it is difficult to apply any algorithm to it, so I am trying to restructure it. The dataset contains many records per customer, and I want to convert it to one record per customer.

Here is a mock-up sample of my data.

        d1 <- structure(
          list(userid  = c(64455670203, 64455670203, 64455670203, 64455670203, 64455670203, 64455670204, 64455670204, 64455670204, 64455670204, 64455670204),
               day     = c(1L, 1L, 2L, 3L, 3L, 2L, 2L, 3L, 4L, 4L),
               channel = structure(
                           c(1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L),
                           .Label = c("dsp", "osr"),
                           class = "factor")),
          .Names    = c("userid", "day", "channel"),
          class     = "data.frame",
          row.names = c(NA, -10L))

I want to convert the data above into the following:

    structure(
      list(csm_id = c(64455670203, 64455670204),
           dsp1   = c(2L, 0L),
           dsp2   = c(1L, 1L),
           dsp3   = c(1L, 0L),
           dsp4   = 0:1,
           osr1   = c(0L, 0L),
           osr2   = 0:1,
           osr3   = c(1L, 1L),
           osr4   = 0:1),
      .Names    = c("csm_id", "dsp1", "dsp2", "dsp3", "dsp4", "osr1", "osr2", "osr3", "osr4"),
      class     = "data.frame",
      row.names = c(NA, -2L))

What I am trying to do is first find the distinct channels and the distinct days in my dataset, then concatenate each channel with each day and use the results as the column names of the new dataset. Each cell holds the number of records for that customer, channel, and day.

I wrote some simple code in R, but it is really time-consuming. Can anyone help me do this task?

How can I do the same operation in Python as well?

Thanks in advance.



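    library(reshape2)  # assumption: dcast() here is the one from reshape2
    # Count records per userid for every channel/day pair; drop=FALSE keeps empty combinations such as osr_1
    dcast(d1, userid~channel+day, value.var='day', fun.aggregate=length, drop=FALSE)
    #        userid dsp_1 dsp_2 dsp_3 dsp_4 osr_1 osr_2 osr_3 osr_4
    #1 64455670203     2     1     1     0     0     0     1     0
    #2 64455670204     0     1     0     1     0     1     1     1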
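
For the Python part of the question, here is a minimal sketch of the same reshaping using only the standard library. It counts records per (userid, channel, day) in a single pass and then builds one row per user, with a column for every channel/day combination, which mirrors `drop=FALSE` in the R call. The function name `pivot_counts` and the tuple layout of `records` are assumptions made for illustration; they are not part of any library. If pandas is available, `pd.crosstab` is probably the more usual tool for data of this size.

```python
from collections import Counter

# Mock data copied from the question: one (userid, day, channel) tuple per record.
records = [
    (64455670203, 1, "dsp"),
    (64455670203, 1, "dsp"),
    (64455670203, 2, "dsp"),
    (64455670203, 3, "dsp"),
    (64455670203, 3, "osr"),
    (64455670204, 2, "osr"),
    (64455670204, 2, "dsp"),
    (64455670204, 3, "osr"),
    (64455670204, 4, "dsp"),
    (64455670204, 4, "osr"),
]


def pivot_counts(records):
    """Hypothetical helper: return (column_names, rows), one row per userid."""
    counts = Counter()  # (userid, channel, day) -> number of records
    users, channels, days = set(), set(), set()
    for userid, day, channel in records:
        counts[(userid, channel, day)] += 1
        users.add(userid)
        channels.add(channel)
        days.add(day)

    # Every channel/day pair becomes a column, even if no record has it.
    keys = [(c, d) for c in sorted(channels) for d in sorted(days)]
    columns = ["userid"] + [f"{c}_{d}" for c, d in keys]
    # A Counter returns 0 for missing keys, so absent combinations fill with 0.
    rows = [[u] + [counts[(u, c, d)] for c, d in keys] for u in sorted(users)]
    return columns, rows


columns, rows = pivot_counts(records)
print(columns)
for row in rows:
    print(row)
```

On the mock data this gives the same counts as the `dcast` output above, with the columns in the same order (`dsp_1` to `dsp_4`, then `osr_1` to `osr_4`).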