I have a very large dataset with more than 10 million records, which is difficult to use directly with any algorithm, so I am trying to restructure it. At the moment there are many records per customer, and I want to convert it to one record per customer.
Here is a sample of mock-up data:
d1 <- structure(
list(userid = c(64455670203, 64455670203, 64455670203, 64455670203, 64455670203, 64455670204, 64455670204, 64455670204, 64455670204, 64455670204),
day = c(1L, 1L, 2L, 3L, 3L, 2L, 2L, 3L, 4L, 4L),
channel = structure(
c(1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L),
.Label = c("dsp", "osr"),
class = "factor"
)
),
.Names = c("userid", "day", "channel"),
class = "data.frame",
row.names = c(NA, -10L)
)
I am planning to convert the data shown above into the following format:
d2 <- structure(
list(csm_id = c(64455670203, 64455670204),
dsp1 = c(2L, 0L),
dsp2 = c(1L, 1L),
dsp3 = c(1L, 0L),
dsp4 = 0:1,
osr1 = c(0L, 0L),
osr2 = 0:1,
osr3 = c(1L, 1L),
osr4 = 0:1
),
.Names = c("csm_id", "dsp1", "dsp2", "dsp3", "dsp4", "osr1", "osr2", "osr3", "osr4"),
class = "data.frame",
row.names = c(NA, -2L)
)
What I am trying to do is this: first find the distinct channels and distinct days in the dataset, then concatenate those two (channel followed by day) and use the resulting strings as the column names of the new dataset, as sketched below.
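For example, the column names I have in mind can be built from d1 like this (just a sketch of the naming step; the variable names are only for illustration):

channels <- levels(d1$channel)      # "dsp" "osr"
days     <- sort(unique(d1$day))    # 1 2 3 4

# every channel concatenated with every day gives the new column names
new_cols <- paste0(rep(channels, each = length(days)), days)
# "dsp1" "dsp2" "dsp3" "dsp4" "osr1" "osr2" "osr3" "osr4"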
I wrote simple R code along these lines, but it is really time consuming. Can anyone help me do this more efficiently?
How can I do the same operation in Python as well?
Thanks in advance.
Try a single cross-tabulation over the whole dataset instead of looping per customer.
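A sketch with base R's table(), assuming d1 as defined in the question (one possible approach; the names all_keys and key are only illustrative):

# all channel-day combinations, so that pairs that never occur
# (e.g. "osr1") still get a zero-filled column
all_keys <- paste0(rep(levels(d1$channel), each = length(unique(d1$day))),
                   sort(unique(d1$day)))

# one key per record, counted per user in a single pass
key <- factor(paste0(d1$channel, d1$day), levels = all_keys)
tab <- table(d1$userid, key)

# back to a data.frame with the user id as a regular column
d2 <- cbind(csm_id = as.numeric(rownames(tab)), as.data.frame.matrix(tab))
rownames(d2) <- NULL

On 10 million rows, data.table's dcast() does the same counting and is usually faster:

library(data.table)
setDT(d1)
d1[, chan_day := factor(paste0(channel, day), levels = all_keys)]
d2 <- dcast(d1, userid ~ chan_day, fun.aggregate = length,
            value.var = "day", drop = FALSE)
setnames(d2, "userid", "csm_id")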