Reframing magic on data.frame [duplicate]

2019-02-05 09:26发布

This question already has an answer here:

I am currently learning to work with data.frame and quite confused on how to reorder them.

At the moment, I have a data.frame that shows :

  • column 1: a shop name
  • column 2: a product
  • column 3: the number of purchase for this product by this shop

or visually something like this:

+---+-----------+-------+----------+--+
|   | Shop.Name | Items | Product  |  |
+---+-----------+-------+----------+--+
| 1 | Shop1     |     2 | Product1 |  |
| 2 | Shop1     |     4 | Product2 |  |
| 3 | Shop2     |     3 | Product1 |  |
| 4 | Shop3     |     2 | Product1 |  |
| 5 | Shop3     |     1 | Product4 |  |
+---+-----------+-------+----------+--+

What I would like to achieve is the following "shop-centric" structure:

  • column 1: a shop name
  • column 2: Items sold for product1
  • column 3: Items sold for product2
  • column 4: Items sold for product3 ...

When there is no line for a specific shop/product (because of no sales), I would like to create a 0.

or

+---+-------+-------+-------+-------+-------+-----+--+--+
|   | Shop  | Prod1 | Prod2 | Prod3 | Prod4 | ... |  |  |
+---+-------+-------+-------+-------+-------+-----+--+--+
| 1 | Shop1 |     2 |     4 |     0 |     0 | ... |  |  |
| 2 | Shop2 |     3 |     0 |     0 |     0 | ... |  |  |
| 3 | Shop3 |     2 |     0 |     0 |     1 | ... |  |  |
+---+-------+-------+-------+-------+-------+-----+--+--+

3条回答
放荡不羁爱自由
2楼-- · 2019-02-05 09:49

Use dcast from the reshape2 library:

library(reshape2)

> df <- data.frame(Shop.Name=rep(c("Shop1","Shop2","Shop3"),each=3),
+                  Items=rpois(9,5),
+                  Product=c(rep(c("Prod1","Prod2","Prod3","Prod4"),2),"Prod5")
+ )
> df
  Shop.Name Items Product
1     Shop1     6   Prod1
2     Shop1     5   Prod2
3     Shop1     6   Prod3
4     Shop2     5   Prod4
5     Shop2     6   Prod1
6     Shop2     6   Prod2
7     Shop3     4   Prod3
8     Shop3     7   Prod4
9     Shop3     5   Prod5
> dcast(df,Shop.Name ~ Product,value.var="Items",fill=0)
  Shop.Name Prod1 Prod2 Prod3 Prod4 Prod5
1     Shop1     6     5     6     0     0
2     Shop2     6     6     0     5     0
3     Shop3     0     0     4     7     5
查看更多
Melony?
3楼-- · 2019-02-05 09:57

If you want to use the original reshape package for any reason:

Shop.Name <- c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3")
Items <- c(2,4,3,2,1)
Product <- c("Product1", "Product2", "Product1", "Product1", "Product4")
(df <- data.frame(Shop.Name, Items, Product))

cast(df, formula = Shop.Name ~ Product, value="Items", fill=0)
查看更多
Rolldiameter
4楼-- · 2019-02-05 10:01

The answers so far work to a certain degree, but don't fully answer your question. In particular, they don't address the issue of a case in which there are no shops which sold a particular product. From your example input and desired output, there were no shops which sold "Product3". Indeed, "Product3" does not even appear in your source data.frame. Additionally, they do not address the possible situation of having more than one row for each Shop + Product combination.

Here's a modified version of your data and the two solutions so far. I've added another row for a combination of "Shop1" and "Product1". Notice that I have converted your products to a factor variable that includes the levels that the variable can take, even if none of the cases actually has that level.

mydf <- data.frame(
  Shop.Name = c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3", "Shop1"),
  Items = c(2, 4, 3, 2, 1, 2),
  Product = factor(
    c("Product1", "Product2", "Product1", "Product1", "Product4", "Product1"),
    levels = c("Product1", "Product2", "Product3", "Product4")))
  1. dcast from "reshape2"

    library(reshape2)
    dcast(mydf, formula = Shop.Name ~ Product, value="Items", fill=0)
    # Using Product as value column: use value.var to override.
    # Aggregation function missing: defaulting to length
    # Error in .fun(.value[i], ...) : 
    #   2 arguments passed to 'length' which requires 1
    

    Wha? Suddenly does not work. Do this instead:

    dcast(mydf, formula = Shop.Name ~ Product, 
          fill = 0, value.var = "Items", 
          fun.aggregate = sum, drop = FALSE)
    #   Shop.Name Product1 Product2 Product3 Product4
    # 1     Shop1        4        4        0        0
    # 2     Shop2        3        0        0        0
    # 3     Shop3        2        0        0        1
    
  2. Let's be oldschool. cast from "reshape"

    library(reshape)
    cast(mydf, formula = Shop.Name ~ Product, value="Items", fill=0)
    # Aggregation requires fun.aggregate: length used as default
    #   Shop.Name Product1 Product2 Product4
    # 1     Shop1        2        1        0
    # 2     Shop2        1        0        0
    # 3     Shop3        1        0        1
    

    Eh. Not what you wanted again... Try this instead:

    cast(mydf, formula = Shop.Name ~ Product, 
         value = "Items", fill = 0, 
         add.missing = TRUE, fun.aggregate = sum)
    #   Shop.Name Product1 Product2 Product3 Product4
    # 1     Shop1        4        4        0        0
    # 2     Shop2        3        0        0        0
    # 3     Shop3        2        0        0        1
    
  3. Let's get back to basics. xtabs from base R

    xtabs(Items ~ Shop.Name + Product, mydf)
    #          Product
    # Shop.Name Product1 Product2 Product3 Product4
    #     Shop1        4        4        0        0
    #     Shop2        3        0        0        0
    #     Shop3        2        0        0        1
    

    Or, if you prefer a data.frame (note that your "Shop.Name" variable has been converted to the row.names of the data.frame):

    as.data.frame.matrix(xtabs(Items ~ Shop.Name + Product, mydf))
    #       Product1 Product2 Product3 Product4
    # Shop1        4        4        0        0
    # Shop2        3        0        0        0
    # Shop3        2        0        0        1
    
查看更多
登录 后发表回答