This question already has an answer here:
-
Reshape three column data frame to matrix (“long” to “wide” format)
5 answers
I am currently learning to work with data.frame and quite confused on how to reorder them.
At the moment, I have a data.frame that shows :
- column 1: a shop name
- column 2: a product
- column 3: the number of purchase for this product by this shop
or visually something like this:
+---+-----------+-------+----------+--+
| | Shop.Name | Items | Product | |
+---+-----------+-------+----------+--+
| 1 | Shop1 | 2 | Product1 | |
| 2 | Shop1 | 4 | Product2 | |
| 3 | Shop2 | 3 | Product1 | |
| 4 | Shop3 | 2 | Product1 | |
| 5 | Shop3 | 1 | Product4 | |
+---+-----------+-------+----------+--+
What I would like to achieve is the following "shop-centric" structure:
- column 1: a shop name
- column 2: Items sold for product1
- column 3: Items sold for product2
- column 4: Items sold for product3
...
When there is no line for a specific shop/product (because of no sales), I would like to create a 0.
or
+---+-------+-------+-------+-------+-------+-----+--+--+
| | Shop | Prod1 | Prod2 | Prod3 | Prod4 | ... | | |
+---+-------+-------+-------+-------+-------+-----+--+--+
| 1 | Shop1 | 2 | 4 | 0 | 0 | ... | | |
| 2 | Shop2 | 3 | 0 | 0 | 0 | ... | | |
| 3 | Shop3 | 2 | 0 | 0 | 1 | ... | | |
+---+-------+-------+-------+-------+-------+-----+--+--+
The answers so far work to a certain degree, but don't fully answer your question. In particular, they don't address the issue of a case in which there are no shops which sold a particular product. From your example input and desired output, there were no shops which sold "Product3". Indeed, "Product3" does not even appear in your source data.frame
. Additionally, they do not address the possible situation of having more than one row for each Shop + Product combination.
Here's a modified version of your data and the two solutions so far. I've added another row for a combination of "Shop1" and "Product1". Notice that I have converted your products to a factor
variable that includes the levels that the variable can take, even if none of the cases actually has that level.
mydf <- data.frame(
Shop.Name = c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3", "Shop1"),
Items = c(2, 4, 3, 2, 1, 2),
Product = factor(
c("Product1", "Product2", "Product1", "Product1", "Product4", "Product1"),
levels = c("Product1", "Product2", "Product3", "Product4")))
dcast
from "reshape2"
library(reshape2)
dcast(mydf, formula = Shop.Name ~ Product, value="Items", fill=0)
# Using Product as value column: use value.var to override.
# Aggregation function missing: defaulting to length
# Error in .fun(.value[i], ...) :
# 2 arguments passed to 'length' which requires 1
Wha? Suddenly does not work. Do this instead:
dcast(mydf, formula = Shop.Name ~ Product,
fill = 0, value.var = "Items",
fun.aggregate = sum, drop = FALSE)
# Shop.Name Product1 Product2 Product3 Product4
# 1 Shop1 4 4 0 0
# 2 Shop2 3 0 0 0
# 3 Shop3 2 0 0 1
Let's be oldschool. cast
from "reshape"
library(reshape)
cast(mydf, formula = Shop.Name ~ Product, value="Items", fill=0)
# Aggregation requires fun.aggregate: length used as default
# Shop.Name Product1 Product2 Product4
# 1 Shop1 2 1 0
# 2 Shop2 1 0 0
# 3 Shop3 1 0 1
Eh. Not what you wanted again... Try this instead:
cast(mydf, formula = Shop.Name ~ Product,
value = "Items", fill = 0,
add.missing = TRUE, fun.aggregate = sum)
# Shop.Name Product1 Product2 Product3 Product4
# 1 Shop1 4 4 0 0
# 2 Shop2 3 0 0 0
# 3 Shop3 2 0 0 1
Let's get back to basics. xtabs
from base R
xtabs(Items ~ Shop.Name + Product, mydf)
# Product
# Shop.Name Product1 Product2 Product3 Product4
# Shop1 4 4 0 0
# Shop2 3 0 0 0
# Shop3 2 0 0 1
Or, if you prefer a data.frame
(note that your "Shop.Name" variable has been converted to the row.names
of the data.frame
):
as.data.frame.matrix(xtabs(Items ~ Shop.Name + Product, mydf))
# Product1 Product2 Product3 Product4
# Shop1 4 4 0 0
# Shop2 3 0 0 0
# Shop3 2 0 0 1
Use dcast
from the reshape2
library:
library(reshape2)
> df <- data.frame(Shop.Name=rep(c("Shop1","Shop2","Shop3"),each=3),
+ Items=rpois(9,5),
+ Product=c(rep(c("Prod1","Prod2","Prod3","Prod4"),2),"Prod5")
+ )
> df
Shop.Name Items Product
1 Shop1 6 Prod1
2 Shop1 5 Prod2
3 Shop1 6 Prod3
4 Shop2 5 Prod4
5 Shop2 6 Prod1
6 Shop2 6 Prod2
7 Shop3 4 Prod3
8 Shop3 7 Prod4
9 Shop3 5 Prod5
> dcast(df,Shop.Name ~ Product,value.var="Items",fill=0)
Shop.Name Prod1 Prod2 Prod3 Prod4 Prod5
1 Shop1 6 5 6 0 0
2 Shop2 6 6 0 5 0
3 Shop3 0 0 4 7 5
If you want to use the original reshape package for any reason:
Shop.Name <- c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3")
Items <- c(2,4,3,2,1)
Product <- c("Product1", "Product2", "Product1", "Product1", "Product4")
(df <- data.frame(Shop.Name, Items, Product))
cast(df, formula = Shop.Name ~ Product, value="Items", fill=0)