loop to multiply across columns

2019-07-13 20:41发布

问题:

I have a data frame with columns labeled sales1, sales2, price1, price2 and I want to calculate revenues by multiplying sales1 * price1 and so-on across each number in an iterative fashion.

data <- data_frame(
  "sales1" = c(1, 2, 3),
  "sales2" = c(2, 3, 4),
  "price1" = c(3, 2, 2),
  "price2" = c(3, 3, 5))

data
# A tibble: 3 x 4
#  sales1 sales2 price1 price2
#   <dbl>  <dbl>  <dbl>  <dbl>
#1      1      2      3      3
#2      2      3      2      3
#3      3      4      2      5

Why doesn't the following code work?

data %>%
  mutate (
    for (i in seq_along(1:2)) {
      paste0("revenue",i) = paste0("sales",i) * paste0("price",i)
    }
  )

回答1:

Assuming your columns are already ordered (sales1, sales2, price1, price2). We can split the dataframe in two parts and then multiply them

data[grep("sales", names(data))] * data[grep("price", names(data))]

#  sales1 sales2
#1      3      6
#2      4      9
#3      6     20

If the columns are not already sorted according to their names, we can sort them by using order and then use above command.

data <- data[order(names(data))]


回答2:

This answer is not brief. For that, @RonakShah's existing answer is the one to look at!

My response is intended to address a broader concern regarding the difficulty of trying to do this in the tidyverse. My understanding is this is difficult because the data is not currently in a "tidy" format. Instead, you can create a tidy data frame like so:

library(tidyverse)

tidy_df <- data %>% 
  rownames_to_column() %>%
  gather(key, value, -rowname) %>% 
  extract(key, c("variable", "id"), "([a-z]+)([0-9]+)") %>%
  spread(variable, value)

Which then makes the final calculation straightforward

tidy_df %>% mutate(revenue = sales * price)

#> # A tibble: 6 x 5
#>   rowname    id price sales revenue
#>     <chr> <chr> <dbl> <dbl>   <dbl>
#> 1       1     1     3     1       3
#> 2       1     2     3     2       6
#> 3       2     1     2     2       4
#> 4       2     2     3     3       9
#> 5       3     1     2     3       6
#> 6       3     2     5     4      20

If you need to get the data back into the original format you can although this feels clunky to me (I'm sure this can be improved in someway).

tidy_df %>% mutate(revenue = sales * price) %>%
  gather(key, value, -c(rowname, id)) %>%
  unite(key, key, id, sep = "") %>%
  spread(key, value) %>% 
  select(starts_with("price"), 
         starts_with("sales"),
         starts_with("revenue"))

#> # A tibble: 3 x 6
#>   price1 price2 sales1 sales2 revenue1 revenue2
#> *  <dbl>  <dbl>  <dbl>  <dbl>    <dbl>    <dbl>
#> 1      3      3      1      2        3        6
#> 2      2      3      2      3        4        9
#> 3      2      5      3      4        6       20