Rowwise operation within for loop using dplyr

Posted 2020-08-01 06:35

Question:

I have some transport data on which I would like to perform a row-wise if comparison within a for loop. The data look something like this:

# Using the iris dataset 
> iris <- as.data.frame(iris)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Within each species, the result should record every pair of sepal lengths whose flowers have equal petal width (this is only an illustration with no scientific significance). It would look something like this:

Species Petal.Width Sepal.Length1 Sepal.Length2
setosa          0.2         5.1             4.9
setosa          0.2         5.1             4.7
setosa          0.2         4.9             4.7
setosa          0.2         5.1             4.6
...

My initial Python-ish thought was to use nested for loops, something like this:

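# Nested loops: for each species, compare every pair of rows (i < j)
output <- list()
for (s in unique(iris$Species)) {
  sub <- iris[iris$Species == s, ]          # rows of this species only
  for (i in seq_len(nrow(sub) - 1)) {
    for (j in (i + 1):nrow(sub)) {          # j > i: each pair recorded once
      if (sub$Petal.Width[i] == sub$Petal.Width[j]) {
        output[[length(output) + 1]] <- data.frame(
          Species       = s,
          Petal.Width   = sub$Petal.Width[i],
          Sepal.Length1 = sub$Sepal.Length[i],
          Sepal.Length2 = sub$Sepal.Length[j])
      }
    }
  }
}
output <- do.call(rbind, output)            # one row per matching pair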

I thought about using group_by() on Species to replace the outer loop (for s in unique(Species)). But I don't know how to compare each observation with every other observation in the same group and store the matches as in the loop above. I have seen questions about for loops in dplyr and about row-wise calculations. My apologies if the code above is not clear; this is my first time asking a question here.
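The idea of letting group_by() stand in for the outer loop can be sketched as follows. This is one possible approach, not the only one: it assumes dplyr's group_modify() (available since dplyr 0.8.1) and uses base R's combn() to enumerate every pair of rows i < j inside each Species group, which replaces the two inner loops. The helper name pairs_within is hypothetical.

```r
library(dplyr)

# Hypothetical helper: for one Species group, return every unordered pair of
# rows (i < j) whose Petal.Width values are equal.
pairs_within <- function(df, key) {
  n <- nrow(df)
  # combn(n, 2) lists all i < j pairs as columns; t() makes one pair per row
  idx <- if (n < 2) matrix(integer(0), ncol = 2) else t(combn(n, 2))
  same <- df$Petal.Width[idx[, 1]] == df$Petal.Width[idx[, 2]]
  idx <- idx[same, , drop = FALSE]
  data.frame(Petal.Width   = df$Petal.Width[idx[, 1]],
             Sepal.Length1 = df$Sepal.Length[idx[, 1]],
             Sepal.Length2 = df$Sepal.Length[idx[, 2]])
}

result <- iris %>%
  group_by(Species) %>%            # plays the role of `for s in unique(Species)`
  group_modify(pairs_within) %>%   # the two inner loops, done once per group
  ungroup()

head(result)
```

Because combn() only generates i < j, a row is never paired with itself and each pair appears once, matching the expected output in the question.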

Answer 1:

Using dplyr:

library(dplyr)

iris %>%
  group_by(Species, Petal.Width) %>%
  mutate(n = n()) %>%          # number of rows sharing this Species/Petal.Width
  filter(n > 1) %>%            # keep combinations that occur more than once
  mutate(Sepal.Length1 = Sepal.Length,
         Sepal.Length2 = Sepal.Length1 - Petal.Width) %>%
  arrange(Petal.Width) %>%
  select(Species, Petal.Width, Sepal.Length1, Sepal.Length2)

This groups by Species and Petal.Width, counts the rows in each combination, keeps only the combinations that occur more than once, then copies Sepal.Length into Sepal.Length1 and creates a new variable Sepal.Length2 = Sepal.Length1 - Petal.Width.
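The pipeline above keeps only the rows whose (Species, Petal.Width) combination occurs more than once, but it does not pair those rows up. If the goal is the two-column pair table from the question, one option (a sketch, not the answerer's method) is a self-join on Species and Petal.Width. The id column is a hypothetical helper added so that each unordered pair can be kept exactly once. In dplyr 1.1.0 and later this join may emit a many-to-many relationship warning; passing relationship = "many-to-many" to inner_join() silences it in those versions.

```r
library(dplyr)

# Hypothetical row id so that each unordered pair can be kept exactly once
iris_id <- iris %>% mutate(id = row_number())

pairs <- iris_id %>%
  # Match every row with every row of the same Species and Petal.Width;
  # non-key columns present on both sides get the suffixes "1" and "2"
  inner_join(iris_id, by = c("Species", "Petal.Width"), suffix = c("1", "2")) %>%
  filter(id1 < id2) %>%   # drop self-matches and mirrored (j, i) duplicates
  select(Species, Petal.Width, Sepal.Length1, Sepal.Length2)

head(pairs)
```

The join produces all n * n combinations within each group, so it trades memory for brevity; for iris that is small, but on large transport data a group with many identical values can produce a very large intermediate table.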

To record Sepal.Length for each Species within defined Petal.Width ranges (bins of width 0.2) rather than exact values:

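# Range of Petal.Width, used to build bins of width 0.2
minpw <- min(iris$Petal.Width)
maxpw <- max(iris$Petal.Width)

iris %>%
  group_by(Sepal.Length, Species,
           petal_width_range = cut(Petal.Width,
                                   breaks = seq(minpw, maxpw, by = 0.2),
                                   include.lowest = TRUE)) %>%  # keep the minimum width
  summarise(count = n())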
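In the range-based grouping, the lowest break equals the smallest Petal.Width, and cut() uses right-closed intervals (a,b] by default, so rows at exactly the minimum width would otherwise land in NA. The small check below is a sketch of that behaviour; the vector w is made-up example input, not data from the question.

```r
# Breaks built the same way as in the answer: from min (0.1) to max (2.5)
breaks <- seq(min(iris$Petal.Width), max(iris$Petal.Width), by = 0.2)
w <- c(0.1, 0.2, 2.5)  # example widths: the minimum, an inner value, the maximum

cut(w, breaks)                          # 0.1 is not in (0.1,0.3], so it becomes NA
cut(w, breaks, include.lowest = TRUE)   # 0.1 now falls into the first bin
```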