How do I select the first row in an R data frame t

Posted 2020-05-26 19:55

How do I select the first row of an R data frame that meets certain criteria?

Here is the context:

I have a data frame with five columns:

"pixel", "year", "propvar", "component", "cumsum".

There are 1,225 combinations of pixel and year, because the data was computed from the annual time series of 49 geographic pixels for each of 25 study years. Within each pixel-year, I have computed propvar, the proportion of total variance explained by a given component of the fast Fourier transform for the time series of a given pixel-year. I then computed cumsum, which is the cumulative sum of propvar for each frequency component within a pixel-year. The component column just gives you an index for the Fourier series component (plus 1) from which propvar was calculated.

I want to determine the number of components required to explain greater than 99% of the variance. I figure one way to do this is to find the first row within each pixel-year where cumsum > 0.99, and create a data frame from it with three columns, pixel, year, and numbercomps, where numbercomps is the number of components required within a given pixel-year to explain greater than 99% of the variance. I do not know how to do this in R. Does anyone have a solution?

1 answer
手持菜刀,她持情操
Reply #2 · 2020-05-26 20:48

Sure. Something like this should do the trick:

# CREATE A REPRODUCIBLE EXAMPLE!
# Note: cumsum is on a percent scale here (99 = 99%), and numbercomps is just a row index.
df <- data.frame(year = c("2001", "2003", "2001", "2003", "2003"),
                 pixel = c("a", "b", "a", "b", "a"),
                 cumsum = c(99, 99, 98, 99, 99),
                 numbercomps = 1:5)
df
#   year pixel cumsum numbercomps
# 1 2001     a     99           1
# 2 2003     b     99           2
# 3 2001     a     98           3
# 4 2003     b     99           4
# 5 2003     a     99           5

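# EXTRACT THE SUBSET YOU'D LIKE.
# Keep rows at or above the threshold (use cumsum > 0.99 if cumsum is stored as a proportion),
# then keep the first remaining row for each year-pixel combination.
res <- subset(df, cumsum >= 99)
res <- subset(res,
              subset = !duplicated(res[c("year", "pixel")]),
              select = c("pixel", "year", "numbercomps"))
res
#   pixel year numbercomps
# 1     a 2001           1
# 2     b 2003           2
# 5     a 2003           5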

EDIT: Also, for those interested in data.table, there is this:

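library(data.table)
# Key on both grouping columns, given as a character vector.
dt <- data.table(df, key = c("pixel", "year"))
# For each pixel-year group, take the first row with cumsum >= 99.
dt[cumsum >= 99, .SD[1], by = key(dt)]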
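
For the layout described in the question (one row per component within each pixel-year, with cumsum stored as a proportion), the same idea might look like the sketch below. The data frame fft_df and its values are made up for illustration, and it assumes the component index can be reported directly as numbercomps; adjust that step if your index is offset.

```r
# Hypothetical toy data in the question's layout: one row per component
# within each pixel-year, with cumsum stored as a proportion (0 to 1).
fft_df <- data.frame(
  pixel     = rep(c("a", "b"), each = 4),
  year      = rep(2001, 8),
  component = rep(1:4, times = 2),
  propvar   = c(0.90, 0.08, 0.015, 0.005,
                0.60, 0.395, 0.004, 0.001),
  cumsum    = c(0.90, 0.98, 0.995, 1.00,
                0.60, 0.995, 0.999, 1.00)
)

# Sort so that the first qualifying row in each group is the lowest component.
fft_df <- fft_df[order(fft_df$pixel, fft_df$year, fft_df$component), ]

# Keep rows strictly above the 99% threshold,
# then keep only the first such row per pixel-year.
above     <- fft_df[fft_df$cumsum > 0.99, ]
first_hit <- above[!duplicated(above[c("pixel", "year")]), ]

# Assumption: the component index is used as the number of components.
result <- data.frame(pixel       = first_hit$pixel,
                     year        = first_hit$year,
                     numbercomps = first_hit$component)
result
```

With this toy data, pixel a needs 3 components and pixel b needs 2. Sorting before filtering matters: duplicated() keeps whichever qualifying row comes first, so the rows must already be in component order within each pixel-year.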