How do I select the first row of an R data frame that meets certain criteria?
Here is the context:
I have a data frame with five columns: "pixel", "year", "propvar", "component", and "cumsum".
There are 1,225 combinations of `pixel` and `year`, because the data were computed from the annual time series of 49 geographic pixels for each of 25 study years. Within each pixel-year, I have computed `propvar`, the proportion of total variance explained by a given component of the fast Fourier transform for the time series of that pixel-year. I then computed `cumsum`, which is the cumulative sum of `propvar` across the frequency components within a pixel-year. The `component` column gives the index of the Fourier series component (plus 1) from which `propvar` was calculated.
I want to determine the number of components required to explain more than 99% of the variance. I figure one way to do this is to find the first row within each pixel-year where `cumsum > 0.99`, and create a data frame from it with three columns, `pixel`, `year`, and `numbercomps`, where `numbercomps` is the number of components required within a given pixel-year to explain more than 99% of the variance. I do not know how to do this in R. Does anyone have a solution?
Sure. Something like this should do the trick:
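Here is a minimal base R sketch of one way to do it. The data frame name `df` and the toy values are made up for illustration; only the column names come from the question. It also assumes that `component` counts up from 1 within each pixel-year, so the `component` value on the first row past the threshold can be used as `numbercomps`.

```r
# Hypothetical toy data: 2 pixels x 2 years, 4 components per pixel-year.
df <- data.frame(
  pixel     = rep(1:2, each = 8),
  year      = rep(rep(2001:2002, each = 4), times = 2),
  component = rep(1:4, times = 4),
  propvar   = c(0.900, 0.095, 0.004, 0.001,
                0.995, 0.003, 0.001, 0.001,
                0.500, 0.300, 0.195, 0.005,
                0.700, 0.293, 0.006, 0.001)
)

# Cumulative proportion of variance within each pixel-year.
df$cumsum <- ave(df$propvar, df$pixel, df$year, FUN = cumsum)

# Keep only rows past the 99% threshold, sorted so that the lowest
# component comes first within each pixel-year.
above <- df[df$cumsum > 0.99, ]
above <- above[order(above$pixel, above$year, above$component), ]

# duplicated() flags every repeat of a pixel-year after its first row,
# so negating it keeps exactly the first qualifying row per pixel-year.
first <- above[!duplicated(above[c("pixel", "year")]), ]

result <- data.frame(
  pixel       = first$pixel,
  year        = first$year,
  numbercomps = first$component
)
result
```

If your `component` index is offset (the "plus 1" you mention), adjust `numbercomps` accordingly, for example with `first$component - 1`.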
EDIT: Also, for those interested in `data.table`, there is this:
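A rough `data.table` sketch of the same idea, under the same assumptions as before: the table name `dt` and the toy values are hypothetical, and `component` is assumed to count up from 1 within each pixel-year.

```r
library(data.table)

# Hypothetical toy data: 2 pixels x 2 years, 4 components per pixel-year.
dt <- data.table(
  pixel     = rep(1:2, each = 8),
  year      = rep(rep(2001:2002, each = 4), times = 2),
  component = rep(1:4, times = 4),
  propvar   = c(0.900, 0.095, 0.004, 0.001,
                0.995, 0.003, 0.001, 0.001,
                0.500, 0.300, 0.195, 0.005,
                0.700, 0.293, 0.006, 0.001)
)

# Cumulative proportion of variance within each pixel-year.
dt[, cumsum := cumsum(propvar), by = .(pixel, year)]

# Sort so the lowest component comes first within each pixel-year.
setorder(dt, pixel, year, component)

# Filter to rows past the threshold, then take the first component
# of each pixel-year group.
result <- dt[cumsum > 0.99, .(numbercomps = component[1]), by = .(pixel, year)]
result
```

Note that a pixel-year with no row above 0.99 would simply be missing from `result`. That should not happen if `cumsum` reaches 1 within every pixel-year.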