I am working with GPS data and trying to figure out how to average the 11th-15th fixes for latitude and longitude. I have seen solutions in similar questions for how to average every n rows. The problem is that occasionally the satellites bomb out and the fixes stop at 13 or 14. So, in these cases, I only want to average 3 or 4 values instead of 5. So I am looking to average values for latitude and longitude starting from where the number in series is 11 until the number in series drops again (or as long as it is increasing? I need it to include the last set, which will not drop back down to a low number again). I have started by deleting all rows where the number in series is NOT in my desired 11-15 ranges. So,for an example dummy data set, this leaves me with:
Date Time Long Lat NoInSeries
12 17/11/2014 22:09:17 115.9508 -31.82850 11
13 17/11/2014 22:09:18 115.9508 -31.82846 12
14 17/11/2014 22:09:19 115.9513 -31.82864 13
15 17/11/2014 22:09:21 115.9511 -31.82863 14
26 18/11/2014 00:07:14 115.9509 -31.82829 11
27 18/11/2014 00:07:15 115.9509 -31.82829 12
28 18/11/2014 00:07:16 115.9509 -31.82830 13
29 18/11/2014 00:07:17 115.9509 -31.82830 14
30 18/11/2014 00:07:18 115.9509 -31.82831 15
56 18/11/2014 10:00:24 115.9513 -31.82670 11
57 18/11/2014 10:00:25 115.9514 -31.82670 12
58 18/11/2014 10:00:26 115.9514 -31.82669 13
59 18/11/2014 10:00:27 115.9514 -31.82668 14
60 18/11/2014 10:00:28 115.9514 -31.82668 15
My desired output would be something like this, with the first one averaging 4 (11-14) and the next two averaging 5 (11-15):
Date Time Long Lat NoInSeries AvgLong Avg Lat
12 17/11/2014 22:09:17 115.9508 -31.82850 11 115.9510 -31.82856
13 17/11/2014 22:09:18 115.9508 -31.82846 12 NA NA
14 17/11/2014 22:09:19 115.9513 -31.82864 13 NA NA
15 17/11/2014 22:09:21 115.9511 -31.82863 14 NA NA
26 18/11/2014 00:07:14 115.9509 -31.82829 11 115.9509 -31.82830
27 18/11/2014 00:07:15 115.9509 -31.82829 12 NA NA
28 18/11/2014 00:07:16 115.9509 -31.82830 13 NA NA
29 18/11/2014 00:07:17 115.9509 -31.82830 14 NA NA
30 18/11/2014 00:07:18 115.9509 -31.82831 15 NA NA
56 18/11/2014 10:00:24 115.9513 -31.82670 11 115.9514 -31.82669
57 18/11/2014 10:00:25 115.9514 -31.82670 12 NA NA
58 18/11/2014 10:00:26 115.9514 -31.82669 13 NA NA
59 18/11/2014 10:00:27 115.9514 -31.82668 14 NA NA
60 18/11/2014 10:00:28 115.9514 -31.82668 15 NA NA
I would then go through and delete all rows where AvgLong==NA, so my final output would just have all the rows where number in series=11 with the averages.
I really don't know where to start with the code for this...the examples I have found have all discussed averaging an exact number of rows, rather than a variable number.
For instance:
c( tapply( x, (row(x)-1)%/%5, mean ) )
Or:
idx <- ceiling(seq_len(nrow(dd)) / 5)
# do colMeans on all columns except last one.
res <- lapply(split(dd[-(ncol(dd))], idx), colMeans, na.rm = TRUE)
# assign first value of "datetime" in each 5-er group as names to list
names(res) <- dd$datetime[seq(1, nrow(df), by=5)]
# bind them to give a matrix
res <- do.call(rbind, res)
Also, the answers I have seen generally then output the averages as a new data frame... Ultimately, I also want to have this averaging under a condition: if schedule is 'Multifix', I want to average 11 to however high it goes up to 15, whereas if schedule is 'Continuous', I want to average from 181 up until however high each one goes...). Something like this:
if(import.list$Schedule=='Multifix'){
...code to average Long and Lat for Number in Series from 11 up to however high it goes (up to 15)...
} else {
...code to average Long and Lat for Number in Series from 241 up to however high it goes...
}
Or perhaps I have an if else statement to define a variable and then use that variable in the function to do the averaging?
...but I imagine this condition could complicate things if the output creates a new dataframe, which is why I was aiming for just adding values to new columns "AvgLong" and "AvgLat." Thanks for any help!!
#dput function shows the data I was working from your question.
#get.counter gets the row index when the value of the column starts to decrease as opposed to ascending order.
# avg.seg.col function outputs a data frame with the segmented average of a column. df1 is the input data frame, colvar is the column name (Eg: Long or Lat), and get_counter is the output from get.counter function.
# read in data from a text file using read.table function. You need to make sure your file exists in current working directory. Working directory can be set by setwd("path of current working directory")
# apply get.counter function with a vector from df1$NoInSeries
# Apply avg.seg.col function for Long column
# Apply avg.seg.col function for Lat column
# Merge data frames by column
# Assign column names
Output:
#after removing rows with NA, the output looks like below
Seems that using
aggregate
does most of the work:Needed to shift the ID variable by one to get the modulo divison to deliver the groups you wanted. If you wanted to have something aligned with the origianl data, then the
ave
function is designed to deliver:I've read
for
loops are necessary for iterative actions, which is why I like Chinmay's use ofcumsum
anddiff
. I don't have enough reputation to comment on @Chinmay Patil's elegant answer, so here is a slightly different approach.You can report the Lat/Lon by the first time (
min
, as above), last time (max
), or average time (mean
). However, sometimes I have issues withddply
when I have POSIXct dates/times in the data frame.You can do it using
cumsum
,diff
,aggregate
andmerge
cumsum(c(0, diff(x$NoInSeries) < 0))
will give you a new column which increments every timediff
ofNoInSeries
is negative.Now you
aggregate
using the newSeriesNo
column