R programming help in editing code

Posted 2019-08-16 17:41

Question:

I've asked several questions about this, and all the answers were really helpful, but once again my data is weird and I need help. Basically, I want to find the average speed over a given range of time intervals: for example, from 6 s to 40 s my average speed would be 5 m/s, and so on. It was suggested that I use this code:

library(IRanges)
idx <- seq(1, ncol(data), by=2)
# idx is now 1, 3, 5. It will be passed one value at a time to `i`.
# that is, `i` will take values 1 first, then 3 and then 5 and each time
# the code within is executed.
o <- lapply(idx, function(i) {  
    ir1 <- IRanges(start=seq(0, max(data[[i]]), by=401), width=401)
    ir2 <- IRanges(start=data[[i]], width=1)
    t <- findOverlaps(ir1, ir2)
    d <- data.frame(mean=tapply(data[[i+1]], queryHits(t), mean))
    cbind(as.data.frame(ir1), d)
})

which gives this output

# > o
# [[1]]
#   start end width mean
# 1     0 400   401 1.05
# 
# [[2]]
#   start end width mean
# 1     0 400   401  1.1
# 
# [[3]]
#   start end width     mean
# 1     0 400   401 1.383333

So if I wanted it to be every 100 s, I would just change by=401 in the ir1 <- ... line to by=100.

But my data is weird for a few reasons:

  1. My data doesn't always start at 0 s; sometimes it starts at 20 s, depending on the specimen and whether it moves.
  2. Data is not collected at regular intervals (every 1 s, 2 s or 3 s). Sometimes I get data for 1-20 s but nothing for 20-40 s, simply because the specimen does not move.
  3. I think the findOverlaps portion of the code affects my output. How can I get rid of that without disturbing the output?

Here is some data to illustrate my troubles, although all of my real data ends at 2000 s:

Time    Speed   Time    Speed   Time    Speed
6.3     1.6     3.1     1.7     0.3     2.4
11.3    1.3     5.1     2.2     1.3     1.3
13.8    1.3     6.3     3.4     3.1     1.5
14.1    1.0     7.0     2.3     4.5     2.7
47.4    2.9     11.3    1.2     5.1     0.5
49.2    0.7     26.5    3.3     5.9     1.7
50.5    0.9     27.3    3.4     9.7     2.4
57.1    1.3     36.6    2.5     11.8    1.3
72.9    2.9     40.3    1.1     13.1    1.0
86.6    2.4     44.3    3.2     13.8    0.6
88.5    3.4     50.9    2.6     14.0    2.4
89.0    3.0     62.6    1.5     14.8    2.2
94.8    2.9     66.8    0.5     15.5    2.6
117.4   0.5     67.3    1.1     16.4    3.2
123.7   3.2     67.7    0.6     26.5    0.9
124.5   1.0     68.2    3.2     44.7    3.0
126.1   2.8     72.1    2.2     45.1    0.8

As you can see from the data, it doesn't necessarily end at 60 s; sometimes it ends at 57 s, for example.

EDIT: added dput of the data

structure(list(Time = c(6.3, 11.3, 13.8, 14.1, 47.4, 49.2, 50.5, 
57.1, 72.9, 86.6, 88.5, 89, 94.8, 117.4, 123.7, 124.5, 126.1), 
    Speed = c(1.6, 1.3, 1.3, 1, 2.9, 0.7, 0.9, 1.3, 2.9, 2.4, 
    3.4, 3, 2.9, 0.5, 3.2, 1, 2.8), Time.1 = c(3.1, 5.1, 6.3, 
    7, 11.3, 26.5, 27.3, 36.6, 40.3, 44.3, 50.9, 62.6, 66.8, 
    67.3, 67.7, 68.2, 72.1), Speed.1 = c(1.7, 2.2, 3.4, 2.3, 
    1.2, 3.3, 3.4, 2.5, 1.1, 3.2, 2.6, 1.5, 0.5, 1.1, 0.6, 3.2, 
    2.2), Time.2 = c(0.3, 1.3, 3.1, 4.5, 5.1, 5.9, 9.7, 11.8, 
    13.1, 13.8, 14, 14.8, 15.5, 16.4, 26.5, 44.7, 45.1), Speed.2 = c(2.4, 
    1.3, 1.5, 2.7, 0.5, 1.7, 2.4, 1.3, 1, 0.6, 2.4, 2.2, 2.6, 
    3.2, 0.9, 3, 0.8)), .Names = c("Time", "Speed", "Time.1", 
"Speed.1", "Time.2", "Speed.2"), class = "data.frame", row.names = c(NA, 
-17L))

Answer 1:

Sorry if I don't understand your question entirely, but could you explain why this example doesn't do what you're trying to do?

# use a pre-loaded data set
mtcars

# choose which variable to cut
var <- 'mpg'

# define groups, whether that be time or something else
# and choose how to cut it.
x <- cut( mtcars[ , var ] , c( -Inf , seq( 15 , 25 , by = 2.5 ) , Inf ) )

# look at your cut points, for every record
x 

# you can merge them back on to the mtcars data frame if you like..
mtcars$cutpoints <- x
# ..but that's not necessary

# find the mean within those groups
tapply( 
    mtcars[ , var ] , 
    x ,
    mean
)


# find the mean within groups, using a different variable
tapply( 
    mtcars[ , 'wt' ] , 
    x ,
    mean
)
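
To connect the mtcars example back to the question, here is a minimal sketch of the same cut() + tapply() idea applied to the Time/Speed pairs from the dput() output. The helper name bin_means, the 20 s bin width and the end argument are assumptions made for illustration; none of them come from the thread. Because the breaks always start at 0 and are fixed in advance, it should not matter where a specimen's recording starts or whether some intervals contain no observations.

```r
# The asker's data, copied from the dput() output in the question.
data <- data.frame(
  Time    = c(6.3, 11.3, 13.8, 14.1, 47.4, 49.2, 50.5, 57.1, 72.9,
              86.6, 88.5, 89, 94.8, 117.4, 123.7, 124.5, 126.1),
  Speed   = c(1.6, 1.3, 1.3, 1, 2.9, 0.7, 0.9, 1.3, 2.9,
              2.4, 3.4, 3, 2.9, 0.5, 3.2, 1, 2.8),
  Time.1  = c(3.1, 5.1, 6.3, 7, 11.3, 26.5, 27.3, 36.6, 40.3,
              44.3, 50.9, 62.6, 66.8, 67.3, 67.7, 68.2, 72.1),
  Speed.1 = c(1.7, 2.2, 3.4, 2.3, 1.2, 3.3, 3.4, 2.5, 1.1,
              3.2, 2.6, 1.5, 0.5, 1.1, 0.6, 3.2, 2.2),
  Time.2  = c(0.3, 1.3, 3.1, 4.5, 5.1, 5.9, 9.7, 11.8, 13.1,
              13.8, 14, 14.8, 15.5, 16.4, 26.5, 44.7, 45.1),
  Speed.2 = c(2.4, 1.3, 1.5, 2.7, 0.5, 1.7, 2.4, 1.3, 1,
              0.6, 2.4, 2.2, 2.6, 3.2, 0.9, 3, 0.8)
)

# Hypothetical helper (not from the thread): mean speed per fixed-width time bin.
# `width` is the bin size in seconds; `end` is the last time point to cover.
bin_means <- function(time, speed, width = 20, end = max(time)) {
  # Breaks run from 0 up to the first multiple of `width` that covers `end`.
  breaks <- seq(0, ceiling(end / width) * width, by = width)
  # Left-closed bins: [0,20), [20,40), ...; the last bin also includes its
  # upper bound because include.lowest = TRUE.
  bins <- cut(time, breaks = breaks, right = FALSE, include.lowest = TRUE)
  # tapply() keeps every factor level, so bins with no data come back as NA.
  tapply(speed, bins, mean)
}

# Columns come in (Time, Speed) pairs, so step through the odd columns.
idx <- seq(1, ncol(data), by = 2)
o <- lapply(idx, function(i) bin_means(data[[i]], data[[i + 1]], width = 20))
o
```

For the first Time/Speed pair, the [0,20) bin averages to 1.3, the [20,40) bin is NA because nothing was recorded in that window, and the [40,60) bin averages to 1.45. If every specimen should share the same set of bins, passing end = 2000 (the stated length of the real recordings) to bin_means would be one way to do it.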