I have two matrices of normalized read counts, one for control and one for treatment, from a time series spanning day 1 to day 26. I want to calculate a distance matrix with Dynamic Time Warping and then use it for clustering, but it seems too complicated. Below is what I tried; can anyone help clarify this? Thanks a lot.
> head(control[,1:4])
MAST2 WWC2 PHYHIPL R3HDM2
Control_D1 6.591024 5.695156 3.388652 5.756384
Control_D1 8.043454 5.365221 6.859768 6.936970
Control_D3 7.731590 4.868267 6.919972 6.931073
Control_D4 8.129948 5.105528 6.627016 7.090268
Control_D5 7.690863 4.729501 6.824746 6.904610
Control_D6 8.101723 5.334501 6.868990 7.115883
>
> head(lead[,1:4])
MAST2 WWC2 PHYHIPL R3HDM2
Lead30_D1 6.418423 5.610699 3.734425 5.778046
Lead30_D2 7.918360 4.295191 6.559294 6.780952
Lead30_D3 7.807142 4.294722 6.599187 6.716040
Lead30_D4 7.856720 4.432136 6.572337 6.848483
Lead30_D5 7.827311 4.204738 6.607107 6.784094
Lead30_D6 7.848760 4.458451 6.581216 6.943003
>
> dim(control)
[1] 26 2603
> dim(lead)
[1] 26 2603
library(dtw)
for (i in control) {
for (j in lead) {
result[i,j] <- dtw( dist(control[,,i],lead[,,j]), distance.only=T )$normalizedDistance
}
}
Says that
Error in lead[, , j] : incorrect number of dimensions
There have already been questions similar to yours, but the answers haven't been too detailed. Here's a breakdown of what you need to know, in the specific case of R.
Calculating cross-distance matrices
The proxy package is made specifically for the calculation of cross-distance matrices. You should check its vignette to know which measures are already implemented by it.
Note: in the context of time series, proxy treats each row in a matrix as a series, which can be confirmed by the fact that a 5x10 matrix, sample_data, gives a cross-distance matrix that is 5x5.
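The row-wise behaviour noted here can be checked with a minimal sketch. It assumes the proxy package is installed and uses a randomly generated 5x10 matrix named sample_data (the values are illustrative, not the data from the question):

```r
library(proxy)

# Illustrative data: 5 series (rows), each of length 10 (columns)
set.seed(1)
sample_data <- matrix(rnorm(50), nrow = 5, ncol = 10)

# Passing the same matrix as x and y yields the full cross-distance matrix
# between its rows
cross_dist <- proxy::dist(sample_data, sample_data, method = "Euclidean")

# One row and one column per series
print(dim(cross_dist))
```

If the series were stored in columns instead, as the genes are in control and lead, the matrix would need to be transposed with t() first.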
Using the DTW distance
The dtw package implements many variations of DTW, and it also leverages proxy, so you could calculate a DTW distance matrix through proxy's interface.
Using custom distances
One nice thing about proxy is that it gives you the option to register custom functions. You seem to be interested in the normalized version of DTW, so you could register a function that returns it. See the documentation of pr_DB for more information.
Other DTW implementations
The dtwclust package (which I made) implements a basic but faster version of DTW which can use multi-threading and also leverages proxy.
The dtw_basic implementation only supports two step patterns and one window type, but it is considerably faster.
Another multi-threaded implementation is included in the parallelDist package, although I haven't personally tested it.
Multivariate or multi-dimensional time series
A single multivariate series is commonly a matrix where time spans the rows and the multiple variables span the columns. DTW also works for them:
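A minimal sketch of that, assuming the dtw package is installed and using two made-up series with 3 variables each (mv_series1 and mv_series2 are illustrative names); dtw() accepts matrices whose rows are time points:

```r
library(dtw)

set.seed(1)
# Two illustrative multivariate series: rows are time points, columns are variables
mv_series1 <- matrix(rnorm(15), nrow = 5, ncol = 3)
mv_series2 <- matrix(rnorm(18), nrow = 6, ncol = 3)

# The local cost between two time points is computed across all variables
alignment <- dtw(mv_series1, mv_series2, distance.only = TRUE)

# Accumulated DTW distance, and the same value normalized for path length
print(alignment$distance)
print(alignment$normalizedDistance)
```

The two series do not need the same number of time points, only the same number of variables.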
The nice thing about proxy is that it can calculate distances between objects contained in lists too, so you can put several multivariate series in lists of matrices.
Your case
Regardless of what you choose, you can probably use proxy to get your result, but since you haven't provided your whole data, I can't give you a more specific example. I presume that dtwclust::dtw_basic(control[, 1:4], lead[, 1:4], normalize = TRUE) would give you the distance between one pair of series, assuming you're treating each one as a multivariate series with 4 variables.
If your question is "why am I getting this error?" the answer is that you're trying to subset a matrix, which is a two-dimensional array, according to a 3rd dimension.
see:
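A small base-R sketch of both the error and the loop behaviour, using a toy 2x3 matrix m in place of control (the name and values are illustrative):

```r
# Toy stand-in for a matrix like control: 2 rows, 3 columns
m <- matrix(1:6, nrow = 2)

# A matrix has only two dimensions, so supplying a third index fails;
# tryCatch captures the error message instead of stopping the script
msg <- tryCatch(m[, , 1], error = function(e) conditionMessage(e))
print(msg)

# Looping over a matrix visits its values one by one, not its row or column indices
visited <- c()
for (i in m) visited <- c(visited, i)
print(visited)

# seq_len(ncol(m)) gives column indices that can be used for subsetting
for (j in seq_len(ncol(m))) print(m[, j])
```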
Hopefully you can see now that you have a few problems:
- i and j values are the values in control and lead respectively. You can use them as their values, or you can generate the index, e.g., for (i in seq_along(control)), if you're planning to use it for something other than getting that same value out.
- dist function. dist takes a single matrix and computes the distance between its rows. You seem to be trying to pass it two values from two different matrices, or perhaps two subsets of two different matrices. It looks like you might need to go back and look at the examples in the documentation forxtr