I have been trying to solve this issue for the past 3 months. Please help.
I have tick data (price and volume) for many stocks belonging to a single exchange. Each stock has its own .rds file on the hard disk. I am interested in cleaning it up:
- merge ticks that share the same timestamp by taking the median
- subset the data to exchange hours only
- aggregate it to 20-minute intervals by previous tick aggregation
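To make the three cleaning steps concrete, here is a minimal base-R sketch on made-up numbers. It is only an illustration of the idea, not the highfrequency implementation: the data frame `ticks`, the helper `previous_tick`, the seconds-since-midnight time encoding, and the convention of stamping each bar with the end of its 20-minute bucket are all assumptions made for this example.

```r
# Made-up ticks for one stock and one day (hypothetical data).
# `time` is seconds since midnight, so 34200 is 09:30:00.
ticks <- data.frame(
  time  = c(33000, 34200, 34200, 34900, 35500, 35500, 36700, 56000),
  price = c(54.0,  54.7,  54.5,  54.5,  54.6,  54.8,  54.8,  55.5)
)

# Step 1: merge ticks that share a timestamp by taking the median price.
merged <- aggregate(price ~ time, data = ticks, FUN = median)

# Step 2: keep exchange hours only (09:30:00 to 15:30:00, as in the question).
open    <- 9.5 * 3600
close   <- 15.5 * 3600
session <- merged[merged$time >= open & merged$time <= close, ]

# Step 3: previous tick aggregation. For every grid point, take the last
# observed value at or before that point. `time` must be sorted ascending.
previous_tick <- function(time, value, k_minutes = 20) {
  step <- k_minutes * 60
  # Grid of bucket end times, from the first bucket end at or after the
  # first tick to the first bucket end at or after the last tick.
  grid <- seq(ceiling(min(time) / step) * step,
              ceiling(max(time) / step) * step,
              by = step)
  # findInterval gives the index of the last tick at or before each grid point.
  idx <- findInterval(grid, time)
  data.frame(time = grid, value = value[idx])
}

bars <- previous_tick(session$time, session$price, k_minutes = 20)
print(bars)
```

With several days of data, the same helper would be applied to each day separately and the results bound together, which is the split/lapply/rbind pattern used further down.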
I know that the function aggregatets in the highfrequency package can perform the previous tick aggregation operation. However, the function only accepts one day of data for one stock.
To demonstrate the problem I am using raw tick data (named trade) for a single stock.
dput(head(trade,50))
structure(c(54.7, 54.7, 54.5, 54.5, 54.5, 54.6, 54.6, 54.65,
54.65, 54.6, 54.65, 54.65, 54.65, 54.65, 54.7, 54.7, 54.8, 54.8,
54.85, 54.85, 54.85, 54.85, 54.8, 54.8, 54.8, 54.8, 54.65, 54.65,
54.8, 54.8, 54.8, 54.8, 54.65, 54.65, 54.65, 54.75, 54.65, 54.7,
54.7, 54.7, 54.75, 54.75, 54.75, 54.75, 54.75, 54.7, 54.7, 54.7,
54.65, 54.65, 8, 542, 110, 600, 88, 200, 150, 100, 700, 250,
75, 100, 25, 200, 100, 600, 1546, 940, 100, 6250, 89, 6911, 89,
211, 100, 50, 1410, 1090, 913, 4737, 50, 300, 2486, 400, 25,
85, 250, 168, 50, 100, 40, 40, 60, 50, 40, 10, 91, 6072, 229,
1000), class = c("xts", "zoo"), .indexCLASS = c("POSIXct", "POSIXt"
), tclass = c("POSIXct", "POSIXt"), .indexTZ = "Asia/Calcutta", tzone = "Asia/Calcutta", index = structure(c(1459481853,
1459481853, 1459482302, 1459482302, 1459482305, 1459482306, 1459482306,
1459482307, 1459482307, 1459482308, 1459482312, 1459482314, 1459482314,
1459482315, 1459482317, 1459482317, 1459482318, 1459482318, 1459482319,
1459482319, 1459482320, 1459482320, 1459482322, 1459482322, 1459482330,
1459482330, 1459482331, 1459482331, 1459482336, 1459482336, 1459482337,
1459482337, 1459482338, 1459482338, 1459482339, 1459482340, 1459482344,
1459482348, 1459482351, 1459482351, 1459482356, 1459482357, 1459482357,
1459482361, 1459482362, 1459482364, 1459482367, 1459482367, 1459482369,
1459482369), tzone = "Asia/Calcutta", tclass = c("POSIXct", "POSIXt"
)), .Dim = c(50L, 2L), .Dimnames = list(NULL, c("value", "size"
)))
I use the following code to do previous tick aggregation to 20 minute intervals:
require(xts)
require(highfrequency)
trade<-xts(trade[,-1], order.by = trade[,1])
trade2<-do.call(rbind, lapply(split(trade,"days"), mergeTradesSameTimestamp))
colnames(trade)[c(1,2)]<-c("PRICE", "SIZE")
trade2<-trade2["T09:30:00/T15:30:00"]
trade2<-trade2[,1]
fundo=function(x) aggregatets(FUN = previoustick,on="minutes",k=20, dropna =F)
As aggregatets() only takes data for one day, I split trade2 into days and apply the function to each of them:
trade3<-do.call(rbind, lapply(split(trade2, "days"), fundo))
But I get the following error from the function aggregatets:
trade3<-do.call(rbind, lapply(split(trade2, "days"), fundo))
Error in FUN != "previoustick" :
comparison (2) is possible only for atomic and list types
Called from: aggregatets(FUN = previoustick, on = "minutes", k = 20, dropna = F)
Please suggest how to solve this error.
This code works, based on the limited data you provided. Your error came from not passing an object through to the argument ts. (Also, all of the ticks in your sample data occur before 9:30am, so to make this answer reproducible I changed the window to start at 8:30am, i.e. trade2<-trade2["T08:30:00/T15:30:00"]):