I want to calculate
"average of the closing prices for the 5,10,30 consecutive trading days immediately preceding and including the Announcement Day, but excluding trading halt days (days on which trading volume is 0 or NA)
For example, now we set 2014/5/7 is the Announcement day.
then average of price for 5 consecutive days :
average of (price of 2014/5/7,2014/5/5, 2014/5/2, 2014/4/30,2014/4/29),
price of 2014/5/6 and 2014/5/1 was excluded due to 0 trading volume on those days.
EDIT on 11/9/2014
One thing to Note: the announcement day for each stock is different, and it's not last valid date in the data, so usage of tail
when calculating average was not appropriate.
Date Price Volume 2014/5/9 1.42 668000 2014/5/8 1.4 2972000 2014/5/7 1.5 1180000 2014/5/6 1.59 0 2014/5/5 1.59 752000 2014/5/2 1.6 138000 2014/5/1 1.6 NA 2014/4/30 1.6 656000 2014/4/29 1.61 364000 2014/4/28 1.61 1786000 2014/4/25 1.64 1734000 2014/4/24 1.68 1130000 2014/4/23 1.68 506000 2014/4/22 1.67 354000 2014/4/21 1.7 0 2014/4/18 1.7 0 2014/4/17 1.7 1954000 2014/4/16 1.65 1788000 2014/4/15 1.71 1294000 2014/4/14 1.68 1462000
Reproducible Code:
require(quantmod)
require(data.table)
tickers <- c("0007.hk","1036.hk")
date_begin <- as.Date("2010-01-01")
date_end <- as.Date("2014-09-09")
# retrive data of all stocks
prices <- getSymbols(tickers, from = date_begin, to = date_end, auto.assign = TRUE)
dataset <- merge(Cl(get(prices[1])),Vo(get(prices[1])))
for (i in 2:length(prices)){
dataset <- merge(dataset, Cl(get(prices[i])),Vo(get(prices[i])))
}
# Write First
write.zoo(dataset, file = "prices.csv", sep = ",", qmethod = "double")
# Read zoo
test <- fread("prices.csv")
setnames(test, "Index", "Date")
Then I got a data.table. The first Column is Date, then the price and volume for each stock.
Actually, the original data contains information for about 40 stocks. Column names have the same patter: "X" + ticker.close , "X" + ticker.volumn
Last trading days for different stock were different.
The desired output :
days 0007.HK 1036.HK 5 1.1 1.1 10 1.1 1.1 30 1.1 1.1
The major issues:
.SD and lapply and .SDCol can be used for looping different stocks. .N can be used when calculating last consecutive N days.
Due to the different announcement day, it becomes a little complicated.
Any suggestions on single stock using quantmod or multiple stocks using data.table are extremely welcomed!
Thanks GSee and pbible for the nice solutions, it was very useful. I'll update my code later incorporating different announcement day for each stocks, and consult you later.
Indeed, it's more a xts question than a data.table one. Anything about data.table will be very helpful. Thanks a lot!
Because the different stocks have different announcement days, I tried to make a solution first following @pbible's logic, any suggestions will be extremely welcomed.
library(quantmod)
tickers <- c("0007.hk","1036.hk")
date_begin <- as.Date("2010-01-01")
# Instead of making one specific date_end, different date_end is used for convenience of the following work.
date_end <- c(as.Date("2014-07-08"),as.Date("2014-05-15"))
for ( i in 1: length(date_end)) {
stocks <- getSymbols(tickers[i], from = date_begin, to = date_end[i], auto.assign = TRUE)
dataset <- cbind(Cl(get(stocks)),Vo(get(stocks)))
usable <- subset(dataset,dataset[,2] > 0 & !is.na(dataset[,2]))
sma.5 <- SMA(usable[,1],5)
sma.10 <- SMA(usable[,1],10)
sma.30 <- SMA(usable[,1],30)
col <- as.matrix(rbind(tail(sma.5,1), tail(sma.10,1), tail(sma.30,1)))
colnames(col) <- colnames(usable[,1])
rownames(col) <- c("5","10","30")
if (i == 1) {
matrix <- as.matrix(col)
}
else {matrix <- cbind(matrix,col)}
}
I got what I want, but the code is ugly..Any suggestions to make it elegant are extremely welcomed!
How about something like this using the
subset
and moving average (SMA). Here is the solution I put together.The output:
This should work with multiple stocks. It will use only the most recent valid date.
Well, here's a way to do it. I don't know why you want to get rid of the loop, and this does not get rid of it (in fact it has a loop nested inside another). One thing that you were doing is growing objects in memory with each iteration of your loop (i.e. the
matrix <- cbind(matrix,col)
part is inefficient). This Answer avoids that.Some commentary...
myEnv
, to hold your data so that it does not clutter your workspace.getSymbols
(as you did in your attempt) because the input tickers are not uppercase.lapply
loop (wrapped indo.call(cbind, ...)
). I'm looping over the namedend.dates
vector.myEnv
, removes NAs, and subsets it to only include data up to the relevant end date.vapply
loops over a vector of different lookbacks and calculates themean
. That is wrapped insetNames
so that each result is named based on which lookback was used to calculate it.lapply
call returns a list of named vectors.do.call(cbind, LIST)
is the same as callingcbind(LIST[[1]], LIST[[2]], LIST[[3]])
exceptLIST
can be a list of any length.Hope this helps.