Calculate the average based on other columns

2019-06-02 09:46发布

I want to calculate

"average of the closing prices for the 5,10,30 consecutive trading days immediately preceding and including the Announcement Day, but excluding trading halt days (days on which trading volume is 0 or NA)

For example, now we set 2014/5/7 is the Announcement day.

then average of price for 5 consecutive days :

average of (price of 2014/5/7,2014/5/5, 2014/5/2, 2014/4/30,2014/4/29),

price of 2014/5/6 and 2014/5/1 was excluded due to 0 trading volume on those days.

EDIT on 11/9/2014

One thing to Note: the announcement day for each stock is different, and it's not last valid date in the data, so usage of tail when calculating average was not appropriate.

Date        Price   Volume
2014/5/9    1.42    668000
2014/5/8    1.4     2972000
2014/5/7    1.5     1180000
2014/5/6    1.59    0
2014/5/5    1.59    752000
2014/5/2    1.6     138000
2014/5/1    1.6     NA
2014/4/30   1.6     656000
2014/4/29   1.61    364000
2014/4/28   1.61    1786000
2014/4/25   1.64    1734000
2014/4/24   1.68    1130000
2014/4/23   1.68    506000
2014/4/22   1.67    354000
2014/4/21   1.7     0
2014/4/18   1.7     0
2014/4/17   1.7     1954000
2014/4/16   1.65    1788000
2014/4/15   1.71    1294000
2014/4/14   1.68    1462000

Reproducible Code:

require(quantmod)
require(data.table)

tickers <- c("0007.hk","1036.hk")
date_begin <- as.Date("2010-01-01")
date_end <- as.Date("2014-09-09")


# retrive data of all stocks
prices <- getSymbols(tickers, from = date_begin, to = date_end, auto.assign = TRUE)

dataset <- merge(Cl(get(prices[1])),Vo(get(prices[1])))


for (i in 2:length(prices)){
  dataset <- merge(dataset, Cl(get(prices[i])),Vo(get(prices[i])))
}

# Write First
write.zoo(dataset, file = "prices.csv", sep = ",", qmethod = "double")

# Read zoo
test <- fread("prices.csv")

setnames(test, "Index", "Date")

Then I got a data.table. The first Column is Date, then the price and volume for each stock.

Actually, the original data contains information for about 40 stocks. Column names have the same patter: "X" + ticker.close , "X" + ticker.volumn

Last trading days for different stock were different.

The desired output :

days    0007.HK 1036.HK
5       1.1     1.1
10      1.1     1.1
30      1.1     1.1

The major issues:

  1. .SD and lapply and .SDCol can be used for looping different stocks. .N can be used when calculating last consecutive N days.

  2. Due to the different announcement day, it becomes a little complicated.

Any suggestions on single stock using quantmod or multiple stocks using data.table are extremely welcomed!

Thanks GSee and pbible for the nice solutions, it was very useful. I'll update my code later incorporating different announcement day for each stocks, and consult you later.

Indeed, it's more a xts question than a data.table one. Anything about data.table will be very helpful. Thanks a lot!

Because the different stocks have different announcement days, I tried to make a solution first following @pbible's logic, any suggestions will be extremely welcomed.

library(quantmod)
tickers <- c("0007.hk","1036.hk")
date_begin <- as.Date("2010-01-01")

# Instead of making one specific date_end, different date_end is used for convenience of the following work.

date_end <- c(as.Date("2014-07-08"),as.Date("2014-05-15"))

for ( i in 1: length(date_end)) {

  stocks <- getSymbols(tickers[i], from = date_begin, to = date_end[i], auto.assign = TRUE)
  dataset <- cbind(Cl(get(stocks)),Vo(get(stocks)))
  usable <- subset(dataset,dataset[,2] > 0 & !is.na(dataset[,2]))
  sma.5 <- SMA(usable[,1],5)
  sma.10 <- SMA(usable[,1],10)
  sma.30 <- SMA(usable[,1],30)
  col <- as.matrix(rbind(tail(sma.5,1), tail(sma.10,1), tail(sma.30,1)))
  colnames(col) <- colnames(usable[,1])
  rownames(col) <- c("5","10","30")

  if (i == 1) {
    matrix <- as.matrix(col)
  }
  else  {matrix <- cbind(matrix,col)}
}

I got what I want, but the code is ugly..Any suggestions to make it elegant are extremely welcomed!

2条回答
Bombasti
2楼-- · 2019-06-02 10:22

How about something like this using the subset and moving average (SMA). Here is the solution I put together.

library(quantmod)

tickers <- c("0007.hk","1036.hk","cvx")
date_begin <- as.Date("2010-01-01")
date_end <- as.Date("2014-09-09")

stocks <- getSymbols(tickers, from = date_begin, to = date_end, auto.assign = TRUE)

stock3Summary <- function(stock){
  dataset <- cbind(Cl(get(stock)),Vo(get(stock)))
  usable <- subset(dataset,dataset[,2] > 0 & !is.na(dataset[,2]))
  sma.5 <- SMA(usable[,1],5)
  sma.10 <- SMA(usable[,1],10)
  sma.30 <- SMA(usable[,1],30)
  col <- as.matrix(rbind(tail(sma.5,1), tail(sma.10,1), tail(sma.30,1)))
  colnames(col) <- colnames(usable[,1])
  rownames(col) <- c("5","10","30")
  col
}

matrix <- as.matrix(stock3Summary(stocks[1]))

for( i in 2:length(stocks)){
  matrix <- cbind(matrix,stock3Summary(stocks[i]))
}

The output:

> matrix
   X0007.HK.Close X1036.HK.Close CVX.Close
5        1.082000       8.476000  126.6900
10       1.100000       8.412000  127.6080
30       1.094333       8.426333  127.6767

This should work with multiple stocks. It will use only the most recent valid date.

查看更多
干净又极端
3楼-- · 2019-06-02 10:33

Well, here's a way to do it. I don't know why you want to get rid of the loop, and this does not get rid of it (in fact it has a loop nested inside another). One thing that you were doing is growing objects in memory with each iteration of your loop (i.e. the matrix <- cbind(matrix,col) part is inefficient). This Answer avoids that.

library(quantmod)
tickers <- c("0007.hk","1036.hk")
date_begin <- as.Date("2010-01-01")

myEnv <- new.env()
date_end <- c(as.Date("2014-07-08"),as.Date("2014-05-15"))
lookback <- c(5, 10, 30) # different number of days to look back for calculating mean.

symbols <- getSymbols(tickers, from=date_begin, 
                      to=tail(sort(date_end), 1), env=myEnv) # to=last date
end.dates <- setNames(date_end, symbols)

out <- do.call(cbind, lapply(end.dates, function(x) {
  dat <- na.omit(get(names(x), pos=myEnv))[paste0("/", x)]
  prc <- Cl(dat)[Vo(dat) > 0]
  setNames(vapply(lookback, function(n) mean(tail(prc, n)), numeric(1)), 
           lookback)
}))

colnames(out) <- names(end.dates)
out

#   0007.HK 1036.HK
#5    1.080   8.344
#10   1.125   8.459
#30   1.186   8.805

Some commentary...

  • I created a new environment, myEnv, to hold your data so that it does not clutter your workspace.
  • I used the output of getSymbols (as you did in your attempt) because the input tickers are not uppercase.
  • I named the vector of end dates so that we can loop over that vector and know both the end date and the name of the stock.
  • the bulk of the code is an lapply loop (wrapped in do.call(cbind, ...)). I'm looping over the named end.dates vector.
    1. The first line gets the data from myEnv, removes NAs, and subsets it to only include data up to the relevant end date.
    2. The next line extracts the close column and subsets it to only include rows where volume is greater than zero.
    3. The vapply loops over a vector of different lookbacks and calculates the mean. That is wrapped in setNames so that each result is named based on which lookback was used to calculate it.
  • The lapply call returns a list of named vectors. do.call(cbind, LIST) is the same as calling cbind(LIST[[1]], LIST[[2]], LIST[[3]]) except LIST can be a list of any length.
  • at this point we have a matrix with row names, but no column names. So, I named the columns based on which stock they represent.

Hope this helps.

查看更多
登录 后发表回答