I just started using R for statistics and would appreciate any kind of help.
As a first step, I ran a time series regression over my columns. The Y columns are the dependent variables and X is the explanatory variable.
# example
Y1 <- runif(100, 5.0, 17.5)
Y2 <- runif(100, 4.0, 27.5)
Y3 <- runif(100, 3.0, 14.5)
Y4 <- runif(100, 2.0, 12.5)
Y5 <- runif(100, 5.0, 17.5)
X <- runif(100, 5.0, 7.5)
df1 <- data.frame(X, Y1, Y2, Y3, Y4, Y5)
# calculating log returns to provide data for the first regression
# diff(log(v)) is the same as log(v[2:n]) - log(v[1:(n-1)])
X_logret <- diff(log(X))
Y1_logret <- diff(log(Y1))
Y2_logret <- diff(log(Y2))
Y3_logret <- diff(log(Y3))
Y4_logret <- diff(log(Y4))
Y5_logret <- diff(log(Y5))
# bringing the calculated log returns together in one data frame
df2 <- data.frame(X_logret, Y1_logret, Y2_logret, Y3_logret, Y4_logret, Y5_logret)
# running the time series regression
Regression <- lm(as.matrix(df2[c('Y1_logret', 'Y2_logret', 'Y3_logret', 'Y4_logret', 'Y5_logret')]) ~ X_logret, data = df2)
# extracting the coefficients for further calculation
Regression$coefficients[2,(1:5)]
As a second step, I want to run a regression row by row, that is, day by day, since the data contains daily observations. My real data also has a DATE column in POSIXct format, but I did not know how to include it in the example. Maybe someone has an idea how to use it to select a certain period over which the regressions should be run. In the row-by-row regression, I would like to use the 5 slope coefficients from the first regression as the explanatory variable and the 5 Y_logret values of that day as the dependent variable.
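One possible way to bring a DATE column into the example and pick a period with it is sketched below. The start date, the "UTC" time zone, and the names start, end, and period are assumptions made for illustration; the real DATE column would replace the generated one. Only one Y column is used to keep the sketch short.

```r
set.seed(1)  # reproducible toy data
n <- 100

# Hypothetical daily DATE column in POSIXct format (the start date is made up)
DATE <- seq(as.POSIXct("2020-01-01", tz = "UTC"), by = "day", length.out = n)
X  <- runif(n, 5.0, 7.5)
Y1 <- runif(n, 5.0, 17.5)
df1 <- data.frame(DATE, X, Y1)

# Log returns drop the first observation, so the dates start at row 2
df2 <- data.frame(DATE      = df1$DATE[-1],
                  X_logret  = diff(log(df1$X)),
                  Y1_logret = diff(log(df1$Y1)))

# Select a period by comparing DATE against POSIXct bounds (both inclusive)
start <- as.POSIXct("2020-02-01", tz = "UTC")
end   <- as.POSIXct("2020-02-20", tz = "UTC")
period <- df2[df2$DATE >= start & df2$DATE <= end, ]

nrow(period)  # 20 rows, one per day from 1 to 20 February
```

The same logical index can be reused to pick the rows of any other object that is aligned with df2, for example a matrix of log returns.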
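For each day, the model is Y_logret(1 to 5) = Beta * Regression$coefficients[2,(1:5)] + error, so the 5 Y_logret values of that day are the observations and Beta is a single number per day. The intercept is not needed, so I would set it to zero by adding + 0 to the formula in lm().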
My goal is to run this regression over a period of time, for example 20 days. Running one regression per day would give 20 Beta estimates in total. I also need all the errors (residuals) for further calculation, so I have to extract 5 errors per day, a total of 20*5 = 100 error values.
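The day-by-day regression described above could be sketched roughly as follows. This is only one way to do it, under the assumption that the first 20 rows are the period of interest; the names b, days, fits, Beta, and errors are made up for the sketch, and the Y series are stored as a matrix instead of separate vectors.

```r
set.seed(1)  # reproducible toy data
n <- 100
X <- runif(n, 5.0, 7.5)
Y <- sapply(1:5, function(i) runif(n, 2, 15))  # 5 toy Y series as columns
colnames(Y) <- paste0("Y", 1:5)

X_logret <- diff(log(X))
Y_logret <- apply(Y, 2, function(v) diff(log(v)))  # (n-1) x 5 matrix

# First step: time series regression of every Y column on X
Regression <- lm(Y_logret ~ X_logret)
b <- coef(Regression)[2, ]  # the 5 slope coefficients

# Second step: one regression per day, no intercept (0 + b)
days <- 1:20  # assumed period: the first 20 days
fits <- lapply(days, function(t) lm(Y_logret[t, ] ~ 0 + b))

Beta   <- sapply(fits, coef)      # 20 Beta estimates, one per day
errors <- t(sapply(fits, resid))  # 20 x 5 matrix of residuals
```

Each row of errors holds the 5 residuals of one day, so the 20*5 values end up in a single matrix that can be used for further calculation.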
This is just an example; the original dataset has 20 Y columns and over 4000 rows. I would like to run the regression over certain intervals of 900 to 1000 days. Since I am completely new to R, I have no idea how to proceed, especially how to code this in a few lines.
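For data of that size, calling lm() once per day may be avoidable. With a single regressor and no intercept, the least squares estimate has the closed form Beta = sum(y * b) / sum(b^2), so all days can be handled with one matrix product. The sketch below assumes simulated data with 20 Y columns and 4000 rows, a hypothetical window of the first 1000 days, and slope coefficients estimated on the full sample; whether the slopes should instead come from the window only is left open here.

```r
set.seed(1)  # reproducible toy data
n <- 4000
k <- 20
X <- runif(n, 5.0, 7.5)
Y <- matrix(runif(n * k, 2, 15), nrow = n,
            dimnames = list(NULL, paste0("Y", 1:k)))

X_logret <- diff(log(X))
Y_logret <- apply(Y, 2, function(v) diff(log(v)))  # (n-1) x k matrix

# Slopes from the first-step regression (assumption: full sample)
b <- coef(lm(Y_logret ~ X_logret))[2, ]

# Hypothetical interval of 1000 days
window <- 1:1000
Yw <- Y_logret[window, , drop = FALSE]

# Closed-form OLS through the origin, one Beta per day (row)
Beta <- as.vector(Yw %*% b) / sum(b^2)

# Residuals: each row is that day's Y_logret minus Beta * b
errors <- Yw - outer(Beta, b)  # 1000 x 20 matrix
```

For any single day, the result should agree with lm(Yw[t, ] ~ 0 + b), which is an easy way to check the shortcut on a few rows.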
I really appreciate any kind of help.