I have a data set containing bid and ask quotes for 3 days and a stock. Following is the portion of the dataset. I have also given a link to the sample data set to illustrate the pecularity of the matter.
> dput(head(q,30))
structure(list(Date = structure(c(1471424400, 1471424400, 1471424400,
1471424401, 1471424401, 1471424406, 1471424407, 1471424415, 1471424417,
1471424514, 1471424527, 1471424567, 1471424576, 1471424606, 1471424607,
1471424621, 1471424621, 1471424621, 1471424641, 1471424642, 1471424715,
1471424715, 1471424717, 1471424717, 1471424741, 1471424741, 1471424742,
1471424793, 1471424800, 1471424807), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Type = c("ASK", "BID", "ASK", "BID", "ASK",
"ASK", "BID", "BID", "BID", "ASK", "ASK", "BID", "BID", "BID",
"BID", "BID", "ASK", "BID", "BID", "BID", "BID", "BID", "BID",
"BID", "BID", "ASK", "BID", "BID", "BID", "BID"), Price = c(1749.95,
1611, 1683.9, 1653, 1672, 1683.9, 1653, 1654.2, 1663, 1682, 1663,
1664.75, 1663, 1664.75, 1663, 1664.75, 1662.9, 1663, 1664.75,
1663, 1664.75, 1663, 1664.75, 1663, 1664.75, 1662.9, 1663, 1664.75,
1663, 1664.75)), .Names = c("Date", "Type", "Price"), class = c("tbl_df",
"data.frame"), row.names = c(NA, -30L))
Date Type Price
(time) (chr) (dbl)
1 2016-08-17 09:00:00 ASK 1749.95
2 2016-08-17 09:00:00 BID 1611.00
3 2016-08-17 09:00:00 ASK 1683.90
4 2016-08-17 09:00:01 BID 1653.00
5 2016-08-17 09:00:01 ASK 1672.00
6 2016-08-17 09:00:06 ASK 1683.90
7 2016-08-17 09:00:07 BID 1653.00
8 2016-08-17 09:00:15 BID 1654.20
9 2016-08-17 09:00:17 BID 1663.00
10 2016-08-17 09:01:54 ASK 1682.00
.. ... ... ...
It has multiple quotes for some of the time stamps. I want to estimate best bid and best ask out of these quotes using following steps:
Best bid process
- Estimate the highest bid for a time stamp.
- Compare it with the prevailing best bid quote.
- If it is greater than the prevailing best bid then it is the best bid, otherwise the prevailing best bid is the best bid.
Best ask process
Estimate the lowest ask for a time stamp.
Compare it with the prevailing best ask quote.
If it is lower than the prevailing best ask then it is the best ask, otherwise the prevailing best ask is the best bid. The appendices of this document contains SAS code for this purpose.
The Required Output structure:
The dataframe should have 3 columns (date, Best Ask, Best Bid). The rows should not have duplicate time stamps and So, both best bid and best ask should be in the same row for a particular time stamp rather than in two adjacent rows.
What I have tried
I don't know how to compare the minimum ask for the current time stamp with the minimum ask of previous time stamp. So, following code only gives minimum ask for a time stamp.
library(dplyr)
library(tidyr)
library(reshape2)
m.q<- head(q,30) %>%
gather(variable, value, -(Date:Type)) %>%
unite(temp,Type, variable) %>%
group_by(Date) %>% dcast(Date ~ temp, fun=min)
I get the following:
> dput(m.q)
structure(list(Date = structure(c(1471424400, 1471424401, 1471424406,
1471424407, 1471424415, 1471424417, 1471424514, 1471424527, 1471424567,
1471424576, 1471424606, 1471424607, 1471424621, 1471424641, 1471424642,
1471424715, 1471424717, 1471424741, 1471424742, 1471424793, 1471424800,
1471424807), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
ASK_Price = c(1683.9, 1672, 1683.9, Inf, Inf, Inf, 1682,
1663, Inf, Inf, Inf, Inf, 1662.9, Inf, Inf, Inf, Inf, 1662.9,
Inf, Inf, Inf, Inf), BID_Price = c(1611, 1653, Inf, 1653,
1654.2, 1663, Inf, Inf, 1664.75, 1663, 1664.75, 1663, 1663,
1664.75, 1663, 1663, 1663, 1664.75, 1663, 1664.75, 1663,
1664.75)), .Names = c("Date", "ASK_Price", "BID_Price"), row.names = c(NA,
-22L), class = "data.frame")
Date ASK_Price BID_Price
1 2016-08-17 09:00:00 1683.9 1611.0
2 2016-08-17 09:00:01 1672.0 1653.0
3 2016-08-17 09:00:06 1683.9 Inf
4 2016-08-17 09:00:07 Inf 1653.0
5 2016-08-17 09:00:15 Inf 1654.2
6 2016-08-17 09:00:17 Inf 1663.0
[...]
Please note the following issues:
- Both bid and ask have values for some of the timestamps but they are shown in adjacent rows.
- The function min() is getting applied to bid becuase I do not how perform min only for ask and max only for bid simultaneously in the dcast function.
- Instead of "Inf" in the cells it should be NA in case there is no bid or ask value for the time stamp.
This is my data set.
This is question involves R coding. I do not know SAS and a beginner in R. (Thanks in advance for constructive comments and answers, feel free to improve my post.)