How to take minimum value out of several ask quote

2019-09-12 09:09发布

问题:

I have a data set containing bid and ask quotes for 3 days and a stock. Following is the portion of the dataset. I have also given a link to the sample data set to illustrate the pecularity of the matter.

> dput(head(q,30))

structure(list(Date = structure(c(1471424400, 1471424400, 1471424400, 
1471424401, 1471424401, 1471424406, 1471424407, 1471424415, 1471424417, 
1471424514, 1471424527, 1471424567, 1471424576, 1471424606, 1471424607, 
1471424621, 1471424621, 1471424621, 1471424641, 1471424642, 1471424715, 
1471424715, 1471424717, 1471424717, 1471424741, 1471424741, 1471424742, 
1471424793, 1471424800, 1471424807), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Type = c("ASK", "BID", "ASK", "BID", "ASK", 
"ASK", "BID", "BID", "BID", "ASK", "ASK", "BID", "BID", "BID", 
"BID", "BID", "ASK", "BID", "BID", "BID", "BID", "BID", "BID", 
"BID", "BID", "ASK", "BID", "BID", "BID", "BID"), Price = c(1749.95, 
1611, 1683.9, 1653, 1672, 1683.9, 1653, 1654.2, 1663, 1682, 1663, 
1664.75, 1663, 1664.75, 1663, 1664.75, 1662.9, 1663, 1664.75, 
1663, 1664.75, 1663, 1664.75, 1663, 1664.75, 1662.9, 1663, 1664.75, 
1663, 1664.75)), .Names = c("Date", "Type", "Price"), class = c("tbl_df", 
"data.frame"), row.names = c(NA, -30L))


                  Date  Type   Price
                (time) (chr)   (dbl)
1  2016-08-17 09:00:00   ASK 1749.95
2  2016-08-17 09:00:00   BID 1611.00
3  2016-08-17 09:00:00   ASK 1683.90
4  2016-08-17 09:00:01   BID 1653.00
5  2016-08-17 09:00:01   ASK 1672.00
6  2016-08-17 09:00:06   ASK 1683.90
7  2016-08-17 09:00:07   BID 1653.00
8  2016-08-17 09:00:15   BID 1654.20
9  2016-08-17 09:00:17   BID 1663.00
10 2016-08-17 09:01:54   ASK 1682.00
..                 ...   ...     ...

It has multiple quotes for some of the time stamps. I want to estimate best bid and best ask out of these quotes using following steps:

Best bid process

  • Estimate the highest bid for a time stamp.
  • Compare it with the prevailing best bid quote.
  • If it is greater than the prevailing best bid then it is the best bid, otherwise the prevailing best bid is the best bid.

Best ask process

  • Estimate the lowest ask for a time stamp.

  • Compare it with the prevailing best ask quote.

  • If it is lower than the prevailing best ask then it is the best ask, otherwise the prevailing best ask is the best bid. The appendices of this document contains SAS code for this purpose.

The Required Output structure:

The dataframe should have 3 columns (date, Best Ask, Best Bid). The rows should not have duplicate time stamps and So, both best bid and best ask should be in the same row for a particular time stamp rather than in two adjacent rows.

What I have tried

I don't know how to compare the minimum ask for the current time stamp with the minimum ask of previous time stamp. So, following code only gives minimum ask for a time stamp.

library(dplyr)
library(tidyr)
library(reshape2)

m.q<- head(q,30) %>% 
  gather(variable, value, -(Date:Type)) %>%
  unite(temp,Type, variable) %>% 
  group_by(Date) %>% dcast(Date ~ temp, fun=min)

I get the following:

> dput(m.q)
structure(list(Date = structure(c(1471424400, 1471424401, 1471424406, 
1471424407, 1471424415, 1471424417, 1471424514, 1471424527, 1471424567, 
1471424576, 1471424606, 1471424607, 1471424621, 1471424641, 1471424642, 
1471424715, 1471424717, 1471424741, 1471424742, 1471424793, 1471424800, 
1471424807), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    ASK_Price = c(1683.9, 1672, 1683.9, Inf, Inf, Inf, 1682, 
    1663, Inf, Inf, Inf, Inf, 1662.9, Inf, Inf, Inf, Inf, 1662.9, 
    Inf, Inf, Inf, Inf), BID_Price = c(1611, 1653, Inf, 1653, 
    1654.2, 1663, Inf, Inf, 1664.75, 1663, 1664.75, 1663, 1663, 
    1664.75, 1663, 1663, 1663, 1664.75, 1663, 1664.75, 1663, 
    1664.75)), .Names = c("Date", "ASK_Price", "BID_Price"), row.names = c(NA, 
-22L), class = "data.frame")

                 Date ASK_Price BID_Price
1 2016-08-17 09:00:00    1683.9    1611.0
2 2016-08-17 09:00:01    1672.0    1653.0
3 2016-08-17 09:00:06    1683.9       Inf
4 2016-08-17 09:00:07       Inf    1653.0
5 2016-08-17 09:00:15       Inf    1654.2
6 2016-08-17 09:00:17       Inf    1663.0
[...]

Please note the following issues:

  1. Both bid and ask have values for some of the timestamps but they are shown in adjacent rows.
  2. The function min() is getting applied to bid becuase I do not how perform min only for ask and max only for bid simultaneously in the dcast function.
  3. Instead of "Inf" in the cells it should be NA in case there is no bid or ask value for the time stamp.

This is my data set.

This is question involves R coding. I do not know SAS and a beginner in R. (Thanks in advance for constructive comments and answers, feel free to improve my post.)

回答1:

This can be accomplished with the dplyr and zoo packages.

Use dplyr to find either a minimum or maximum price per timestamp (depending on whether we are looking at BID or ASK values), then group by "Type" and arrange the summarized values by Date. From here, we use zoo::rollapply to compute either a rolling minimum or rolling maximum, depending on whether the Type is ASK or BID.

library(dplyr)
library(zoo)

data.new <- group_by(data.stock, Date, Type) %>% 
    summarize(price = ifelse(Type[1] == 'ASK', min(Price), max(Price))) %>% 
    ungroup() %>% 
    group_by(Type) %>% 
    arrange(Type, Date) %>% 
    mutate(change = ifelse(Type == 'ASK',
                           rollapply(price, 1:length(price), min, align = 'right'),
                           rollapply(price, 1:length(price), max, align = 'right')
                           ))

Output:

                  Date  Type  price change
                <time> <chr>  <dbl>  <dbl>
1  2016-08-17 09:00:00   ASK 1683.9 1683.9
2  2016-08-17 09:00:01   ASK 1672.0 1672.0
3  2016-08-17 09:00:06   ASK 1683.9 1672.0
4  2016-08-17 09:01:54   ASK 1682.0 1672.0
5  2016-08-17 09:02:07   ASK 1663.0 1663.0
6  2016-08-17 09:03:41   ASK 1662.9 1662.9
7  2016-08-17 09:05:41   ASK 1662.9 1662.9
8  2016-08-17 09:00:00   BID 1611.0 1611.0
9  2016-08-17 09:00:01   BID 1653.0 1653.0
10 2016-08-17 09:00:07   BID 1653.0 1653.0
# ... with 16 more rows

The rolling minimum (or maximum) is thus preserved as we move down the timestamps. As an added bonus, this will never produce Inf or NA values.

Here is a line-by-line breakdown of what dplyr is doing:

group_by(data.stock, Date, Type) %>% 

This tells dplyr to take the stock price data and cut it into subgroups according to combinations of Date and Type. We will then apply subsequent operations to these subgroups (see the "Split-Apply-Combine" philosophy). All subsequent operations are performed on the subgroups as if they are separate. The %>% symbol "pipes" the output of the group_by command. to the next command (see the magrittr package).

summarize(price = ifelse(Type[1] == 'ASK', min(Price), max(Price))) %>% 

This will compute a summary for each unique combination of Date and Type (established in the group_by line above). If Type is "ASK", we should calculate the minimum of the prices. Otherwise we calculate the maximum.

ungroup() %>% 
group_by(Type) %>% 

These lines tell dplyr to reset the grouping variables, since we no longer want to group by individual Dates.

arrange(Type, Date) %>% 

To be on the safe side, I arrange the data frame by Type and Date, to ensure that all values are in chronological order.

mutate(change = ifelse(Type == 'ASK',
                       rollapply(price, 1:length(price), min, align = 'right'),
                       rollapply(price, 1:length(price), max, align = 'right')
                       ))

Finally, the most complicated line. Here we are using dplyr::mutate to create a column that stores the result of a rolling operation. If the Type is "ASK", we apply a rolling minimum to the price column. If it is not, we apply a rolling maximum. The result of the rolling min/max is stored in a new column called "change". The zoo package makes these types of "rolling" operations much easier to perform.