Combining time-series objects and lists: Package “

Posted 2020-02-29 05:39

Question:

The R package "termstrc", designed for term-structure estimation, is an incredibly useful tool, but it requires data to be set in a particularly awkward format: lists within lists.

Question: What is the best way to prepare and shape data, either outside R or inside R, in order to create the repeated sublist format required to run the function "dyncouponbonds"?

The "dyncouponbonds" command requires data to be set in a repeated sublist: a list of bonds and the time-invariant features of those bonds (let's call this "bondlist") is combined with the time-t features of those bonds (price and accrued interest), and the whole thing is replicated for each period from t+1 to T.

Below is an example of the list format for one period. The "dyncouponbonds" command requires this format to be replicated, within an umbrella list, for all T periods. ISIN, MATURITYDATE, ISSUEDATE, COUPONRATE will be identical for each period. PRICE, ACCRUED, CASHFLOWS and TODAY will be different for each period.

R> str(govbonds$GERMANY)

List of 8
$ ISIN : chr [1:52] "DE0001141414" "DE0001137131" "DE0001141422" ...
$ MATURITYDATE:Class 'Date' num [1:52] 13924 13952 13980 14043 ...
$ ISSUEDATE :Class 'Date' num [1:52] 11913 13215 12153 13298 ...
$ COUPONRATE : num [1:52] 0.0425 0.03 0.03 0.0325 ...
$ PRICE : num [1:52] 100 99.9 99.8 99.8 ...
$ ACCRUED : num [1:52] 4.09 2.66 2.43 2.07 ...
$ CASHFLOWS :List of 3
..$ ISIN: chr [1:384] "DE0001141414" "DE0001137131" "DE0001141422" ...
..$ CF : num [1:384] 104 103 103 103 ...
..$ DATE:Class 'Date' num [1:384] 13924 13952 13980 14043 ...
$ TODAY :Class 'Date' num 13908
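
For readers who are not on Bloomberg, one period of this structure can be put together from ordinary data frames. The following is only a minimal sketch: the data frames static, quotes and flows, the helper make_period, and every value in them are invented for illustration. Only the element names are taken from the str() output above.

```r
# Hypothetical toy inputs; names and numbers are invented for illustration.
static <- data.frame(
  ISIN         = c("XX0000000001", "XX0000000002"),
  MATURITYDATE = as.Date(c("2009-01-15", "2010-07-04")),
  ISSUEDATE    = as.Date(c("2004-01-15", "2005-07-04")),
  COUPONRATE   = c(0.0425, 0.03),
  stringsAsFactors = FALSE
)
quotes <- data.frame(                     # time-varying data for one date
  ISIN    = c("XX0000000002", "XX0000000001"),
  PRICE   = c(99.8, 100.1),
  ACCRUED = c(2.66, 4.09),
  stringsAsFactors = FALSE
)
flows <- data.frame(                      # all future cash flows, long format
  ISIN = c("XX0000000001", "XX0000000002", "XX0000000002"),
  CF   = c(104.25, 3, 103),
  DATE = as.Date(c("2009-01-15", "2009-07-04", "2010-07-04")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: assemble the 8-element list for a single date.
make_period <- function(static, quotes, flows, today) {
  q <- quotes[match(static$ISIN, quotes$ISIN), ]  # align quotes with static order
  f <- flows[flows$DATE >= today, ]               # keep only outstanding cash flows
  list(ISIN         = static$ISIN,
       MATURITYDATE = static$MATURITYDATE,
       ISSUEDATE    = static$ISSUEDATE,
       COUPONRATE   = static$COUPONRATE,
       PRICE        = q$PRICE,
       ACCRUED      = q$ACCRUED,
       CASHFLOWS    = list(ISIN = f$ISIN, CF = f$CF, DATE = f$DATE),
       TODAY        = today)
}

one_period <- make_period(static, quotes, flows, as.Date("2008-01-30"))
str(one_period)
```

The match() call matters because the quotes may arrive in a different order from the static data; PRICE and ACCRUED have to line up with ISIN position by position.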

Answer 1:

This is a fairly advanced data manipulation question. R has many powerful data manipulation tools, and you're not going to need to move away from R to prepare the (admittedly fairly obscure) dyncouponbonds object. Indeed you shouldn't, because taking a structure from another language and then turning it into a dyncouponbonds object will simply be more work.

The first thing I would make sure of is that you are very familiar with the lapply function, because you're going to be making plenty of use of it. You'll use it to create a list of couponbonds objects, which is what a dyncouponbonds object actually is. Creating the couponbonds objects is a little tougher, mainly because of the CASHFLOWS sublist, which wants each cashflow associated with the bond's ISIN and with the date of the cashflow. For this you'll use lapply and some fairly advanced subscripting. The subset function will also come in handy.
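
To make the CASHFLOWS step concrete, here is a small sketch of the lapply pattern on made-up data. It assumes annual coupons on a face value of 100 and generates the schedule by stepping back from maturity; real bonds need their actual schedules, so treat the bonds data frame and the schedule rule purely as illustrative assumptions.

```r
# Hypothetical static data; annual coupons on a face value of 100 are an assumption.
bonds <- data.frame(
  ISIN         = c("XX0000000001", "XX0000000002"),
  MATURITYDATE = as.Date(c("2010-01-15", "2009-07-04")),
  COUPONRATE   = c(0.04, 0.03),
  stringsAsFactors = FALSE
)
today <- as.Date("2008-01-30")

# One list per bond, each holding ISIN, CF and DATE vectors of equal length.
per_bond <- lapply(seq_len(nrow(bonds)), function(i) {
  # number of yearly steps needed to reach back past `today`
  n <- as.integer(format(bonds$MATURITYDATE[i], "%Y")) -
       as.integer(format(today, "%Y")) + 1
  dates <- rev(seq(bonds$MATURITYDATE[i], by = "-1 year", length.out = n))
  dates <- dates[dates > today]                 # drop coupons already paid
  cf <- rep(100 * bonds$COUPONRATE[i], length(dates))
  cf[length(cf)] <- cf[length(cf)] + 100        # redemption at maturity
  list(ISIN = rep(bonds$ISIN[i], length(dates)), CF = cf, DATE = dates)
})

# Flatten the per-bond lists into the three parallel vectors of CASHFLOWS.
cashflows <- list(
  ISIN = unlist(lapply(per_bond, `[[`, "ISIN")),
  CF   = unlist(lapply(per_bond, `[[`, "CF")),
  DATE = do.call(c, lapply(per_bond, `[[`, "DATE"))  # c() keeps the Date class
)
str(cashflows)
```

Note the do.call(c, ...) for the dates: unlist() would strip the Date class and leave plain numbers.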

This question also very much depends on where you will be getting the data from. Getting it out of Bloomberg is non-trivial, mainly because you will need to go back in history using the BDS function and the "DES_CASH_FLOW" field for each bond to get its cashflows (the code below requests "DES_CASH_FLOW_ADJ"). I say history because, if you're using dyncouponbonds, I'm assuming you will want to do historic yield curve analysis. You'll need to override the BDS function's "SETTLE_DT" field with the value you received for the bond from the BDP function and the "FIRST_SETTLE_DT" field, so that you get all the cashflows from the beginning of the bond's life (otherwise it will only return cashflows from today onwards, which is no good for historic analysis). But I digress. If you're not using Bloomberg, I don't know where you'll get this data from.

You'll then need the static data for each bond, namely the maturity, the ISIN, the coupon rate and the issue date, plus historic price and accrued interest data. Again, if you're using Bloomberg, you'll use the BDP function for the static data, with the fields you'll see in the code below, and the historic data function BDH, which I have wrapped as bbdh. Assuming again that you're a Bloomberg user, here is the code:

bbGetCountry <- function(cCode, up = FALSE) {
# this function is going to get all the data out of bloomberg that we need for a
# country, and update it if necessary
# (conn is the Rbbg connection and histStartDate a Date; both are globals defined elsewhere)
    if (isTRUE(up)) startDate <- as.Date("2012-01-01") else startDate <- histStartDate 
    # first get all the curve members for history
    wdays <- wdaylist(startDate, Sys.Date()) # create the list of working days from startdate
    actives <- lapply(wdays, function(x) { 
        bds(conn, BBcurveIDs[[cCode]], "CURVE_MEMBERS", override_fields = "CURVE_DATE",
        override_values = format(x, "%Y%m%d"))
    })
    names(actives) <- wdays
    uniqueActives <- unique(unlist(actives)) # there will be puhlenty duplicates. Get rid of them
    # now get the unchanging bond data
    staticData <- bdp(conn, uniqueActives, bbStaticDataFields)
    # now get the cash flowdata
    cfData <- lapply(uniqueActives, function(x) {
        bds(conn, x, "DES_CASH_FLOW_ADJ", override_fields = "SETTLE_DT", 
            override_values = format(as.Date(staticData[x, "FIRST_SETTLE_DT"]), "%Y%m%d"))
    })
    names(cfData) <- uniqueActives
    # now for historic data
    historicData <- lapply(bbHistoricDataFields, function(x) bbdh(uniqueActives, flds = x, startDate = startDate))
    names(historicData) <- bbHistoricDataFields   # put the names in otherwise we get a numbered list
    allDates <- as.Date(index(historicData$LAST_PRICE)) # all the dates we will find settlement dates for for all bonds. No posix
    save(actives, file = paste("data/", cCode, "actives.dat", sep = ""))      #save all the files now
    save(staticData, file = paste("data/", cCode, "staticData.dat", sep = ""))
    save(cfData, file = paste("data/", cCode, "cfData.dat", sep = ""))
    save(historicData, file = paste("data/", cCode, "historicData.dat", sep = ""))
    #save(settleDates, file = paste("data/", cCode, "settleDates.dat", sep = ""))
    assign(paste(cCode, "data", sep = ""), list(actives = actives, staticData = staticData, cfData = cfData,    #
        historicData = historicData), pos = 1)

}

The bbdh function I use above is a wrapper around the Rbbg library's bdh function and looks like this:

library(reshape2)  # dcast
library(xts)       # xts
bbdh <- function(secs, years = 1, flds = "last_price", startDate = NULL) {
        # this function gets secs over years from bloomberg daily data
            if(is.null(startDate)) startDate <- Sys.Date() - years * 365.25
            if(inherits(startDate, "Date")) startDate <- format(startDate, "%Y%m%d") # convert Date to the bb date string
            if(nchar(startDate) > 8) startDate <- format(as.Date(startDate), "%Y%m%d") # character date in another format, e.g. "2012-01-01"
            rawd <- bdh(conn, secs, flds, startDate, always.display.tickers = TRUE, include.non.trading.days = TRUE,
                option_names = c("nonTradingDayFillOption", "nonTradingDayFillMethod"),
                option_values = c("NON_TRADING_WEEKDAYS", "PREVIOUS_VALUE"))
            rawd <- dcast(rawd, date ~ ticker) #put into columns
            colnames(rawd) <- sub(" .*", "", colnames(rawd)) #remove the govt, currncy bits from bb tickers
            return(xts(rawd[, -1], order.by = as.POSIXct(rawd[, 1])))
        }

The country code comes from a structure which associates two letter names with bloomberg yield curve descriptions:

BBcurveIDs  <- list(PO = "YCGT0084 Index", #Portugal
                    DE = "YCGT0016 Index", 
                    FR = "YCGT0014 Index", 
                    SP = "YCGT0061 Index",
                    IT = "YCGT0040 Index",
                    AU = "YCGT0001 Index", #Australia
                    AS = "YCGT0063 Index", #Austria
                    JP = "YCGT0018 Index",
                    GB = "YCGT0022 Index",
                    HK = "YCGT0095 Index",
                    CA = "YCGT0007 Index",
                    CH = "YCGT0082 Index",
                    NO = "YCGT0078 Index",
                    SE = "YCGT0021 Index",
                    IR = "YCGT0062 Index",
                    BE = "YCGT0006 Index",
                    NE = "YCGT0020 Index", 
                    ZA = "YCGT0090 Index",
                    PL = "YCGT0177 Index", #Poland
                    MX = "YCGT0251 Index")

So bbGetCountry creates four data structures, called actives, staticData, cfData and historicData, using the following Bloomberg fields (bbDynamicDataFields is listed for completeness; the code shown here does not request it):

bbStaticDataFields <- c("ID_ISIN",
                      "ISSUER", 
                      "COUPON",
                      "CPN_FREQ",
                      "MATURITY",
                      "CALC_TYP_DES",                    # pricing calculation type 
                      "INFLATION_LINKED_INDICATOR",     # N or Y, in R returned as TRUE or FALSE
                      "ISSUE_DT",
                      "FIRST_SETTLE_DT",
                      "PX_METHOD",                      # PRC or YLD 
                      "PX_DIRTY_CLEAN",                 # market convention dirty or clean
                      "DAYS_TO_SETTLE",
                      "CALLABLE",
                      "MARKET_SECTOR_DES",
                      "INDUSTRY_SECTOR",
                      "INDUSTRY_GROUP",
                      "INDUSTRY_SUBGROUP")

bbDynamicDataFields <- c("IS_STILL_CALLABLE",
                        "RTG_MOODY",
                        "RTG_MOODY_WATCH",
                        "RTG_SP",
                        "RTG_SP_WATCH",
                        "RTG_FITCH",
                        "RTG_FITCH_WATCH")

bbHistoricDataFields <- c("PX_BID",
                          "PX_ASK",
                          #"PX_CLEAN_BID",
                          #"PX_CLEAN_ASK",
                          "PX_DIRTY_BID",
                          "PX_DIRTY_ASK",
                          #"ASSET_SWAP_SPD_BID",
                          #"ASSET_SWAP_SPD_ASK",
                          "LAST_PRICE",
                          #"SETTLE_DT",
                          "YLD_YTM_MID")

Now you're ready to create couponbonds objects, using all these data structures:

createCouponBonds <- function(cCode, dateString) {
    cdata <- get(paste(cCode, "data", sep = "")) # get the data set
    today <- as.Date(dateString)
    settleDate <- today
    daycount <- 0
    while(daycount < 3) {
        settleDate <- settleDate + 1
        if (!(weekdays(settleDate) %in% c("Saturday", "Sunday"))) daycount <- daycount + 1
    }
    goodbonds <- subset(cdata$staticData, COUPON != 0 & INFLATION_LINKED_INDICATOR == FALSE) # drop zero-coupon bonds/bills and inflation-linked bonds
    goodbonds <- goodbonds[rownames(goodbonds) %in% cdata$actives[[dateString]][, 1], ]
    stripnames <- sapply(strsplit(rownames(goodbonds), " "), function(x) x[1])
    pxbid <- cdata$historicData$PX_BID[today, stripnames]
    pxask <- cdata$historicData$PX_ASK[today, stripnames]
    pxdbid <- cdata$historicData$PX_DIRTY_BID[today, stripnames]
    pxdask <- cdata$historicData$PX_DIRTY_ASK[today, stripnames]
    price <- as.numeric((pxbid + pxask) / 2)
    accrued <- as.numeric(pxdbid - pxbid)
    cashflows <- lapply(rownames(goodbonds), function(x) {
        goodflows <- cdata$cfData[[x]][as.Date(cdata$cfData[[x]][, "Date"]) >= today, ]
        #gfstipnames <- sapply(strsplit(rownames(goodflows), " "), function(x) x[1]) dunno if I need this
        isin <- rep(cdata$staticData[x, "ID_ISIN"], nrow(goodflows))
        cf <- apply(goodflows[, 2:3], 1, sum) / 10000
        dt <- as.Date(goodflows[, 1])
        return(list(isin = isin, cf = cf, dt = dt))
    })
    isinvec <- unlist(lapply(cashflows, function(x) x$isin))
    cfvec <- as.numeric(unlist(lapply(cashflows, function(x) x$cf)))
    datevec <- unlist(lapply(cashflows, function(x) x$dt))
    govbonds <- list(ISIN = goodbonds$ID_ISIN, 
                     MATURITYDATE = as.Date(goodbonds$MATURITY),
                     ISSUEDATE = as.Date(goodbonds$FIRST_SETTLE_DT),
                     COUPONRATE = as.numeric(goodbonds$COUPON) / 100,
                     PRICE = price,
                     ACCRUED = accrued,
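                     # unlist() above dropped the Date class, so restore it with an explicit origin
                     CASHFLOWS = list(ISIN = isinvec, CF = cfvec, DATE = as.Date(datevec, origin = "1970-01-01")),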
                     TODAY = settleDate)
    govbonds <- list(govbonds)
    names(govbonds) <- cCode
    class(govbonds) <- "couponbonds"
    return(govbonds)
}
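
A side note on the settlement-date loop in createCouponBonds: it compares weekdays() against English day names, which depends on the R session's locale. The sketch below shows one locale-independent way to write the same T+3 weekday count. The helper name add_weekdays is made up for this illustration, and like the original loop it ignores public holidays.

```r
# Hypothetical helper, not part of termstrc or of the code above.
# Moves `date` forward by `n` weekdays (Monday to Friday), ignoring holidays.
add_weekdays <- function(date, n = 3) {
  date <- as.Date(date)
  while (n > 0) {
    date <- date + 1
    # "%u" is the ISO weekday number as a string: "6" = Saturday, "7" = Sunday
    if (!(format(date, "%u") %in% c("6", "7"))) n <- n - 1
  }
  date
}

add_weekdays("2012-01-05")   # a Thursday; skips the weekend of 7-8 January
```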

Take a close look at the cashflows <- lapply(...) call, because this is where the sublist is created and it is the core of the answer to your question. How this is done depends very much on how you have decided to build the intermediate data structures, and I have given you just one possibility.

I realise that my answer is complex, but so is the problem. Not all the code you need is in this answer either: a few helper functions (wdaylist, for example, along with the histStartDate and conn objects) are missing, but I am happy to provide them if you contact me. The skeleton of the core functions is all here, and much of the problem is getting the data in the first place and structuring it appropriately.

You correctly surmise that some of the data is static for each bond, some of it is dynamic, and some of it is historical, so the dimensions of the intermediate data structures differ for different pieces of the couponbonds objects. How you represent that is up to you; I have used separate lists and data frames for each, linked via the bond IDs where necessary.

createCouponBonds takes a date string, so you can call it for each of your historic data points using the above-mentioned lapply, and hey presto, a dyncouponbonds object:

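# dodates: character vector of "YYYY-MM-DD" dates that appear in names(SPdata$actives)
spl <- lapply(dodates, function(x) createCouponBonds("SP", x))
names(spl) <- sapply(spl, function(x) format(x$SP$TODAY))   # name each period by its settlement date
class(spl) <- "dyncouponbonds"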

There you go. You asked for it....
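
If you want to see the final nesting without any Bloomberg access, the same wrapping step can be tried on toy data. In this sketch make_toy_period is a made-up stand-in for createCouponBonds, the country key TOY and all the numbers are invented, and nothing is passed to termstrc; it only shows the list-of-lists shape and the two class attributes.

```r
# Hypothetical stand-in for createCouponBonds(); all values are invented.
make_toy_period <- function(today) {
  today <- as.Date(today)
  period <- list(
    ISIN         = "XX0000000001",
    MATURITYDATE = as.Date("2010-01-15"),
    ISSUEDATE    = as.Date("2005-01-15"),
    COUPONRATE   = 0.04,
    PRICE        = 100,      # would differ per date in real data
    ACCRUED      = 0.1,      # would differ per date in real data
    CASHFLOWS    = list(ISIN = rep("XX0000000001", 2),
                        CF   = c(4, 104),
                        DATE = as.Date(c("2009-01-15", "2010-01-15"))),
    TODAY        = today
  )
  # one country sublist inside an object of class "couponbonds"
  structure(list(TOY = period), class = "couponbonds")
}

dates <- c("2008-01-30", "2008-01-31", "2008-02-01")
dyn <- lapply(dates, make_toy_period)   # one couponbonds object per date
names(dyn) <- dates
class(dyn) <- "dyncouponbonds"

length(dyn)                        # number of periods
dyn[["2008-01-31"]]$TOY$TODAY      # the date stored inside the second period
```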

If you're not using Bloomberg, your input data structures will be very different but, as I said at the start, get very familiar with lapply and sapply. Obviously there are many other ways this problem could be solved, but the above works for Bloomberg. If you understand this code, you'll surely know what to do for other data sources.

Finally please note that the Rbbg package from findata.org is used to interface to bloomberg.



Answer 2:

My 2 cents: I have been trying to get this to work with the newer Rblpapi package. I still have some problems with the createCouponBonds part, but I think the other functions return correct results. This won't solve the whole problem, but it is at least a partial fix. BBcurveIDs, bbStaticDataFields, bbDynamicDataFields and bbHistoricDataFields are the same as above.

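library(Rblpapi)   # bds, bdp, bdh; call blpConnect() once before using them
library(bizdays)   # Calendar, bizseq (newer bizdays releases may need create.calendar() instead of Calendar())
bbGetCountry <- function(cCode, up = FALSE) {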
  if (up == TRUE) startDate <- as.Date("2016-01-01") else startDate <- histStartDate 
  cal <- Calendar(weekdays=c("saturday", "sunday"))
  wdays <- as.list(bizseq(startDate, Sys.Date(), cal))
  actives <- lapply(wdays, function(x) { 
    bds(BBcurveIDs[cCode][[1]], "CURVE_MEMBERS", override = c(CURVE_DATE=format(x, "%Y%m%d")))
  })
  names(actives) <- wdays
  uniqueActives <- unique(unlist(actives))
  staticData <- bdp(uniqueActives, bbStaticDataFields)
  cfData <- lapply(uniqueActives, function(x) {
    bds(x, "DES_CASH_FLOW_ADJ", override = c(SETTLE_DT = format(as.Date(staticData[x, "FIRST_SETTLE_DT"]), "%Y%m%d")))
  })
  names(cfData) <- uniqueActives

  historicData <- lapply(bbHistoricDataFields, function(x) bbdh(uniqueActives, flds = x, startDate = startDate))
  names(historicData) <- bbHistoricDataFields
  allDates <- as.Date(index(historicData$LAST_PRICE))

  save(actives, file = paste("data_", cCode, "actives.dat", sep = ""))
  save(staticData, file = paste("data_", cCode, "staticData.dat", sep = ""))
  save(cfData, file = paste("data_", cCode, "cfData.dat", sep = ""))
  save(historicData, file = paste("data_", cCode, "historicData.dat", sep = ""))
  #save(settleDates, file = paste("data_", cCode, "settleDates.dat", sep = ""))
  assign(paste(cCode, "data", sep = ""), list(actives = actives, staticData = staticData, cfData = cfData,    #
                                              historicData = historicData), pos = 1)

}

And bbdh function:

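library(plyr)      # ldply
library(reshape2)  # dcast
library(xts)       # xts
bbdh <- function(secs, years = 1, flds = "last_price", startDate = NULL) {
  if(is.null(startDate)) startDate <- Sys.Date() - years * 365.25
  # Rblpapi's bdh() expects start.date as a Date object, so convert character input
  # (e.g. "2016-01-01") to Date rather than formatting it as a "%Y%m%d" string
  if(!inherits(startDate, "Date")) startDate <- as.Date(startDate)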
  rawd <- bdh(secs, flds, 
              startDate, 
              include.non.trading.days = FALSE,
              options = structure(c("PREVIOUS_VALUE", "NON_TRADING_WEEKDAYS"),
                                  names = c("nonTradingDayFillMethod","nonTradingDayFillOption")))
  rawd <- ldply(rawd, data.frame)
  colnames(rawd) <- c("sec", "date", "fld")
  rawd <- dcast(rawd, date ~ sec, value.var="fld")
  colnames(rawd) <- gsub(" Corp", "", colnames(rawd))
  return(xts(rawd[,-1], order.by=rawd[,1]))
}