Split data frame into multiple data frames based o

Posted 2019-07-30 13:58

Question:

I really need your help on the following issue:

I have two data frames - one containing a portfolio of securities with ISIN and Cluster information.

 > dfInput

          TICKER                 CLUSTER                  SECURITY.NAME
1       LU0937588209              High Yield      Prime Capital Access SA SICAV-
2       LU0694362343              High Yield      ECM CREDIT FUND SICAV - ECM Hi
3       IE0030390896              High Yield      Putnam World Trust - Global Hi
4       LU0575374342                 EM Debt      Ashmore SICAV - Emerging Marke
5       LU0493865678                 EM Debt      Ashmore SICAV - Emerging Marke
6       LU0972237696                 EM Debt      Galloway Global Fixed Income F
7       IE00B6TLWG59                 ILS/CAT      GAM Star Fund PLC - Cat Bond F
8       LU0816333396                 ILS/CAT      LGT Lux I - Cat Bond Fund
9       LU0879473352              L/S Credit      Merrill Lynch Investment Solut
10 HINCFEF ID Equity              L/S Credit      Hedge Invest International Fun
11      FR0011034800              L/S Credit      Schelcher Prince Opportunite E
12 PIMCSEI ID Equity              L/S Credit      PIMCO Funds Global Investors S
13     VTR US Equity                   REITs      Ventas Inc
14     HCP US Equity                   REITs      HCP Inc
15   VGSIX US Equity                   REITs      Vanguard REIT Index Fund
16     NLY US Equity                 M REITs      Annaly Capital Management Inc
17    CLNY US Equity                 M REITs      Colony Financial Inc
18    AGNC US Equity                 M REITs      American Capital Agency Corp
19     REM US Equity                 M REITs      iShares Mortgage Real Estate C
20      ES0130960018 Infrastructure Equities      Enagas SA
21    SDRL US Equity Infrastructure Equities      Seadrill Ltd
22     IGF US Equity Infrastructure Equities      iShares Global Infrastructure
23     KMP US Equity                     MLP      Kinder Morgan Energy Partners
24     EPD US Equity                     MLP      Enterprise Products Partners L
25    MLPI US Equity                     MLP      ETRACS Alerian MLP Infrastruct
26    HTGC US Equity                     BDC      Hercules Technology Growth Cap
27    TCPC US Equity                     BDC      TCP Capital Corp
28    MAIN US Equity                     BDC      Main Street Capital Corp
29    BDCS US Equity                     BDC      ETRACS Linked to the Wells Far

The other contains multiple time series of returns for these securities, with the security name as the column name (the data comes from an Excel file):

> PortfolioR.xts

              Ventas.Inc       HCP.Inc    ....
2011-01-03  0.0000000000  0.0000000000
2011-01-04 -0.0117725362 -0.0056323067
2011-01-05 -0.0081155489  0.0018809625
2011-01-06 -0.0009479572 -0.0154202974
2011-01-07 -0.0058974774 -0.0054674822
2011-01-10 -0.0074691528 -0.0077050464
2011-01-11 -0.0036591278  0.0052348928
2011-01-12  0.0132249172 -0.0091097938
2011-01-13  0.0015220703  0.0085600412
2011-01-14  0.0058762372 -0.0038567541
2011-01-17  0.0000000000  0.0000000000
2011-01-18  0.0157513101 -0.0002760525
2011-01-19 -0.0059712810 -0.0074823683
2011-01-20  0.0013092679  0.0049944610
2011-01-21  0.0013075560 -0.0055509440
...

How can I now split the xts object based on the cluster information in the portfolio?

The result should be a separate data.frame or xts object for each CLUSTER, containing the return history of the securities belonging to that cluster.

Is this possible?

Thank you in advance...

Answer 1:

Here's one way to do it:

# Build one xts object per cluster and name each list element after its cluster
setNames(lapply(unique(dfInput$CLUSTER), function(x) {
  PortfolioR.xts[, which(dfInput$CLUSTER[match(colnames(PortfolioR.xts), 
                                               dfInput$SECURITY.NAME)] == x)]
}), unique(dfInput$CLUSTER))

For example:

# Set up some fake data
d1 <- data.frame(grp=sample(LETTERS[1:4], 10, replace=TRUE),
                 name=letters[1:10])

d1

#    grp name
# 1    A    a
# 2    B    b
# 3    B    c
# 4    D    d
# 5    C    e
# 6    B    f
# 7    B    g
# 8    A    h
# 9    D    i
# 10   A    j

d2 <- matrix(round(runif(50), 2), ncol=10)
colnames(d2) <- letters[1:10]
library(xts)
d2 <- xts(d2, seq.Date(as.Date('01-01-2011', '%d-%m-%Y'), 
                       as.Date('5-01-2011', '%d-%m-%Y'), 1))

d2

#               a    b    c    d    e    f    g    h    i    j
# 2011-01-01 0.51 0.41 0.69 0.87 0.37 0.86 0.47 0.68 0.64 0.73
# 2011-01-02 0.72 0.92 0.53 0.55 0.62 0.54 0.75 0.64 0.04 0.72
# 2011-01-03 0.34 0.50 0.92 0.23 0.59 0.09 0.78 0.53 0.26 0.27
# 2011-01-04 0.52 0.47 0.49 0.25 0.18 0.07 0.65 0.13 0.46 0.74
# 2011-01-05 0.10 0.87 0.10 0.48 0.58 0.72 0.96 0.71 0.78 0.80

# lapply always returns a list; sapply may try to simplify the result
out <- setNames(lapply(unique(d1$grp), function(x) {
  d2[, which(d1$grp[match(colnames(d2), d1$name)] == x)]
}), unique(d1$grp))

out 

# $A
#               a    h    j
# 2011-01-01 0.51 0.68 0.73
# 2011-01-02 0.72 0.64 0.72
# 2011-01-03 0.34 0.53 0.27
# 2011-01-04 0.52 0.13 0.74
# 2011-01-05 0.10 0.71 0.80
# 
# $B
#               b    c    f    g
# 2011-01-01 0.41 0.69 0.86 0.47
# 2011-01-02 0.92 0.53 0.54 0.75
# 2011-01-03 0.50 0.92 0.09 0.78
# 2011-01-04 0.47 0.49 0.07 0.65
# 2011-01-05 0.87 0.10 0.72 0.96
# 
# $D
#               d    i
# 2011-01-01 0.87 0.64
# 2011-01-02 0.55 0.04
# 2011-01-03 0.23 0.26
# 2011-01-04 0.25 0.46
# 2011-01-05 0.48 0.78
# 
# $C
#               e
# 2011-01-01 0.37
# 2011-01-02 0.62
# 2011-01-03 0.59
# 2011-01-04 0.18
# 2011-01-05 0.58
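
One caveat when applying this to the data in the question: the return columns are named like Ventas.Inc, while SECURITY.NAME holds "Ventas Inc", so a direct match() would return NA. The sketch below assumes the column names were produced by R's usual name sanitisation (what make.names does) and applies the same transformation before matching. The objects info and returns are made-up stand-ins for dfInput and PortfolioR.xts, and a plain matrix is used so the example runs in base R; the same column indexing should carry over to an xts object.

```r
# Hypothetical stand-ins for dfInput and PortfolioR.xts (values are made up)
info <- data.frame(
  CLUSTER = c("REITs", "REITs", "M REITs"),
  SECURITY.NAME = c("Ventas Inc", "HCP Inc", "Colony Financial Inc"),
  stringsAsFactors = FALSE
)
returns <- matrix(
  c(0.01, -0.02, 0.00, 0.03, 0.01, -0.01), nrow = 2,
  dimnames = list(NULL, c("Ventas.Inc", "HCP.Inc", "Colony.Financial.Inc"))
)

# Sanitise the security names the same way the column names appear to have been
key <- make.names(info$SECURITY.NAME)

# Look up the cluster for every column of the returns object
cluster_of_col <- info$CLUSTER[match(colnames(returns), key)]

# One sub-matrix per cluster; drop = FALSE keeps single-column groups 2-D
by_cluster <- lapply(split(colnames(returns), cluster_of_col),
                     function(cols) returns[, cols, drop = FALSE])

names(by_cluster)
```

Columns that match no security name get an NA cluster and are silently left out by split(), so it may be worth checking anyNA(cluster_of_col) before relying on the result.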

If you want the list elements (which are xts objects) to be standalone xts objects in the global environment, you can use list2env:

list2env(out, globalenv())

This will overwrite any objects in the global environment that have the same names as the list elements (i.e. A, B, C and D for the example above).
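
If overwriting is a concern, one possible alternative is to unpack the list into a dedicated environment instead of the global one. This is a minimal sketch with a made-up list standing in for out; the names e and demo are illustrative only.

```r
# Hypothetical list standing in for `out`
demo <- list(A = 1:3, B = 4:6)

# Unpack into a fresh environment rather than globalenv()
e <- list2env(demo, envir = new.env())

ls(e)                 # names of the objects created in e
get("A", envir = e)   # retrieve one of them by name
```

With the clusters from the question, names such as High Yield or ILS/CAT are not syntactically valid R names, so the resulting objects would have to be accessed with backticks or via get().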