Vlookup-match like function in R

2019-05-11 03:27发布

I am very new to R, and I am currently to apply however little knowledge of R I have to an analytical work I have to perform for work.

I have two dataframes - dataframe A consists of transactions details, while dataframe B consists of the monthly closing exchange rate for various currencies.

Data frame A - transaction details

    TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE
1           0001              INR         305000 Mar 2014 2014-03-01
2           0002              USD          15000 Oct 2014 2014-10-31
3           0003              JPY          85000 Feb 2015 2015-02-09
4           0004              CNY        1800000 Mar 2015 2015-03-27

structure(list(TRANSACTION_ID = c("0001", "0002", "0003", "0004"), 
COLLECTION_CRNCY = c("INR", "USD", "JPY", "CNY"), COLLECTION_AMT = c(305000, 
15000, 85000, 1800000), MMYYYY = structure(c(2014.16666666667, 
2014.75, 2015.08333333333, 2015.16666666667), class = "yearmon"),
LODG_DATE = structure(c(16130, 16374, 16475, 16521), class = "Date")), 
row.names = c(NA, -4L), class = "data.frame")

Data frame B - Exchange Rates

    MMYYYY       Date    CNY    INR     JPY       USD
1 Mar 2014 2014-03-31 4.9444 47.726 82.0845 0.7951654
2 Oct 2014 2014-10-31 4.7552 47.749 87.2604 0.7778469
3 Feb 2015 2015-02-27 4.5990 45.222 87.7690 0.7338372
4 Mar 2015 2015-03-31 4.5179 45.383 87.5395 0.7287036

structure(list(MMYYYY = structure(c(2014.16666666667, 
2014.75, 2015.08333333333, 2015.16666666667), class = "yearmon"), 
Date = structure(c(16160, 16374, 16493, 16525), class = "Date"), CNY = 
c(4.9444, 4.7552, 4.599, 4.5179), INR = c(47.726, 47.749, 45.222, 45.383), 
JPY = c(82.0845, 87.2604, 87.769, 87.5395), USD = c(0.795165394, 0.77784692, 
0.733837235, 0.728703636)), .Names = c("MMYYYY", "Date", "CNY", "INR", "JPY", 
"USD"), class = "data.frame", row.names = c(NA, -4L))

What I would like to do is to create a new column in data frame A possibly named Exchange Rate. And I would like to get this exchange rate value by looking up to data frame B, by matching the COLLECTION_CRNCY and MMYYYY in data frame A to data frame B. I.e:

TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE exchange.rate
1           0001              INR         305000 Mar 2014 2014-03-01    47.7260000
2           0002              USD          15000 Oct 2014 2014-10-31     0.7778469
3           0003              JPY          85000 Feb 2015 2015-02-09    87.7690000
4           0004              CNY        1800000 Mar 2015 2015-03-27     4.5179000

I can easily do this via Excel using vlookup and match, but I would like to know how I can go about achieving the same results using R as my transactions details file is quite huge.

4条回答
做个烂人
2楼-- · 2019-05-11 04:12

Transforming from wide to long format can be done with reshape(). It may be the most annoying function in R, but you can usually get where you want to go if you play around with its options for long enough. Once you have B in long format, a simple call to merge() gets the desired output.

B.id <- c('MMYYYY','Date');
B.time <- setdiff(names(B),B.id);
B.long <- reshape(B,dir='l',idvar=B.id,varying=B.time,times=B.time,timevar='COLLECTION_CRNCY',v.names='exchange.rate',new.row.names=1:(length(B.time)*nrow(B)));
B.long;
##      MMYYYY       Date COLLECTION_CRNCY exchange.rate
## 1  2014.167 2014-03-31              CNY     4.9444000
## 2  2014.750 2014-10-31              CNY     4.7552000
## 3  2015.083 2015-02-27              CNY     4.5990000
## 4  2015.167 2015-03-31              CNY     4.5179000
## 5  2014.167 2014-03-31              INR    47.7260000
## 6  2014.750 2014-10-31              INR    47.7490000
## 7  2015.083 2015-02-27              INR    45.2220000
## 8  2015.167 2015-03-31              INR    45.3830000
## 9  2014.167 2014-03-31              JPY    82.0845000
## 10 2014.750 2014-10-31              JPY    87.2604000
## 11 2015.083 2015-02-27              JPY    87.7690000
## 12 2015.167 2015-03-31              JPY    87.5395000
## 13 2014.167 2014-03-31              USD     0.7951654
## 14 2014.750 2014-10-31              USD     0.7778469
## 15 2015.083 2015-02-27              USD     0.7338372
## 16 2015.167 2015-03-31              USD     0.7287036
merge(A,B.long[c('MMYYYY','COLLECTION_CRNCY','exchange.rate')],all.x=T);
##   COLLECTION_CRNCY   MMYYYY TRANSACTION_ID COLLECTION_AMT  LODG_DATE exchange.rate
## 1              CNY 2015.167           0004        1800000 2015-03-27     4.5179000
## 2              INR 2014.167           0001         305000 2014-03-01    47.7260000
## 3              JPY 2015.083           0003          85000 2015-02-09    87.7690000
## 4              USD 2014.750           0002          15000 2014-10-31     0.7778469
查看更多
走好不送
3楼-- · 2019-05-11 04:16

One more way for reference purposes:

res <- numeric(nrow(dfA))
for(i in seq_len(nrow(dfA))) {
    res[i] <- dfB[match(dfA$MMYYYY[i], dfB$MMYYY), 
                  match(dfA$COLLECTION_CRNCY[i], names(dfB))]}
dfA$Exchange<- res
#   TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY
# 1           0001              INR         305000 2014.167
# 2           0002              USD          15000 2014.750
# 3           0003              JPY          85000 2015.083
# 4           0004              CNY        1800000 2015.167
#    LODG_DATE   Exchange
# 1 2014-03-01 47.7260000
# 2 2014-10-31  0.7778469
# 3 2015-02-09 87.7690000
# 4 2015-03-27  4.5179000
查看更多
何必那么认真
4楼-- · 2019-05-11 04:23

You could solve this problem using base R by using apply and merge.

To break down the problem,

  1. merge the two datasets together
  2. extract the relevant column

1

To merge the data simply use:

merge(dfa, dfb, by="MMYYYY")

2

To extract the relevant field, we can use the apply function, in a rowwise manner.

apply(df, 1, function(x) ...)

where df is the data.frame, 1 signals rowwise.


Putting it all together, we can extract the exchange rate in one line like this:

dfa$exchange.rate <- apply(df, 1, function(x) x[x[['COLLECTION_CRNCY']]])

What the line x[x[['COLLECTION_CRNCY']]] is doing is simply looking up the column COLLECTION_CRNCY and then using that value to query the appropriate currency column.


final code:

dfa$exchange.rate <- apply(merge(dfa, dfb, by="MMYYYY"), 1, function(x) x[x[['COLLECTION_CRNCY']]])
dfa$exchange.rate <- as.numeric(dfa$exchange.rate) # since it isn't numeric format.
#    TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE exchange.rate
#  1           0001              INR         305000 2014.167 2014-03-01    47.7260000
#  2           0002              USD          15000 2014.750 2014-10-31     0.7778469
#  3           0003              JPY          85000 2015.083 2015-02-09    87.7690000
#  4           0004              CNY        1800000 2015.167 2015-03-27     4.5179000
查看更多
走好不送
5楼-- · 2019-05-11 04:26

Here's a possible data.table approach. Basically what you need to do is to convert df2 to a long format and then just a simple (binary) left join to df1

library(data.table)
temp <- melt(setDT(df2[-2]), "MMYYYY", variable.name = "COLLECTION_CRNCY")
setkey(setDT(df1), MMYYYY, COLLECTION_CRNCY)[temp, exchange.rate := i.value]
df1
#    TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE exchange.rate
# 1:           0001              INR         305000 2014.167 2014-03-01    47.7260000
# 2:           0002              USD          15000 2014.750 2014-10-31     0.7778469
# 3:           0003              JPY          85000 2015.083 2015-02-09    87.7690000
# 4:           0004              CNY        1800000 2015.167 2015-03-27     4.5179000

Alternatively, you can do a similar thing using "Hadleyverse" but dplyr won't able to merge on zoo class columns (for now), so you'll need to unclass them first

library(dplyr)
library(tidyr)
df2[-2] %>% 
  gather(COLLECTION_CRNCY, exchange.rate, -MMYYYY) %>%
  mutate(MMYYYY = as.numeric(MMYYYY)) %>%
  left_join(df1 %>% mutate(MMYYYY = as.numeric(MMYYYY)), .,
                           by = c("MMYYYY", "COLLECTION_CRNCY"))
#   TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE exchange.rate
# 1           0001              INR         305000 2014.167 2014-03-01    47.7260000
# 2           0002              USD          15000 2014.750 2014-10-31     0.7778469
# 3           0003              JPY          85000 2015.083 2015-02-09    87.7690000
# 4           0004              CNY        1800000 2015.167 2015-03-27     4.5179000
查看更多
登录 后发表回答