I am very new to R, and I am currently to apply however little knowledge of R I have to an analytical work I have to perform for work.
I have two dataframes - dataframe A consists of transactions details, while dataframe B consists of the monthly closing exchange rate for various currencies.
Data frame A - transaction details
TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT MMYYYY LODG_DATE
1 0001 INR 305000 Mar 2014 2014-03-01
2 0002 USD 15000 Oct 2014 2014-10-31
3 0003 JPY 85000 Feb 2015 2015-02-09
4 0004 CNY 1800000 Mar 2015 2015-03-27
structure(list(TRANSACTION_ID = c("0001", "0002", "0003", "0004"),
COLLECTION_CRNCY = c("INR", "USD", "JPY", "CNY"), COLLECTION_AMT = c(305000,
15000, 85000, 1800000), MMYYYY = structure(c(2014.16666666667,
2014.75, 2015.08333333333, 2015.16666666667), class = "yearmon"),
LODG_DATE = structure(c(16130, 16374, 16475, 16521), class = "Date")),
row.names = c(NA, -4L), class = "data.frame")
Data frame B - Exchange Rates
MMYYYY Date CNY INR JPY USD
1 Mar 2014 2014-03-31 4.9444 47.726 82.0845 0.7951654
2 Oct 2014 2014-10-31 4.7552 47.749 87.2604 0.7778469
3 Feb 2015 2015-02-27 4.5990 45.222 87.7690 0.7338372
4 Mar 2015 2015-03-31 4.5179 45.383 87.5395 0.7287036
structure(list(MMYYYY = structure(c(2014.16666666667,
2014.75, 2015.08333333333, 2015.16666666667), class = "yearmon"),
Date = structure(c(16160, 16374, 16493, 16525), class = "Date"), CNY =
c(4.9444, 4.7552, 4.599, 4.5179), INR = c(47.726, 47.749, 45.222, 45.383),
JPY = c(82.0845, 87.2604, 87.769, 87.5395), USD = c(0.795165394, 0.77784692,
0.733837235, 0.728703636)), .Names = c("MMYYYY", "Date", "CNY", "INR", "JPY",
"USD"), class = "data.frame", row.names = c(NA, -4L))
What I would like to do is to create a new column in data frame A possibly named Exchange Rate
. And I would like to get this exchange rate value by looking up to data frame B, by matching the COLLECTION_CRNCY
and MMYYYY
in data frame A to data frame B. I.e:
TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT MMYYYY LODG_DATE exchange.rate
1 0001 INR 305000 Mar 2014 2014-03-01 47.7260000
2 0002 USD 15000 Oct 2014 2014-10-31 0.7778469
3 0003 JPY 85000 Feb 2015 2015-02-09 87.7690000
4 0004 CNY 1800000 Mar 2015 2015-03-27 4.5179000
I can easily do this via Excel using vlookup and match, but I would like to know how I can go about achieving the same results using R as my transactions details file is quite huge.
Transforming from wide to long format can be done with
reshape()
. It may be the most annoying function in R, but you can usually get where you want to go if you play around with its options for long enough. Once you have B in long format, a simple call tomerge()
gets the desired output.One more way for reference purposes:
You could solve this problem using base
R
by usingapply
andmerge
.To break down the problem,
1
To merge the data simply use:
merge(dfa, dfb, by="MMYYYY")
2
To extract the relevant field, we can use the apply function, in a rowwise manner.
apply(df, 1, function(x) ...)
where
df
is the data.frame,1
signals rowwise.Putting it all together, we can extract the exchange rate in one line like this:
dfa$exchange.rate <- apply(df, 1, function(x) x[x[['COLLECTION_CRNCY']]])
What the line
x[x[['COLLECTION_CRNCY']]]
is doing is simply looking up the columnCOLLECTION_CRNCY
and then using that value to query the appropriate currency column.final code:
Here's a possible
data.table
approach. Basically what you need to do is to convertdf2
to a long format and then just a simple (binary) left join todf1
Alternatively, you can do a similar thing using "Hadleyverse" but
dplyr
won't able to merge onzoo
class columns (for now), so you'll need to unclass them first