MVE: Let this be the data set:
data <- data.frame(year = rep(seq(1966,2015,1), 8),
county = c(rep('prva', 50), rep('druga', 50), rep('treća', 50), rep('četvrta', 50),
rep('peta', 50), rep('šesta', 50), rep('sedma', 50), rep('osma', 50)),
crime1 = runif(400), crime2 = runif(400), crime3 = runif(400),
uvar1 = runif(400), uvar2 = runif(400), uvar3 = runif(400),
var1 = runif(400), var2 = runif(400), var3 = runif(400), var4 = runif(400), var5 = runif(400))
Let's say crime1, crime2 and crime3 are the dependent variables, and uvar1, uvar2 and uvar3 are the independent variables specific to each of them. var1, var2, etc. are other covariates. What I'm trying to do is automate the regressions.
Namely, I want to get the result of this code:
plm(log(crime1) ~ log(uvar1) + log(var1) + log(var2) + log(var3) + log(var4), model = 'within', effect = 'twoways', data = data)
plm(log(crime2) ~ log(uvar2) + log(var1) + log(var2) + log(var3) + log(var4), model = 'within', effect = 'twoways', data = data)
etc.; but without writing 20 lines of code for each estimated model.
By looking at similar questions, this is as far as I've come:
crime <- c('crime1', 'crime2', 'crime3')
plm.results <- lapply(data[, crime], function(y) plm(y ~ var1 + var2 + var3 + var4,
model = 'within', effect ='twoways', data = data))
This certainly helps with my dependent variables, but I cannot figure out how to include the specific independent variable in each of these estimations. To clarify once more, I want uvar1 to be in the first regression only, uvar2 in the second only, and so on.
The formula function is helpful when creating multiple sets of models. You could incorporate the variations using a combination of paste0 and formula, with lapply to traverse the indices 1 to 3.
Summary:
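A minimal sketch of this approach, under a few assumptions: the data frame is called regDF (the name used in the explanation), the county names are simplified stand-ins, the panel index is passed explicitly as c("county", "year"), and as.formula is used to turn the pasted string into a formula. The names covars, rhs_common and formulas are illustrative only.

```r
# Stand-in panel data with the same column layout as the question
# (county names simplified; values are random).
set.seed(1)
n <- 400
regDF <- data.frame(
  year   = rep(1966:2015, 8),
  county = rep(paste0("county", 1:8), each = 50),
  crime1 = runif(n), crime2 = runif(n), crime3 = runif(n),
  uvar1  = runif(n), uvar2  = runif(n), uvar3  = runif(n),
  var1 = runif(n), var2 = runif(n), var3 = runif(n),
  var4 = runif(n), var5 = runif(n)
)

# Shared covariates: every column whose name starts with "v".
covars <- grep("^v", colnames(regDF), value = TRUE)
rhs_common <- paste0("log(", covars, ")", collapse = " + ")

# One formula per index i: log(crime_i) ~ log(uvar_i) + shared covariates.
formulas <- lapply(1:3, function(i) {
  as.formula(paste0("log(crime", i, ") ~ log(uvar", i, ") + ", rhs_common))
})

# Fit the two-way fixed-effects models, provided plm is installed.
if (requireNamespace("plm", quietly = TRUE)) {
  plm.results <- lapply(formulas, function(f) {
    plm::plm(f, data = regDF, index = c("county", "year"),
             model = "within", effect = "twoways")
  })
}
```

Each element of plm.results can then be inspected with summary(). To use only var1 to var4, as in the question, covars could be subset before building rhs_common.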
Explanation:
The independent variables are of two types: first the specific variable uvar1, and then the others, var1...varN.
1) colnames(regDF)[grepl("^v",colnames(regDF))] gives us a vector of all column names in regDF that begin with the letter 'v'. The caret symbol signifies the start of the string (just as $ signifies the end of the string). The output at this stage is c("var1","var2",...,"var5").
2) We need the log variants of this variable vector, hence we pass them through lapply to the function fn_appendLog, which results in the list output list("log(var1)","log(var2)",...,"log(var5)").
3) Next, we need these variables transformed into log(var1)+log(var2)...+log(var5).
4) To do so, we use the function Reduce with the function paste(x,y,sep="+"), which takes each element of the above list with its adjacent element and joins them together with "+" as the separator.
5) Reduce applies the function across the list and aggregates the output into a single string, resulting in the final output log(var1)+log(var2)+log(var3)+log(var4)+log(var5).
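The five steps can be sketched as follows. The body of fn_appendLog shown here is an assumption (a one-line wrapper around paste0), and the small regDF is a stand-in that only needs the right column names.

```r
# Stand-in data frame; only the column names matter for this sketch.
regDF <- data.frame(
  crime1 = runif(5), uvar1 = runif(5),
  var1 = runif(5), var2 = runif(5), var3 = runif(5),
  var4 = runif(5), var5 = runif(5)
)

# Step 1: names of all columns starting with "v" (uvar1 is excluded).
covars <- colnames(regDF)[grepl("^v", colnames(regDF))]

# Step 2: wrap each name in log(); this definition is assumed.
fn_appendLog <- function(x) paste0("log(", x, ")")
logged <- lapply(covars, fn_appendLog)

# Steps 3 to 5: fold the list into a single "+"-separated string.
rhs <- Reduce(function(x, y) paste(x, y, sep = "+"), logged)
print(rhs)
```

The same string can also be produced in one call with paste(unlist(logged), collapse = "+"); Reduce is kept here because it mirrors the steps above.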
This might seem intimidating at first, but as you use these functions often and explore examples they will become part of your repertoire in no time. The best way to learn about a function, say lapply, is to read its documentation (?lapply) end to end, execute the listed examples, tinker with the parameters and gain familiarity. Hope this sheds some light on your query.