I am trying to build a regression model with lm(...). My dataset has lots of features (>50). I do not want to write my code as lm(output ~ feature1 + feature2 + feature3 + ... + feature70). I was wondering what the shorthand notation for this is.
You could also try selecting the predictor columns by position, assuming output is the first column and feature1:feature70 are the next 70 columns.
Or select them by name, which is probably smarter, as it doesn't matter where in your data the columns are.
Either way, this might cause issues if rows are removed for NAs, though...
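A minimal sketch of these two variants follows. The data frame myData, its reduced size of five features, and the use of as.matrix() are assumptions made for illustration, not necessarily what this answer originally showed. The idea is to pass all predictor columns to lm() as a single matrix term:

```r
# Hypothetical data: 'output' in column 1, then feature1..feature5
# (the question has 70 features; 5 keeps the example small).
set.seed(1)
n <- 50
features <- matrix(rnorm(n * 5), nrow = n,
                   dimnames = list(NULL, paste0("feature", 1:5)))
myData <- data.frame(output = rnorm(n), features)

# Variant 1: select predictors by position (relies on the column order).
fit_pos <- lm(output ~ as.matrix(myData[, 2:6]), data = myData)

# Variant 2: select predictors by name (column order no longer matters).
feature_names <- paste0("feature", 1:5)
fit_name <- lm(output ~ as.matrix(myData[, feature_names]), data = myData)

# Both fits estimate an intercept plus one coefficient per feature.
length(coef(fit_pos))
length(coef(fit_name))
```

With a matrix term the coefficient names carry the whole expression as a prefix, which is one reason the formula-based approaches are usually easier to read.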
You can use . (a dot) as described in the help page for formula. The . stands for "all columns not otherwise in the formula": lm(output ~ ., data = myData).
Alternatively, construct the formula manually with paste, as in the example on the as.formula() help page. You can then insert the resulting formula object (called fmla here) into the regression function: lm(fmla, data = myData).
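To make both approaches concrete, here is a small sketch on made-up data. The three-feature myData data frame and the feature_names helper are assumptions for illustration, and the formula construction follows the general paste/as.formula pattern rather than reproducing the help page verbatim:

```r
# Hypothetical data frame with a response and three features.
set.seed(42)
n <- 40
myData <- data.frame(output   = rnorm(n),
                     feature1 = rnorm(n),
                     feature2 = rnorm(n),
                     feature3 = rnorm(n))

# Approach 1: the dot expands to every column of myData not already
# used in the formula, i.e. everything except 'output'.
fit_dot <- lm(output ~ ., data = myData)

# Approach 2: build the right-hand side as a string, then convert it
# to a formula object.
feature_names <- setdiff(names(myData), "output")
fmla <- as.formula(paste("output ~", paste(feature_names, collapse = " + ")))
fit_fmla <- lm(fmla, data = myData)

# The two fits should agree on their coefficients.
all.equal(coef(fit_dot), coef(fit_fmla))
```

The manual route is mainly useful when only a subset of columns should enter the model. stats::reformulate(feature_names, response = "output") builds the same kind of formula without pasting strings by hand.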