I am working on a regression script. I have a data.frame with roughly 130 columns, and I need to regress one column (let's call it the X column) against each of the other ~100 numeric columns.
Before the regression is calculated, I need to group the data by 4 factors: myDat$Recipe, myDat$Step, myDat$Stage, and myDat$Prod, while still keeping the other ~100 columns and row data attached for the regression. Then I need to do a regression of each column ~ X column and print out the R^2 value with the column name. This is what I've tried so far, but it is getting overly complicated and I know there's got to be a better way.
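    rm(list=ls())
    myDat <- read.csv(file="C:/Users/Documents/myDat.csv", header=TRUE, sep=",")

    # Columns that should never be used as a regression response
    skip <- c("Recipe", "Step", "Stage", "Prod", "X")

    # Loop over each distinct level (not every row) of the four grouping factors
    for (j in unique(myDat$Recipe)) {
      myDatj <- subset(myDat, Recipe == j)
      for (k in unique(myDatj$Step)) {
        myDatk <- subset(myDatj, Step == k)
        for (i in unique(myDatk$Stage)) {
          myDati <- subset(myDatk, Stage == i)
          for (m in unique(myDati$Prod)) {
            myDatm <- subset(myDati, Prod == m)
            # Within one Recipe/Step/Stage/Prod group, regress each numeric column on X
            for (col in setdiff(names(myDatm), skip)) {
              if (is.numeric(myDatm[[col]])) {
                fit <- lm(myDatm[[col]] ~ X, data=myDatm)
                rsq <- summary(fit)$r.squared
                writeLines(paste(rsq, col))
              }
            }
          }
        }
      }
    }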
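
One more compact direction, sketched here under assumptions rather than offered as a definitive solution: split() can partition the data frame by all four factors at once, and sapply() can then fit one lm() per remaining numeric column within each group. The toy data frame below and the names group_cols, resp_cols, Y1, and Y2 are hypothetical stand-ins for the real myDat. Only Recipe, Step, Stage, Prod, and X come from the description above.

```r
# Hypothetical toy data standing in for the real myDat
set.seed(1)
n <- 80
myDat <- data.frame(
  Recipe = rep(c("R1", "R2"), each = n / 2),
  Step   = rep(c("S1", "S2"), times = n / 2),
  Stage  = "A",
  Prod   = "P1",
  X      = rnorm(n)
)
myDat$Y1 <- 2 * myDat$X + rnorm(n, sd = 0.1)  # strongly related to X
myDat$Y2 <- rnorm(n)                          # unrelated to X

# Grouping columns, and the numeric columns to use as responses
group_cols <- c("Recipe", "Step", "Stage", "Prod")
is_num     <- vapply(myDat, is.numeric, logical(1))
resp_cols  <- setdiff(names(myDat)[is_num], c(group_cols, "X"))

# One data frame per observed Recipe/Step/Stage/Prod combination;
# drop = TRUE discards combinations that never occur in the data
groups <- split(myDat, myDat[group_cols], drop = TRUE)

# For each group, fit column ~ X for every response column and keep R^2
rsq <- sapply(groups, function(g) {
  sapply(resp_cols, function(col) {
    summary(lm(g[[col]] ~ X, data = g))$r.squared
  })
})

# Rows are response columns, columns are groups
print(round(rsq, 3))
```

With the real data, the toy block would be replaced by the read.csv() call. Groups that have too few rows, or a column that is entirely NA, would likely need to be filtered out before calling lm().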