Creating formula using very long strings in R

Posted 2019-02-24 02:42

I'm in a situation where I have a vector full of column names for a really large data frame.

Let's assume: x = c("Name", "address", "Gender", ......, "class" ) [approximately 100 variables]

Now, I would like to create a formula which I'll eventually use to build a HoeffdingTree. I'm creating the formula using:

myformula <- as.formula(paste("class ~ ", paste(x, collapse= "+")))

This throws up the following error:

Error in parse(text = x) : <text>:1:360: unexpected 'else'
1:e+spread+prayforsonni+just+want+amp+argue+blxcknicotine+mood+now+right+actually+herapatra+must+simply+suck+there+always+cookies+ever+everything+getting+nice+nigga+they+times+abu+all+alliepickl

The paste part of the statement above works fine, but passing its result to as.formula fails with the parse error shown.

Tags: r formula
3 answers
做自己的国王
#2 · 2019-02-24 03:35

You could try reformulate:

 reformulate(setdiff(x, 'class'), response='class')
 #class ~ Name + address + Gender

where 'x' is

  x <- c("Name", "address", "Gender", 'class')

If 'x' contains R keywords, you can use '.' instead, which stands for every column of the model's data other than the response:

   reformulate('.', response='class')
   #class ~ .
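If the explicit list of terms is still needed, one option is to backquote each label before handing it to reformulate, since reformulate parses its term labels as R code. This is a minimal sketch; the vector below is a made-up stand-in for the real column names, with "else" added to reproduce the problem.

```r
# Hypothetical column names; "else" is a reserved word in R.
x <- c("Name", "else", "Gender", "class")

# Everything except the response becomes a predictor.
predictors <- setdiff(x, "class")

# Wrapping each label in backticks makes the parser read it as a plain name.
f <- reformulate(sprintf("`%s`", predictors), response = "class")
print(f)
```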
不美不萌又怎样
#3 · 2019-02-24 03:42

The problem is that you have R keywords as column names. else is a keyword so you can't use it as a regular name.
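With around 100 column names it can help to find the offending ones first. A rough way to do that, assuming the names sit in a character vector, is to compare each name with what make.names() would turn it into: reserved words and other non-syntactic names come back altered. The vector here is invented for illustration.

```r
# Hypothetical column names: one reserved word and one name with a space.
x <- c("Name", "address", "else", "my var", "class")

# make.names() rewrites anything that is not a valid, non-reserved name,
# so the names it changes are the ones that would break a pasted formula.
bad <- x[make.names(x) != x]
print(bad)
```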

A simplified example:

s <- c("x", "else", "z")
f <- paste("y~", paste(s, collapse="+"))
formula(f)
# Error in parse(text = x) : <text>:1:10: unexpected '+'
# 1: y~ x+else+
#              ^

The solution is to wrap your words in backticks "`" so that R will treat them as non-syntactic variable names.

f <- paste("y~", paste(sprintf("`%s`", s), collapse="+"))
formula(f)
# y ~ x + `else` + z
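An alternative that avoids building and parsing a string altogether is to assemble the formula from symbols. This is only a sketch of that different technique, not part of the answer above: as.name() turns each string into a symbol without parsing it, and Reduce() folds the symbols into one `+` call.

```r
# Same example names as above, including the reserved word "else".
s <- c("x", "else", "z")

# as.name() does not parse its input, so reserved words are accepted as-is.
term_symbols <- lapply(s, as.name)

# Fold the symbols into a single call: x + `else` + z
rhs <- Reduce(function(a, b) call("+", a, b), term_symbols)

# Build the call y ~ rhs and convert it to a formula object.
f <- as.formula(call("~", as.name("y"), rhs))
print(f)
```

Because no text is ever parsed, the same code also handles names containing spaces or operators.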
做个烂人
#4 · 2019-02-24 03:44

You can reduce your data set first:

dat_small <- dat[, union("class", x)]  # union() avoids selecting "class" twice if x already contains it

and then use

myformula <- as.formula("class ~ .")

The . stands for all other columns, that is, every column except class.
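A small check of this approach on a toy data frame is sketched below. The data and column names are made up, and terms() is used only to show how the dot gets expanded; the real model function would be called with the same formula and data.

```r
# Toy data (invented values). check.names = FALSE keeps the reserved
# word "else" as a column name instead of renaming it.
dat <- data.frame(class  = c(1, 0, 1, 0),
                  good   = c(2.1, 3.4, 1.9, 4.0),
                  `else` = c(0.5, 0.1, 0.9, 0.3),
                  unused = c(9, 9, 9, 9),
                  check.names = FALSE)

x <- c("good", "else", "class")

# Keep only the response and the wanted predictors.
dat_small <- dat[, union("class", x)]

# The dot is expanded against the data, so no column name is parsed as text.
tt <- terms(class ~ ., data = dat_small)
predictors <- attr(tt, "term.labels")
print(predictors)
```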
