Why is it not advisable to use attach() in R, and

Posted 2019-01-01 13:40

Question:

Let's assume that we have a data frame x which contains the columns job and income. Referring to the data in the frame normally requires the commands x$job for the data in the job column and x$income for the data in the income column.

However, the command attach(x) lets you drop the name of the data frame and the $ symbol when referring to the same data. Consequently, x$job becomes job and x$income becomes income in the R code.
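A minimal sketch of the two styles described above. The data frame contents are made up for illustration; only the names x, job and income come from the question.

```r
# Hypothetical data frame matching the question's description
x <- data.frame(
  job    = c("clerk", "engineer", "nurse"),
  income = c(32000, 58000, 41000)
)

# Explicit reference: the data frame is named on every access
mean_explicit <- mean(x$income)

# After attach(x), the columns are visible by their bare names
attach(x)
mean_attached <- mean(income)
detach(x)  # remove x from the search path again
```

Both calls compute the same mean; the only difference is how the column is looked up.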

The problem is that many experts in R advise NOT to use the attach() command when coding in R.

What is the main reason for that? What should be used instead?

Answer 1:

When to use it:

I use attach() when I want the kind of environment that most stats packages (e.g., Stata, SPSS) provide: working with one rectangular dataset at a time.

When not to use it:

However, it gets very messy, and code quickly becomes unreadable, when you have several different datasets. This is particularly true if you are in effect using R as a crude relational database: several rectangles of data, all relevant to the problem at hand and perhaps matched against each other in various ways, contain variables with the same name.

The with() function, or the data= argument to many functions, are excellent alternatives to many instances where attach() is tempting.
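As a rough illustration of those two alternatives, here is a sketch using the built-in cars data set (which has the columns speed and dist). The variable names mean_dist and fit are only illustrative.

```r
# with(): evaluate one expression with the columns of cars in scope,
# without changing the search path
mean_dist <- with(cars, mean(dist))

# data= argument: the formula's variables are looked up in cars
fit <- lm(dist ~ speed, data = cars)

# Both are equivalent to spelling out the data frame explicitly
same_mean <- isTRUE(all.equal(mean_dist, mean(cars$dist)))
```

In both cases the data frame is named at the point of use, so it stays clear where each variable comes from.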



Answer 2:

Another reason not to use attach(): it gives access to the columns of a data frame for reading only, and with the values they had when the data frame was attached. A bare column name is not shorthand for the current value of that column. Two examples:

> head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
> attach(cars)
> # convert stopping distance to meters
> dist <- 0.3048 * dist
> # convert speed to meters per second
> speed <- 0.44704 * speed
> # compute a meaningless time
> time <- dist / speed
> # check our work
> head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10

No changes were made to the cars data set even though dist and speed were assigned to.
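One way to see where those assignments went is sketched below. It assumes a session in which cars is not already attached. The assignment reads the attached column but writes a brand-new variable in the current workspace, which then masks the attached copy.

```r
attach(cars)

# Reads the attached column `dist`, then creates a NEW variable `dist`
# in the current workspace; neither cars nor its attached copy changes
dist <- 0.3048 * dist

new_dist      <- dist                                         # the new variable
attached_dist <- get("dist", envir = as.environment("cars"))  # the attached copy
unchanged     <- identical(cars$dist, attached_dist)          # data frame untouched

# Clean up: drop the masking variable and detach the data frame
rm(dist)
detach(cars)
```

Here new_dist holds the converted values, while attached_dist and cars$dist still hold the original ones.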

If explicitly assigned back to the data set...

> head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
> attach(cars)
> # convert stopping distance to meters
> cars$dist <- 0.3048 * dist
> # convert speed to meters per second
> cars$speed <- 0.44704 * speed
> # compute a meaningless time
> cars$time <- dist / speed
> # compute meaningless time being explicit about using values in cars
> cars$time2 <- cars$dist / cars$speed
> # check our work
> head(cars)
    speed   dist      time     time2
1 1.78816 0.6096 0.5000000 0.3409091
2 1.78816 3.0480 2.5000000 1.7045455
3 3.12928 1.2192 0.5714286 0.3896104
4 3.12928 6.7056 3.1428571 2.1428571
5 3.57632 4.8768 2.0000000 1.3636364
6 4.02336 3.0480 1.1111111 0.7575758

The dist and speed that are referenced in computing time are the original (untransformed) values, that is, the values of cars$dist and cars$speed at the time cars was attached.
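For comparison, here is a sketch of the same conversion done with within(), which avoids the stale-value problem because each statement sees the values assigned just before it. The name cars_metric is illustrative, and the factors assume the usual conversions (1 ft = 0.3048 m, 1 mph = 0.44704 m/s).

```r
# within() evaluates the block with the columns of cars in scope and
# returns a modified copy; the original cars is left unchanged
cars_metric <- within(cars, {
  dist  <- 0.3048 * dist    # feet to meters
  speed <- 0.44704 * speed  # miles per hour to meters per second
  time  <- dist / speed     # uses the converted values assigned above
})

head(cars_metric)
```

Because the result is assigned to a new object, it is always clear which version of the data a later expression refers to.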



Answer 3:

I think there's nothing wrong with using attach. I myself don't use it (then again, I love animals, but don't keep any, either). When I think of attach, I think long term. Sure, when I'm working with a script I know it inside and out. But when I go back to the script in a week, a month or a year, I find the overhead of tracking down where a certain variable comes from just too expensive. A lot of methods have a data argument, which makes calling variables pretty easy (as in lm(x ~ y + z, data = mydata)). If not, I find with() works to my satisfaction.

In short, in my book, attach is fine for short, quick data exploration, but for developing scripts that I or others might want to use, I try to keep my code as readable (and transferable) as possible.



Answer 4:

If you execute attach(data) multiple times, e.g., 5 times, then you can see (with the help of search()) that your data has been attached 5 times on the search path. So if you detach it once (detach(data)), data will still be present 4 times on the search path. Hence, with()/within() are better options. They create a local environment containing that object, and you can use it without creating any confusion.
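The effect can be checked with a small sketch like the following. The data frame staff and its contents are hypothetical, and the counts assume that nothing named staff is on the search path beforehand.

```r
# Hypothetical data frame for illustration
staff <- data.frame(job = c("clerk", "nurse"), income = c(32000, 41000))

# Each attach() call adds another entry to the search path
attach(staff)
attach(staff)
attach(staff)
n_attached <- sum(search() == "staff")          # 3 entries

# A single detach() removes only one of them
detach(staff)
n_after_one_detach <- sum(search() == "staff")  # 2 entries remain

# Clean up the remaining entries
while ("staff" %in% search()) detach("staff", character.only = TRUE)

# with() needs no cleanup: it never touches the search path
mean_income <- with(staff, mean(income))
```

The repeated attach() calls also print messages about masked objects, which is an early hint that something has been attached more than once.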