Let's assume that we have a data frame x which contains the columns job and income. Referring to the data in the frame normally requires the expressions x$job for the data in the job column and x$income for the data in the income column.
However, calling attach(x) lets you drop the name of the data frame and the $ symbol when referring to the same data. Consequently, x$job becomes job and x$income becomes income in the R code.
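To make the difference concrete, here is a minimal sketch. The contents of x are made up for illustration; only the column names job and income come from the description above.

```r
# Hypothetical data frame with the two columns described above
x <- data.frame(job = c("nurse", "clerk", "welder"),
                income = c(52000, 41000, 47000))

# Without attach(): columns are reached through the data frame
mean_dollar <- mean(x$income)

# With attach(): columns are found by bare name via the search path
attach(x)
mean_attached <- mean(income)
detach(x)  # remove x from the search path again
```

Both calls read the same column, so the two means agree.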
The problem is that many R experts advise against using attach() when coding in R.
What is the main reason for that? What should be used instead?
When to use it:
I use attach() when I want the environment you get in most stats packages (e.g. Stata, SPSS) of working with one rectangular dataset at a time.
When not to use it:
However, it gets very messy, and code quickly becomes unreadable, when you have several different datasets. This is particularly true if you are in effect using R as a crude relational database, where several rectangles of data, all relevant to the problem at hand and perhaps matched against each other in various ways, have variables with the same name.
The with() function, or the data= argument that many functions accept, are excellent alternatives in many cases where attach() is tempting.
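As a minimal sketch of these two alternatives, using the built-in cars data set (the names fit, mean_dist and ratio are only illustrative):

```r
# data= argument: the formula's variables are looked up in cars
fit <- lm(dist ~ speed, data = datasets::cars)

# with(): evaluate an expression using the columns of cars,
# without putting anything on the search path
mean_dist <- with(datasets::cars, mean(dist))
ratio     <- with(datasets::cars, dist / speed)
```

In both cases the data frame is named at the point of use, so it stays obvious where dist and speed come from.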
Another reason not to use attach: it gives read-only access to the columns of a data frame, with the values they had when the data frame was attached. It is not a shorthand for the current value of that column. Two examples:
> head(cars)
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
> attach(cars)
> # convert stopping distance to meters
> dist <- 0.3048 * dist
> # convert speed to meters per second
> speed <- 0.44707 * speed
> # compute a meaningless time
> time <- dist / speed
> # check our work
> head(cars)
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
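No changes were made to the cars data set even though dist and speed were assigned to. The assignments created new variables named dist and speed in the workspace, which mask the attached columns.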
If explicitly assigned back to the data set...
> head(cars)
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
> attach(cars)
> # convert stopping distance to meters
> cars$dist <- 0.3048 * dist
> # convert speed to meters per second
> cars$speed <- 0.44707 * speed
> # compute a meaningless time
> cars$time <- dist / speed
> # compute meaningless time being explicit about using values in cars
> cars$time2 <- cars$dist / cars$speed
> # check our work
> head(cars)
    speed   dist      time     time2
1 1.78828 0.6096 0.5000000 0.3408862
2 1.78828 3.0480 2.5000000 1.7044311
3 3.12949 1.2192 0.5714286 0.3895842
4 3.12949 6.7056 3.1428571 2.1427133
5 3.57656 4.8768 2.0000000 1.3635449
6 4.02363 3.0480 1.1111111 0.7575249
The dist and speed that are referenced in computing time are the original (untransformed) values: the values of cars$dist and cars$speed when cars was attached.
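One way to get the intended result without attach() is within(), which evaluates the assignments in sequence against a copy of the data frame, so time is computed from the converted values. This is only a sketch: it reuses the conversion factors from the transcript above, and cars_metric is a made-up name.

```r
# within() returns a modified copy; the original cars is left untouched
cars_metric <- within(datasets::cars, {
  dist  <- 0.3048 * dist     # factor copied from the transcript above
  speed <- 0.44707 * speed   # factor copied from the transcript above
  time  <- dist / speed      # uses the converted dist and speed
})

head(cars_metric)
```

Because each assignment sees the result of the previous one, time here matches the time2 column of the second example rather than the stale time column.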
I think there's nothing wrong with using attach. I myself don't use it (then again, I love animals, but don't keep any, either). When I think of attach, I think long term. Sure, when I'm working with a script I know it inside and out. But when I go back to the script in a week, a month or a year, I find the overhead of searching for where a certain variable comes from just too expensive. A lot of functions have a data argument which makes calling variables pretty easy (as in lm(x ~ y + z, data = mydata)). If not, I find that with does the job to my satisfaction.
In short, in my book, attach is fine for short, quick data exploration, but for developing scripts that I or others might want to use, I try to keep my code as readable (and transferable) as possible.
If you execute attach(data) multiple times, e.g. 5 times, then you can see (with the help of search()) that your data has been attached 5 times on the search path. So if you detach it (detach(data)) once, data will still be present 4 times on the search path. Hence, with() and within() are better options. They create a local environment containing that object, and you can use it without creating any confusion.
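The repeated-attach behaviour can be checked with a short sketch. The data frame survey and its column are hypothetical, and three attachments are used instead of five to keep it brief; R prints masking messages on the repeated attach() calls.

```r
survey <- data.frame(score = c(3, 5, 4))   # hypothetical data

# Attach the same data frame three times
attach(survey)
attach(survey)
attach(survey)
n_attached <- sum(search() == "survey")    # count entries on the search path

# A single detach() removes only one of the copies
detach(survey)
n_after_one_detach <- sum(search() == "survey")

# Clean up: keep detaching until no copy is left
while ("survey" %in% search()) detach(survey)
```

By contrast, with(survey, mean(score)) leaves the search path exactly as it was.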