I have a data frame. Let's call him bob:
> head(bob)
phenotype exclusion
GSM399350 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399351 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399352 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399353 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399354 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399355 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
I'd like to concatenate the rows of this data frame (this will be another question). But look:
> class(bob$phenotype)
[1] "factor"
Bob's columns are factors. So, for example:
> as.character(head(bob))
[1] "c(3, 3, 3, 6, 6, 6)" "c(3, 3, 3, 3, 3, 3)"
[3] "c(29, 29, 29, 30, 30, 30)"
I don't begin to understand this, but I guess these are indices into the levels of the factors of the columns (of the court of king caractacus) of bob? Not what I need.
Strangely, I can go through the columns of bob by hand, and do
bob$phenotype <- as.character(bob$phenotype)
which works fine. And, after some typing, I can get a data.frame whose columns are characters rather than factors. So my question is: how can I do this automatically? How do I convert a data.frame with factor columns into a data.frame with character columns without having to manually go through each column?
Bonus question: why does the manual approach work?
I typically make this function a part of all my projects. Quick and easy.
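The function itself did not survive in this copy of the answer, so here is only a minimal sketch of what such a helper could look like. The name `unfactorize` and the toy data are assumptions for illustration, not the original code.

```r
# Hypothetical helper: the name `unfactorize` is illustrative only.
unfactorize <- function(df) {
  # Positions of the columns that are factors.
  factor_idx <- which(vapply(df, is.factor, logical(1)))
  # Replace each factor column with its labels as character strings.
  for (i in factor_idx) {
    df[[i]] <- as.character(df[[i]])
  }
  df
}

# Toy stand-in for bob (made-up values).
bob <- data.frame(
  phenotype = factor(c("3- 4- 8- 25- 44+", "3- 4- 8- 25+ 44+")),
  n = c(1L, 2L)
)
bob_chr <- unfactorize(bob)
sapply(bob_chr, class)
```

Non-factor columns (here `n`) are left untouched, which is the main difference from converting every column with `as.character`.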
Update: Here's an example of something that doesn't work. I thought it would, but I think that the stringsAsFactors option only works on character strings - it leaves the factors alone.
Try this:
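The behaviour described in the update can be checked with a small, made-up data frame. This is a sketch with toy values, not the original example: passing an existing data frame through `data.frame(..., stringsAsFactors = FALSE)` leaves its factor columns as factors, because the option only applies to character vectors being converted.

```r
# Toy stand-in for bob (made-up values); columns are created as factors explicitly.
bob <- data.frame(
  phenotype = factor(c("3- 4- 8- 25-", "3- 4- 8- 25+")),
  exclusion = factor(c("11b- 11c-", "11b- 11c-"))
)

# Re-wrapping the data frame does not touch columns that are already factors.
bob2 <- data.frame(bob, stringsAsFactors = FALSE)
sapply(bob2, is.factor)
```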
Generally speaking, whenever you're having problems with factors that should be characters, there's a stringsAsFactors setting somewhere to help you (including a global setting). Or you can try transform: just be sure to list every factor you'd like to convert to character.
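As a rough illustration of both suggestions (the data and the variable names are made up for this sketch): `read.csv` accepts `stringsAsFactors = FALSE` at import time, and `transform` can convert columns that are already factors, provided each one is named explicitly.

```r
# 1. Avoid factors at import time (toy CSV text, not the real data).
csv_text <- "id,phenotype\nGSM399350,3- 4- 8- 25-\nGSM399353,3- 4- 8- 25+"
bob_read <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(bob_read$phenotype)

# 2. Convert existing factor columns one by one with transform().
bob <- data.frame(
  phenotype = factor(c("3- 4- 8- 25-", "3- 4- 8- 25+")),
  exclusion = factor(c("11b- 11c-", "11b- 11c-"))
)
bob_chr <- transform(
  bob,
  phenotype = as.character(phenotype),
  exclusion = as.character(exclusion)
)
sapply(bob_chr, class)
```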
Or you can do something like this and kill all the pests with one blow:
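The original code is missing from this copy, so the following is only a sketch of the idea under assumed toy data: pick out the factor columns with `sapply`, convert them in one go, and bind them back to the remaining columns.

```r
# Toy stand-in for bob (made-up values) with one non-factor column.
bob <- data.frame(
  id = 1:3,
  phenotype = factor(c("25-", "25-", "25+")),
  exclusion = factor(c("11b-", "11b-", "11b-"))
)

is_fac <- sapply(bob, is.factor)

# Convert all factor columns at once, then glue the other columns back on.
bob_char <- data.frame(lapply(bob[is_fac], as.character), stringsAsFactors = FALSE)
bob_rest <- bob[!is_fac]
bob_bound <- cbind(bob_char, bob_rest)

names(bob_bound)                     # converted columns now come first

# Restore the original column order.
bob_all <- bob_bound[names(bob)]
```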
It's not a good idea to shove the data into code like this. I could do the sapply part separately (actually, it's much easier to do it like that), but you get the point... I haven't checked the code, because I'm not at home, so I hope it works! =) This approach, however, has a downside: you must reorganize the columns afterwards, while with transform you can do whatever you like, but at the cost of "pedestrian-style" code writing. So there... =)
Just following on Matt and Dirk. If you want to recreate your existing data frame without changing the global option, you can recreate it with an apply statement:
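The statement itself is missing here; a plausible reconstruction, assuming toy data, is to run `as.character` over every column with `lapply` and rebuild the data frame with `stringsAsFactors = FALSE`:

```r
# Toy stand-in for bob (made-up values): one factor column, one numeric column.
bob <- data.frame(
  phenotype = factor(c("3- 4- 8- 25-", "3- 4- 8- 25+")),
  count = c(10, 20)
)

# lapply() returns a list of character vectors; data.frame() rebuilds the frame
# without turning those strings back into factors.
bob_chr <- data.frame(lapply(bob, as.character), stringsAsFactors = FALSE)
sapply(bob_chr, class)
```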
This will convert all variables to class "character"; if you want to convert only factors, see Marek's solution below.
As @hadley points out, the following is more concise.
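A sketch of the more concise form, again on made-up data. Assigning into `bob[]` replaces the columns in place, so the object stays a data frame and keeps its row names:

```r
# Toy stand-in for bob (made-up values), with row names like the original.
bob <- data.frame(
  phenotype = factor(c("3- 4- 8- 25-", "3- 4- 8- 25+")),
  exclusion = factor(c("11b- 11c-", "11b- 11c-")),
  row.names = c("GSM399350", "GSM399353")
)

# On its own, lapply() gives back a plain list ...
as_list <- lapply(bob, as.character)
class(as_list)

# ... but assigning into bob[] fills the existing data frame column by column.
bob[] <- lapply(bob, as.character)
class(bob)
sapply(bob, class)
```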
In both cases, lapply outputs a list; however, owing to the magical properties of R, the use of [] in the second case keeps the data.frame class of the bob object, thereby eliminating the need to convert back to a data.frame using as.data.frame with the argument stringsAsFactors = FALSE.
This function does the trick
If you use the data.table package for operations on a data.frame, the problem does not arise. If you already have factor columns in your dataset and want to convert them to character, you can do the following.
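The answer's code is not preserved here, so this is a sketch under assumptions: it requires the data.table package to be installed, and the table contents and the name `factor_cols` are made up for illustration. It uses data.table's `:=` with `.SDcols` to convert only the factor columns by reference.

```r
library(data.table)

# Toy table (made-up values); the factor column is created explicitly.
dt <- data.table(
  phenotype = factor(c("3- 4- 8- 25-", "3- 4- 8- 25+")),
  n = 1:2
)

# Names of the columns that are currently factors.
factor_cols <- names(dt)[vapply(dt, is.factor, logical(1))]

# Convert just those columns, modifying dt in place.
dt[, (factor_cols) := lapply(.SD, as.character), .SDcols = factor_cols]
sapply(dt, class)
```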