When I convert a factor to a numeric or integer, I get the underlying level codes, not the values as numbers.
f <- factor(sample(runif(5), 20, replace = TRUE))
## [1] 0.0248644019011408 0.0248644019011408 0.179684827337041
## [4] 0.0284090070053935 0.363644931698218 0.363644931698218
## [7] 0.179684827337041 0.249704354675487 0.249704354675487
## [10] 0.0248644019011408 0.249704354675487 0.0284090070053935
## [13] 0.179684827337041 0.0248644019011408 0.179684827337041
## [16] 0.363644931698218 0.249704354675487 0.363644931698218
## [19] 0.179684827337041 0.0284090070053935
## 5 Levels: 0.0248644019011408 0.0284090070053935 ... 0.363644931698218
as.numeric(f)
## [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2
as.integer(f)
## [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2
I have to resort to paste to get the real values:
as.numeric(paste(f))
## [1] 0.02486440 0.02486440 0.17968483 0.02840901 0.36364493 0.36364493
## [7] 0.17968483 0.24970435 0.24970435 0.02486440 0.24970435 0.02840901
## [13] 0.17968483 0.02486440 0.17968483 0.36364493 0.24970435 0.36364493
## [19] 0.17968483 0.02840901
Is there a better way to convert a factor to numeric?
See the Warning section of ?factor, which recommends as.numeric(levels(f))[f] over as.numeric(as.character(f)). The FAQ on R has similar advice.
Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))?
as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so you are performing the conversion to numeric on length(x) values, rather than on nlevels(x) values. The speed difference will be most apparent for long vectors with few levels. If the values are mostly unique, there won't be much difference in speed. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.
Some timings:
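The original timing figures are not reproduced here. As a minimal sketch of how one might check the claim, the code below builds a long factor with only five levels, confirms that both idioms return the same numbers, and times each one. The seed, vector length, and repetition count are arbitrary assumptions, and no timings are quoted because they vary by machine.

```r
# Sketch: compare the two conversion idioms on a long factor with few levels.
set.seed(42)                                   # arbitrary seed, for reproducibility only
vals <- runif(5)                               # five distinct numeric values
f <- factor(sample(vals, 1e5, replace = TRUE)) # 100,000 elements, 5 levels

by_levels <- as.numeric(levels(f))[f]          # parses nlevels(f) strings, then indexes
by_chars  <- as.numeric(as.character(f))       # parses length(f) strings

# Both idioms should give exactly the same result.
print(identical(by_levels, by_chars))

# Repeat each conversion a few times so the difference is measurable.
print(system.time(for (i in 1:20) as.numeric(levels(f))[f]))
print(system.time(for (i in 1:20) as.numeric(as.character(f))))
```

With mostly unique values, nlevels(f) approaches length(f) and the gap should shrink, as the answer says.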
Every answer in this post failed to produce results for me; NAs were generated instead.
What worked for me is this:
Note: this particular answer is not for converting numeric-valued factors to numerics, it is for converting categorical factors to their corresponding level numbers.
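The code from this answer is not preserved. A minimal sketch of the case the note describes, assuming a hypothetical categorical factor with non-numeric labels: as.integer() returns the level numbers, while parsing the labels as numbers yields NA.

```r
# Hypothetical categorical factor; the labels and their order are assumptions.
f <- factor(c("low", "high", "medium", "low"),
            levels = c("low", "medium", "high"))

# Level numbers follow the order given in `levels`.
codes <- as.integer(f)
print(codes)

# Trying to parse the labels themselves as numbers produces NA,
# because "low", "medium", "high" are not numeric strings.
parsed <- suppressWarnings(as.numeric(as.character(f)))
print(parsed)
```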
Late to the game: by accident, I found that trimws() can convert factor(3:5) to c("3","4","5"). Then you can call as.numeric(). That is: as.numeric(trimws(factor(3:5)))

The easiest way would be to use the unfactor function from the varhandle package. This example can be a quick start:
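The original example is not preserved, so what follows is only a sketch under assumptions: the sample factor is made up, and the code falls back to the base R idiom when varhandle is not installed.

```r
# Hypothetical factor whose labels are numeric strings.
f <- factor(c("3", "4", "5", "4"))

if (requireNamespace("varhandle", quietly = TRUE)) {
  # unfactor() converts the factor back to its label values.
  x <- varhandle::unfactor(f)
} else {
  # Base R fallback (assumption: used here only so the sketch runs
  # without the package).
  x <- as.numeric(levels(f))[f]
}

print(x)
```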
It is possible only in the case when the factor labels match the original values. I will explain it with an example.
Assume the data is vector x:
Now I will create a factor with four labels:
1) x is of type double, f is of type integer. This is the first unavoidable loss of information. Factors are always stored as integers.
2) It is not possible to revert back to the original values (10, 20, 30, 40) having only f available. We can see that f holds only integer values 1, 2, 3, 4 and two attributes - the list of labels ("A", "B", "C", "D") and the class attribute "factor". Nothing more.
To revert back to the original values we have to know the values of the levels used in creating the factor. In this case c(10, 20, 30, 40). If we know the original levels (in correct order), we can revert back to the original values.
And this will work only when labels have been defined for all possible values in the original data.
So if you need the original values, you have to keep them. Otherwise there is a high chance it will not be possible to get back to them from a factor alone.
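The code for this example did not survive, so here is a sketch of what it might look like. The values 10, 20, 30, 40 and the labels "A" to "D" come from the text above; the exact contents of x and the variable names original_levels, restored, and g are assumptions.

```r
# Assumed sample data built from the values named in the text.
x <- c(10, 20, 30, 40, 20, 10)
original_levels <- c(10, 20, 30, 40)

# Factor with four labels that do NOT match the original values.
f <- factor(x, levels = original_levels, labels = c("A", "B", "C", "D"))

print(typeof(x))      # storage type of the raw data
print(typeof(f))      # storage type of the factor
print(attributes(f))  # only the labels ("levels") and the class

# The labels cannot be parsed as numbers, so the only way back is to
# index the known original levels by the factor's integer codes.
restored <- original_levels[as.integer(f)]
print(identical(restored, x))

# A value with no label defined (50 here) becomes NA in the factor
# and cannot be recovered at all.
g <- factor(c(10, 50), levels = original_levels, labels = c("A", "B", "C", "D"))
print(original_levels[as.integer(g)])
```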
You can use hablar::convert if you have a data frame. The syntax is easy:
Sample df
Solution
gives you:
Or if you want one column to be integer and one numeric:
results in:
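The sample data and output from this answer are not preserved. The sketch below assumes a small made-up data frame with two factor columns, a and b, and uses convert() with the num() and int() helpers from hablar. It falls back to base R when hablar is not installed, purely so the sketch runs either way.

```r
# Hypothetical sample data: two factor columns holding numeric strings.
df <- data.frame(a = factor(c("7", "3")),
                 b = factor(c("1.5", "6.3")))

if (requireNamespace("hablar", quietly = TRUE)) {
  suppressPackageStartupMessages(library(hablar))
  # Convert both columns to numeric.
  both_num <- convert(df, num(a, b))
  # Or: one column to integer and one to numeric.
  mixed <- convert(df, int(a), num(b))
} else {
  # Base R fallback (assumption: only used when hablar is unavailable).
  to_num <- function(f) as.numeric(levels(f))[f]
  both_num <- data.frame(a = to_num(df$a), b = to_num(df$b))
  mixed <- data.frame(a = as.integer(to_num(df$a)), b = to_num(df$b))
}

str(both_num)
str(mixed)
```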