How to convert a factor to integer\numeric without

2018-12-30 23:24发布

When I convert a factor to a numeric or integer, I get the underlying level codes, not the values as numbers.

f <- factor(sample(runif(5), 20, replace = TRUE))
##  [1] 0.0248644019011408 0.0248644019011408 0.179684827337041 
##  [4] 0.0284090070053935 0.363644931698218  0.363644931698218 
##  [7] 0.179684827337041  0.249704354675487  0.249704354675487 
## [10] 0.0248644019011408 0.249704354675487  0.0284090070053935
## [13] 0.179684827337041  0.0248644019011408 0.179684827337041 
## [16] 0.363644931698218  0.249704354675487  0.363644931698218 
## [19] 0.179684827337041  0.0284090070053935
## 5 Levels: 0.0248644019011408 0.0284090070053935 ... 0.363644931698218

as.numeric(f)
##  [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2

as.integer(f)
##  [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2

I have to resort to paste to get the real values:

as.numeric(paste(f))
##  [1] 0.02486440 0.02486440 0.17968483 0.02840901 0.36364493 0.36364493
##  [7] 0.17968483 0.24970435 0.24970435 0.02486440 0.24970435 0.02840901
## [13] 0.17968483 0.02486440 0.17968483 0.36364493 0.24970435 0.36364493
## [19] 0.17968483 0.02840901

Is there a better way to convert a factor to numeric?

标签: r casting r-faq
7条回答
君临天下
2楼-- · 2018-12-30 23:42

See the Warning section of ?factor:

In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).

The FAQ on R has similar advice.


Why is as.numeric(levels(f))[f] more efficent than as.numeric(as.character(f))?

as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so you are performing the conversion to numeric on length(x) values, rather than on nlevels(x) values. The speed difference will be most apparent for long vectors with few levels. If the values are mostly unique, there won't be much difference in speed. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.


Some timings

library(microbenchmark)
microbenchmark(
  as.numeric(levels(f))[f],
  as.numeric(levels(f)[f]),
  as.numeric(as.character(f)),
  paste0(x),
  paste(x),
  times = 1e5
)
## Unit: microseconds
##                         expr   min    lq      mean median     uq      max neval
##     as.numeric(levels(f))[f] 3.982 5.120  6.088624  5.405  5.974 1981.418 1e+05
##     as.numeric(levels(f)[f]) 5.973 7.111  8.352032  7.396  8.250 4256.380 1e+05
##  as.numeric(as.character(f)) 6.827 8.249  9.628264  8.534  9.671 1983.694 1e+05
##                    paste0(x) 7.964 9.387 11.026351  9.956 10.810 2911.257 1e+05
##                     paste(x) 7.965 9.387 11.127308  9.956 11.093 2419.458 1e+05
查看更多
流年柔荑漫光年
3楼-- · 2018-12-30 23:48

Every answer in this post failed to generate results for me , NAs were getting generated.

y2<-factor(c("A","B","C","D","A")); 
as.numeric(levels(y2))[y2] 
[1] NA NA NA NA NA Warning message: NAs introduced by coercion

What worked for me is this -

as.integer(y2)
# [1] 1 2 3 4 1

Note: this particular answer is not for converting numeric-valued factors to numerics, it is for converting categorical factors to their corresponding level numbers.

查看更多
后来的你喜欢了谁
4楼-- · 2018-12-30 23:50

late to the game, accidently, I found trimws() can convert factor(3:5) to c("3","4","5"). Then you can call as.numeric(). That is:

as.numeric(trimws(x_factor_var))
查看更多
十年一品温如言
5楼-- · 2018-12-30 23:58

The most easiest way would be to use unfactor function from package varhandle

unfactor(your_factor_variable)

This example can be a quick start:

x <- rep(c("a", "b", "c"), 20)
y <- rep(c(1, 1, 0), 20)

class(x)  # -> "character"
class(y)  # -> "numeric"

x <- factor(x)
y <- factor(y)

class(x)  # -> "factor"
class(y)  # -> "factor"

library(varhandle)
x <- unfactor(x)
y <- unfactor(y)

class(x)  # -> "character"
class(y)  # -> "numeric"
查看更多
裙下三千臣
6楼-- · 2018-12-30 23:59

It is possible only in the case when the factor labels match the original values. I will explain it with an example.

Assume the data is vector x:

x <- c(20, 10, 30, 20, 10, 40, 10, 40)

Now I will create a factor with four labels:

f <- factor(x, levels = c(10, 20, 30, 40), labels = c("A", "B", "C", "D"))

1) x is with type double, f is with type integer. This is the first unavoidable loss of information. Factors are always stored as integers.

> typeof(x)
[1] "double"
> typeof(f)
[1] "integer"

2) It is not possible to revert back to the original values (10, 20, 30, 40) having only f available. We can see that f holds only integer values 1, 2, 3, 4 and two attributes - the list of labels ("A", "B", "C", "D") and the class attribute "factor". Nothing more.

> str(f)
 Factor w/ 4 levels "A","B","C","D": 2 1 3 2 1 4 1 4
> attributes(f)
$levels
[1] "A" "B" "C" "D"

$class
[1] "factor"

To revert back to the original values we have to know the values of levels used in creating the factor. In this case c(10, 20, 30, 40). If we know the original levels (in correct order), we can revert back to the original values.

> orig_levels <- c(10, 20, 30, 40)
> x1 <- orig_levels[f]
> all.equal(x, x1)
[1] TRUE

And this will work only in case when labels have been defined for all possible values in the original data.

So if you will need the original values, you have to keep them. Otherwise there is a high chance it will not be possible to get back to them only from a factor.

查看更多
姐姐魅力值爆表
7楼-- · 2018-12-31 00:01

You can use hablar::convert if you have a data frame. The syntax is easy:

Sample df

library(hablar)
library(dplyr)

df <- dplyr::tibble(a = as.factor(c("7", "3")),
                    b = as.factor(c("1.5", "6.3")))

Solution

df %>% 
  convert(num(a, b))

gives you:

# A tibble: 2 x 2
      a     b
  <dbl> <dbl>
1    7.  1.50
2    3.  6.30

Or if you want one column to be integer and one numeric:

df %>% 
  convert(int(a),
          num(b))

results in:

# A tibble: 2 x 2
      a     b
  <int> <dbl>
1     7  1.50
2     3  6.30
查看更多
登录 后发表回答