R help converting non numeric column to numeric

I'm trying to help a friend, a Director of Sales, make sense of his logged call data. He is interested in one column in particular, "Disposition". This column holds string values, and I'm trying to convert them to numeric values (e.g. "Not Answered" converted to 1, "Answered" converted to 2, etc.) and remove any row where no value was entered. I've created data frames, used as.numeric, created and deleted columns and rows, all to no avail. I'm just trying to run simple R code to give him some insight. Any and all help is much appreciated. Thanks in advance!

P.S. I'm unsure as to whether I should provide some code due to the fact that there is a lot of delicate information (personal phone numbers and emails).

Tags: r dataframe type-conversion

2 Answers

我想做一个坏孩纸

#2 · 2019-09-25 12:54

First off: You should always provide representative sample data; if your data is sensitive in nature, provide mock-up data.

That aside, to recode a character vector as numeric you could convert to factor and then use as.numeric. For example:

# Sample data
column <- c("Not Answered", "Answered", "Something else", "Others")

# Convert character vector to factor, keeping levels in order of first appearance
column <- factor(column, levels = unique(column))

# Convert to numeric
as.numeric(column)
#[1] 1 2 3 4

The numbering can be adjusted by changing the order of the factor levels.
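To make that concrete, here is a minimal sketch that sets the level order explicitly so "Not Answered" maps to 1 and "Answered" to 2, as asked in the question. The vector `disposition` and the label "Busy" are made-up sample data, not taken from the real call log.

```r
# Hypothetical disposition values, including one missing entry
disposition <- c("Answered", "Not Answered", "Busy", NA, "Not Answered")

# The position of each label in `levels` becomes its numeric code;
# NA (and any label not listed in `levels`) stays NA
codes <- as.numeric(factor(disposition,
                           levels = c("Not Answered", "Answered", "Busy")))
codes
#[1]  2  1  3 NA  1
```

Listing the levels by hand also guards against the codes shifting when the data happens to arrive in a different order.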


Viruses.

#3 · 2019-09-25 12:56

Alternatively, you can create a new column and fill it with the numeric values using an ifelse statement. To illustrate, let's assume this is your dataframe:

df <- data.frame(
  Disposition = c(rep(c("answer", "no answer", "whatever", NA),3)),
  Anything = c(rnorm(12))
)
df

   Disposition    Anything
1       answer  2.54721951
2    no answer  1.07409803
3     whatever  0.60482744
4         <NA>  2.08405038
5       answer  0.31799860
6    no answer -1.17558239
7     whatever  0.94206106
8         <NA>  0.45355501
9       answer  0.01787330
10   no answer -0.07629330
11    whatever  0.83109679
12        <NA> -0.06937357

Now you define a new column, say df$Analysis, and assign to it numbers based on the information in df$Disposition:

df$Analysis <- ifelse(df$Disposition=="no answer", 1,
                      ifelse(df$Disposition=="answer", 2, 3))
df

   Disposition    Anything Analysis
1       answer  2.54721951        2
2    no answer  1.07409803        1
3     whatever  0.60482744        3
4         <NA>  2.08405038       NA
5       answer  0.31799860        2
6    no answer -1.17558239        1
7     whatever  0.94206106        3
8         <NA>  0.45355501       NA
9       answer  0.01787330        2
10   no answer -0.07629330        1
11    whatever  0.83109679        3
12        <NA> -0.06937357       NA
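If there are more than a handful of categories, nested ifelse calls get hard to read. One alternative, sketched here as a different technique rather than part of the answer above, is a named lookup vector. The names `calls` and `lookup` and the code assignments are assumptions for illustration.

```r
# Hypothetical mapping from label to code; extend as needed
lookup <- c("no answer" = 1, "answer" = 2, "whatever" = 3)

# Keep Disposition as character: indexing by a factor would use its
# integer codes instead of its labels
calls <- data.frame(
  Disposition = c("answer", "no answer", "whatever", NA),
  stringsAsFactors = FALSE
)

# Index the lookup by label; unname() drops the label names from the result
calls$Analysis <- unname(lookup[calls$Disposition])
calls$Analysis
#[1]  2  1  3 NA
```

Unlike the final `3` in the nested ifelse, a label that is missing from `lookup` comes back as NA instead of silently falling into a catch-all code.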

The advantage of this method is that the original column is left unchanged. To drop the missing values from the data frame, use na.omit. Note that this removes every row that has an NA in any column, not only the rows where df$Disposition is NA:

df_clean <- na.omit(df)
df_clean

   Disposition    Anything Analysis
1       answer  2.5472195        2
2    no answer  1.0740980        1
3     whatever  0.6048274        3
5       answer  0.3179986        2
6    no answer -1.1755824        1
7     whatever  0.9420611        3
9       answer  0.0178733        2
10   no answer -0.0762933        1
11    whatever  0.8310968        3
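If only rows with a missing Disposition should go, while NAs in other columns are kept, you can filter on that one column instead. The sketch below assumes a made-up data frame `calls` with a hypothetical `Notes` column, and also assumes that blank entries may show up as empty strings rather than NA.

```r
calls <- data.frame(
  Disposition = c("answer", NA, "", "no answer"),
  Notes       = c(NA, "left voicemail", "n/a", "retry"),
  stringsAsFactors = FALSE
)

# Treat empty strings as missing (%in% returns FALSE for NA, so the
# logical index contains no NA)
calls$Disposition[calls$Disposition %in% ""] <- NA

# Keep only rows that have a Disposition; NAs in other columns survive
calls_clean <- calls[!is.na(calls$Disposition), ]
calls_clean
```

Here rows 1 and 4 remain, and row 1 keeps its NA in `Notes`, which na.omit would have dropped.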
