R: help converting a non-numeric column to numeric

Posted 2019-09-25 12:32

I'm trying to help my friend, a Director of Sales, make sense of his logged call data. He is interested in one column in particular, "Disposition". This column has string values and I'm trying to convert them to numeric values (e.g. "Not Answered" converted to 1, "Answered" converted to 2, etc.) and remove any row with no value entered. I've created data frames, used as.numeric, created and deleted columns/rows, etc. to no avail. I'm just trying to run simple R code to give him some insight. Any and all help is much appreciated. Thanks in advance!

P.S. I'm unsure as to whether I should provide some code due to the fact that there is a lot of delicate information (personal phone numbers and emails).

2 Answers
我想做一个坏孩纸
Reply 2 · 2019-09-25 12:54

First off: You should always provide representative sample data; if your data is sensitive in nature, provide mock-up data.
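For example, a mock-up of a call log could look like the sketch below. Every column name other than Disposition, and every value, is invented for illustration and stands in for the real phone numbers and emails.

```r
# Mock call log: made-up values that mimic the structure of the real data
set.seed(1)  # make the random sample reproducible
mock_calls <- data.frame(
  Phone = sprintf("555-01%02d", 1:6),  # fictional numbers
  Disposition = sample(c("Not Answered", "Answered", NA), 6, replace = TRUE),
  stringsAsFactors = FALSE
)

# dput() prints a copy-pasteable definition of the object for a question
dput(mock_calls)
```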

That aside, to recode a character vector as numeric you could convert to factor and then use as.numeric. For example:

# Sample data
column <- c("Not Answered", "Answered", "Something else", "Others")

# Convert character vector to factor
column <- factor(column, levels = as.character(unique(column)))

# Convert to numeric
as.numeric(column)
#[1] 1 2 3 4

The numbering can be adjusted by changing the order of the factor levels.
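As a minimal sketch of that point, the levels can be listed explicitly so that each label gets the code you want. The first two levels follow the mapping in the question ("Not Answered" to 1, "Answered" to 2); the remaining labels, their order, and the name disposition_levels are assumptions for illustration.

```r
# Sample data: same labels as above in a different order, plus a missing value
column <- c("Answered", "Not Answered", "Others", NA, "Something else")

# The position of each label in `levels` becomes its numeric code
disposition_levels <- c("Not Answered", "Answered", "Something else", "Others")
column_factor <- factor(column, levels = disposition_levels)

codes <- as.integer(column_factor)
codes
#[1]  2  1  4 NA  3
```

Any label that is not listed in the levels becomes NA, so a misspelled category shows up as a missing value rather than as an unexpected number.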

Viruses.
Reply 3 · 2019-09-25 12:56

Alternatively, you can create a new column and fill it with the numeric values using nested ifelse() calls. To illustrate, let's assume this is your data frame:

df <- data.frame(
  Disposition = rep(c("answer", "no answer", "whatever", NA), 3),
  Anything = rnorm(12)
)
df

   Disposition    Anything
1       answer  2.54721951
2    no answer  1.07409803
3     whatever  0.60482744
4         <NA>  2.08405038
5       answer  0.31799860
6    no answer -1.17558239
7     whatever  0.94206106
8         <NA>  0.45355501
9       answer  0.01787330
10   no answer -0.07629330
11    whatever  0.83109679
12        <NA> -0.06937357

Now you define a new column, say df$Analysis, and assign to it numbers based on the information in df$Disposition:

df$Analysis <- ifelse(df$Disposition=="no answer", 1,
                      ifelse(df$Disposition=="answer", 2, 3))
df

   Disposition    Anything Analysis
1       answer  2.54721951        2
2    no answer  1.07409803        1
3     whatever  0.60482744        3
4         <NA>  2.08405038       NA
5       answer  0.31799860        2
6    no answer -1.17558239        1
7     whatever  0.94206106        3
8         <NA>  0.45355501       NA
9       answer  0.01787330        2
10   no answer -0.07629330        1
11    whatever  0.83109679        3
12        <NA> -0.06937357       NA
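With more than a few categories, nested ifelse() calls get hard to read. One alternative, sketched here under the assumption that every label is known in advance, is a named lookup vector. This is a different technique from the one above; the name codes and the mapping are illustrative only.

```r
# Mock data in the same shape as above
df <- data.frame(
  Disposition = rep(c("answer", "no answer", "whatever", NA), 3),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from label to numeric code
codes <- c("no answer" = 1, "answer" = 2, "whatever" = 3)

# Index the lookup vector by label; NA labels stay NA
df$Analysis <- unname(codes[df$Disposition])
df$Analysis
```

One difference from the ifelse() version: a label that is missing from the lookup vector becomes NA here, whereas the nested ifelse() assigns 3 to every label other than "no answer" and "answer".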

The advantage of this method is that the original column is left unchanged. To remove the rows with NA values from the data frame, use na.omit. NB: this removes every row that has an NA in any column, not only the rows with NA in df$Disposition:

df_clean <- na.omit(df)
df_clean

   Disposition   Anything Analysis
1       answer  2.5472195        2
2    no answer  1.0740980        1
3     whatever  0.6048274        3
5       answer  0.3179986        2
6    no answer -1.1755824        1
7     whatever  0.9420611        3
9       answer  0.0178733        2
10   no answer -0.0762933        1
11    whatever  0.8310968        3
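If other columns may legitimately contain NA, a narrower filter on df$Disposition alone may be preferable. The sketch below also assumes that blank cells might have been read in as empty strings rather than NA, which may or may not apply to the real call log; the data are mock values.

```r
# Mock data: one missing Disposition, one blank Disposition, one NA elsewhere
df <- data.frame(
  Disposition = c("answer", NA, "no answer", "", "whatever"),
  Anything = c(1.2, 0.5, NA, 0.3, -0.7),
  stringsAsFactors = FALSE
)

# Treat blank or whitespace-only strings as missing (assumption about the raw data)
df$Disposition[trimws(df$Disposition) %in% ""] <- NA

# Keep only rows where Disposition is present; NA in other columns is left alone
df_clean <- df[!is.na(df$Disposition), ]
df_clean
```

Rows 2 and 4 are dropped, while row 3 is kept even though its Anything value is NA.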