R: convert integers in a character vector (json) t

Posted 2019-08-18 02:11

Question:

I have a data frame with 2000 rows (one per day). Each row contains a character "vector" that encodes binary information on 30 different skills: if a skill has been used, its number appears in the vector. To simplify:
Suppose I have a data frame with 3 observations (3 days) covering 10 different skills, stored in a variable named "S_total":
S_total = [1,3,7,8,9,10], [5,9], [], and a variable Day = 1, 2, 3. I'd like to construct a data frame with 3 rows and 12 columns.
The columns would be: Day, S_total, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, where the numbered variables could be TRUE/FALSE.

I have been thinking along the lines of as.numeric(read.csv) followed by a for loop with cbind.
But there must be a better way? The tidyverse? I was hoping someone could demonstrate a regular expression and the Map command.

Answer 1:

You can simply add a new column using either dataFrame$newColumn or dataFrame[, "newColumn"]. Then you can use grepl to test whether a skill is found in the vector dataFrame$S_total. Anchor the number with \\b (word boundary) so that "1" does not also match inside "11" or "12", e.g.

dataFrame[, "1"] <- grepl("\\b1\\b", dataFrame$S_total)
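
One caveat with plain substring patterns: grepl("1", ...) is also TRUE for any entry that contains 11 or 12. A minimal sketch of the difference, using a small made-up vector x (not taken from the original data) and \\b word-boundary anchors as one possible fix:

```r
# Hypothetical example vector, for illustration only
x <- c("1, 3, 7", "5, 12", "")

# Unanchored: "1" is found inside "12", so the second entry matches too
loose <- grepl("1", x)

# Anchored with word boundaries: only the standalone number 1 matches
strict <- grepl("\\b1\\b", x)

print(loose)   # TRUE  TRUE FALSE
print(strict)  # TRUE FALSE FALSE
```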

To get all different skills that occur in the dataset, you can split the character vectors into single numbers and then use unique. Then you can loop through all different skills and create one new column for each skill:

    > dataFrame <- data.frame(S_total = c(toString(c(1,3,7,8,11,20)),  toString(c(5,12)), ""),
    +                         Day = c(1,2,3),
    +                         stringsAsFactors = FALSE)
    >
    > dataFrame
                 S_total Day
    1 1, 3, 7, 8, 11, 20   1
    2              5, 12   2
    3                      3
    >
    > allSkill <- sort(unique(unlist(strsplit(dataFrame$S_total, ", "))))
    > for(i in allSkill){
    +   # \\b matches whole numbers only, so "1" is not found inside "11" or "12"
    +   dataFrame[, i] <- grepl(paste0("\\b", i, "\\b"), dataFrame$S_total)
    + }
    > dataFrame
                 S_total Day     1    11    12    20     3     5     7     8
    1 1, 3, 7, 8, 11, 20   1  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE
    2              5, 12   2 FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
    3                      3 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
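
If the columns should follow the layout asked for in the question (s1 to s10 in numeric order, including skills that never occur), one option is to split each string once and test membership with %in%. This is only a sketch: the helper name skill_flags, its n_skills argument and the "s" prefix are assumptions made for illustration, not part of the answer above.

```r
# Hypothetical helper: one logical column per skill id 1..n_skills
skill_flags <- function(s_total, n_skills) {
  # Split each entry on commas (optionally followed by spaces), convert to integers
  used <- lapply(strsplit(s_total, ", *"), as.integer)
  # For each row, flag which of the ids 1..n_skills occur in that row
  flags <- t(vapply(used, function(ids) seq_len(n_skills) %in% ids,
                    logical(n_skills)))
  colnames(flags) <- paste0("s", seq_len(n_skills))
  as.data.frame(flags)
}

# Example data from the question, without brackets
dataFrame <- data.frame(S_total = c("1, 3, 7, 8, 9, 10", "5, 9", ""),
                        Day = c(1, 2, 3),
                        stringsAsFactors = FALSE)

result <- cbind(dataFrame[, c("Day", "S_total")],
                skill_flags(dataFrame$S_total, n_skills = 10))
print(result)
```

Because membership is tested on the parsed integers rather than on the raw string, there is no risk of "1" matching "10", and the columns come out in numeric order.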

If your dataset is not that large, this will do. If you have a very large dataset and performance matters, you can create the empty columns first and then fill them in the loop, which avoids growing the data frame one column at a time.
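
The preallocation idea could look roughly like this: build a logical matrix of the right size up front, fill it column by column, and attach it to the data frame once at the end. The names flagMatrix and skillIds are made up for this sketch, and whether it is actually faster on your data is something to measure rather than assume.

```r
dataFrame <- data.frame(S_total = c(toString(c(1, 3, 7, 8, 11, 20)),
                                    toString(c(5, 12)), ""),
                        Day = c(1, 2, 3),
                        stringsAsFactors = FALSE)

# All skill ids that occur anywhere in the data, as character strings
skillIds <- sort(unique(unlist(strsplit(dataFrame$S_total, ", "))))

# Preallocate one FALSE-filled column per skill (hypothetical name: flagMatrix)
flagMatrix <- matrix(FALSE, nrow = nrow(dataFrame), ncol = length(skillIds),
                     dimnames = list(NULL, skillIds))

# Fill the preallocated columns; \\b keeps "1" from matching inside "11" or "12"
for (id in skillIds) {
  flagMatrix[, id] <- grepl(paste0("\\b", id, "\\b"), dataFrame$S_total)
}

# Attach all skill columns in a single step
dataFrame <- cbind(dataFrame, flagMatrix)
print(dataFrame)
```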

In my opinion, there is no need to use Map or any of the tidyverse packages.



Answer 2:

Very cool solution, just what I needed. I only had to remove my brackets to get this to work. So, assuming my vector "S_total" has brackets, I'd have to run:

S_total_nobracket <- gsub("\\[|\\]", "", S_total)
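
Applied to bracketed strings like those in the question, the cleanup step might be sketched as follows. The sample vector is reconstructed from the question's example, and skills_per_row is a name introduced here for illustration.

```r
# Bracketed input as described in the question
S_total <- c("[1,3,7,8,9,10]", "[5,9]", "[]")

# Drop the literal square brackets; "\\[|\\]" matches either bracket
S_total_nobracket <- gsub("\\[|\\]", "", S_total)
print(S_total_nobracket)

# Split on commas to get the skill ids of each row as integers
skills_per_row <- lapply(strsplit(S_total_nobracket, ","), as.integer)
print(skills_per_row)
```

Note that these strings have no space after the comma, so the split pattern here is "," rather than the ", " used in the first answer.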

Thanks a mill for your answer. It was just what I needed :-)



Tags: json r tidyverse