Reformatting complex factor vector with comma sepa

Published 2020-04-21 07:30

Question:

I would like to reformat a factor vector so that the figures it contains have a thousands separator. The vector contains integers and real numbers, with no particular rule governing their values or order.

Data

In particular, I'm working with a vector vec similar to the one generated below:

content <- c("0 - 100", "0 - 100", "0 - 100", "0 - 100",
             "150.22 - 170.33",
             "1000 - 2000","1000 - 2000", "1000 - 2000", "1000 - 2000", 
             "7000 - 10000", "7000 - 10000", "7000 - 10000", "7000 - 10000",
             "7000 - 10000", "1000000 - 22000000", "1000000 - 22000000", 
             "1000000 - 22000000",
             "44000000 - 66000000.8989898989")

vec <- factor(x = content, levels = unique(content))

Desired results

My aim is to reformat this vector so that the figures contain the Excel-like 1,000 separator, as in the example below:

100.00
1,000.00
1,000,000.00
1,000,000.56
24,564,000,000.56


Tried approach

I was thinking of using gsubfn with a proto object that would pass the digits, then maybe creating another proto object with three digits and replacing, as in the code below:

gsubfn(pattern = "[0-9][0-9][0-9]", replacement = ~paste0(x, ','), 
       x = as.character(vec))

This works only partially, as the comma is inserted in:

"150,.22 - 170,.33"

which is obviously wrong. I also had to convert the factor to a character vector. Consequently, my question boils down to two elements:

  • How can I work around the comma issue?
  • How can I maintain the original structure of the factor? I need a factor vector ordered in the same manner as the original one, but with commas in the right places.

Answer 1:

Operating only on the levels seems to keep your precision and avoids converting the vector to a character vector. It is also much more efficient, because it reduces the data you operate on to the unique values rather than the whole vector.

levels(vec) <- sapply(strsplit(levels(vec), " - "), 
                       function(x) paste(prettyNum(x, 
                                            big.mark = ",", 
                                            preserve.width = "none"), 
                                   collapse = " - "))
vec
#  [1] 0 - 100                            0 - 100                            0 - 100                            0 - 100                            150.22 - 170.33                   
#  [6] 1,000 - 2,000                      1,000 - 2,000                      1,000 - 2,000                      1,000 - 2,000                      7,000 - 10,000                    
# [11] 7,000 - 10,000                     7,000 - 10,000                     7,000 - 10,000                     7,000 - 10,000                     1,000,000 - 22,000,000            
# [16] 1,000,000 - 22,000,000             1,000,000 - 22,000,000             44,000,000 - 66,000,000.8989898989
# Levels: 0 - 100 150.22 - 170.33 1,000 - 2,000 7,000 - 10,000 1,000,000 - 22,000,000 44,000,000 - 66,000,000.8989898989 
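
Because only the level labels change, the underlying integer codes, and with them the order of the factor, should stay exactly as they were. Below is a minimal sketch of that check on a shortened stand-in for the question's data. The names `small`, `f` and `codes_before` are illustrative and not part of the original post.

```r
# Shortened stand-in for the question's data (illustrative only).
small <- c("0 - 100", "1000 - 2000", "1000 - 2000",
           "150.22 - 170.33", "1000000 - 22000000")
f <- factor(small, levels = unique(small))

# Remember the integer codes before touching the labels.
codes_before <- as.integer(f)

# Same relabelling as above: format each end of the range separately.
levels(f) <- sapply(strsplit(levels(f), " - "),
                    function(x) paste(prettyNum(x, big.mark = ",",
                                                preserve.width = "none"),
                                      collapse = " - "))

# The codes are untouched, so element order and level order are preserved.
stopifnot(identical(as.integer(f), codes_before))
levels(f)
# [1] "0 - 100"                "1,000 - 2,000"          "150.22 - 170.33"
# [4] "1,000,000 - 22,000,000"
```

The `stopifnot()` call is only there to make the claim testable; it can be dropped once the behaviour has been confirmed on the real data.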


Answer 2:

Use positive lookahead based regex...

content <- c("0 - 100", "0 - 100", "0 - 100", "0 - 100",
              "1000 - 2000","1000 - 2000", "1000 - 2000", "1000 - 2000", 
              "7000 - 10000", "7000 - 10000", "7000 - 10000", "7000 - 10000",
              "7000 - 10000", "1000000 - 22000000", "1000000 - 22000000", 
              "1000000 - 22000000")
gsub("(\\d)(?=(?:\\d{3})+\\b)", "\\1,", content, perl = TRUE)
# [1] "0 - 100"                "0 - 100"                "0 - 100"               
# [4] "0 - 100"                "1,000 - 2,000"          "1,000 - 2,000"         
# [7] "1,000 - 2,000"          "1,000 - 2,000"          "7,000 - 10,000"        
# [10] "7,000 - 10,000"         "7,000 - 10,000"         "7,000 - 10,000"        
# [13] "7,000 - 10,000"         "1,000,000 - 22,000,000" "1,000,000 - 22,000,000"
# [16] "1,000,000 - 22,000,000"
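
Note that the sample above leaves out the decimal ranges from the question. On a value with a long fractional part the lookahead also matches inside the digits after the decimal point. One possible workaround, sketched here with base R only, is to extract each complete number and format it on its own. The helper name `add_big_marks` is hypothetical, and the number pattern assumes plain unsigned decimals as in the question.

```r
x <- c("150.22 - 170.33", "44000000 - 66000000.8989898989")

# The lookahead-only pattern also fires inside the fractional digits.
naive <- gsub("(\\d)(?=(?:\\d{3})+\\b)", "\\1,", x, perl = TRUE)
naive[2]
# [1] "44,000,000 - 66,000,000.8,989,898,989"

# Hypothetical helper: format each complete number, leaving decimals alone.
add_big_marks <- function(x) {
  # Match whole numbers with an optional fractional part.
  m <- gregexpr("[0-9]+(\\.[0-9]+)?", x)
  regmatches(x, m) <- lapply(
    regmatches(x, m),
    function(num) prettyNum(num, big.mark = ",", preserve.width = "none")
  )
  x
}

safe <- add_big_marks(x)
safe
# [1] "150.22 - 170.33"                    "44,000,000 - 66,000,000.8989898989"
```

To keep the factor intact, the same helper could be applied to `levels(vec)` instead of to the character vector.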


Answer 3:

Maybe you can use formatC:

sapply(
  X = lapply(
    X = strsplit(x = content, split = " - "),
    FUN = function(x) {
      formatC(x = as.numeric(x), format = "f", flag = "#", big.mark = ",", 
              decimal.mark = ".", digits = 2, drop0trailing = FALSE)
    }
  ),
  FUN = paste, collapse = " - "
)
# [1] "0.00 - 100.00"                 "0.00 - 100.00"                 "0.00 - 100.00"                
# [4] "0.00 - 100.00"                 "150.22 - 170.33"               "1,000.00 - 2,000.00"          
# [7] "1,000.00 - 2,000.00"           "1,000.00 - 2,000.00"           "1,000.00 - 2,000.00"          
# [10] "7,000.00 - 10,000.00"          "7,000.00 - 10,000.00"          "7,000.00 - 10,000.00"         
# [13] "7,000.00 - 10,000.00"          "7,000.00 - 10,000.00"          "1,000,000.00 - 22,000,000.00" 
# [16] "1,000,000.00 - 22,000,000.00"  "1,000,000.00 - 22,000,000.00"  "44,000,000.00 - 66,000,000.90"
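
The call above returns a plain character vector, and fixing two decimals rounds the last value (66000000.8989898989 becomes 66,000,000.90). If the factor has to be kept, the same formatting could be applied to the levels instead. This is a minimal sketch on a shortened stand-in for the data; the helper name `format_ranges` and the variables `small` and `f` are illustrative, not from the original answer.

```r
# Hypothetical helper: format both ends of each "a - b" range with two decimals.
format_ranges <- function(labels) {
  parts <- strsplit(labels, " - ", fixed = TRUE)
  vapply(parts, function(p) {
    paste(formatC(as.numeric(p), format = "f", digits = 2, big.mark = ","),
          collapse = " - ")
  }, character(1))
}

# Shortened stand-in for the question's data (illustrative only).
small <- c("0 - 100", "1000 - 2000", "0 - 100")
f <- factor(small, levels = unique(small))

# Relabel the levels; the factor's codes and level order are left alone.
levels(f) <- format_ranges(levels(f))
as.character(f)
# [1] "0.00 - 100.00"       "1,000.00 - 2,000.00" "0.00 - 100.00"
```

One caveat with this design: if rounding makes two different levels format to the same label, assigning to `levels()` merges them into a single level.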