Question:
I would like to reformat a factor vector so that the figures it contains have a thousands separator. The vector contains integers and real numbers, with no particular rule regarding their values or order.
Data
In particular, I'm working with a vector vec similar to the one generated below:
content <- c("0 - 100", "0 - 100", "0 - 100", "0 - 100",
"150.22 - 170.33",
"1000 - 2000","1000 - 2000", "1000 - 2000", "1000 - 2000",
"7000 - 10000", "7000 - 10000", "7000 - 10000", "7000 - 10000",
"7000 - 10000", "1000000 - 22000000", "1000000 - 22000000",
"1000000 - 22000000",
"44000000 - 66000000.8989898989")
vec <- factor(x = content, levels = unique(content))
Desired results
My aim is to reformat this vector so that the figures contain the Excel-like 1,000 separator, as in the example below:
100.00
1,000.00
1,000,000.00
1,000,000.56
24,564,000,000.56
Tried approach
I was thinking of making use of gsubfn and a proto object that would pass the digits, then maybe creating another proto object with 3 digits and replacing, as in the code below:
gsubfn(pattern = "[0-9][0-9][0-9]", replacement = ~paste0(x, ','),
x = as.character(vec))
This works only partially, as a comma is inserted in:
"150,.22 - 170,.33"
which is obviously wrong. I also had to convert the factor to a character vector. Consequently, my question boils down to two elements:
- How can I work around the comma issue?
- How can I maintain the original structure of the factor? I need a factor vector ordered in the same manner as the original one, but with commas in the right places.
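To see why the attempt misplaces the comma, here is a small base-R reproduction. It uses gsub in place of gsubfn, which is an assumption made only to keep the sketch free of extra packages. The pattern consumes digits three at a time from the left and knows nothing about the decimal point, whereas a thousands separator has to be counted from the right end of the integer part.

```r
# Reproduction with base gsub() standing in for gsubfn (assumption):
# append a comma after every run of three digits, scanning left to right.
x <- c("150.22 - 170.33", "1000 - 2000")
naive <- gsub("([0-9]{3})", "\\1,", x)
naive
# [1] "150,.22 - 170,.33" "100,0 - 200,0"
```

Both elements come out wrong: the first gets a comma before the decimal point, the second is grouped from the wrong end.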
Answer 1:
Operating only on the levels seems to preserve your precision, avoids converting the vector to a character vector, and is much more efficient, because it works only on the unique values rather than on the whole vector:
levels(vec) <- sapply(strsplit(levels(vec), " - "),
                      function(x) paste(prettyNum(x,
                                                  big.mark = ",",
                                                  preserve.width = "none"),
                                        collapse = " - "))
vec
# [1] 0 - 100 0 - 100 0 - 100 0 - 100 150.22 - 170.33
# [6] 1,000 - 2,000 1,000 - 2,000 1,000 - 2,000 1,000 - 2,000 7,000 - 10,000
# [11] 7,000 - 10,000 7,000 - 10,000 7,000 - 10,000 7,000 - 10,000 1,000,000 - 22,000,000
# [16] 1,000,000 - 22,000,000 1,000,000 - 22,000,000 44,000,000 - 66,000,000.8989898989
# Levels: 0 - 100 150.22 - 170.33 1,000 - 2,000 7,000 - 10,000 1,000,000 - 22,000,000 44,000,000 - 66,000,000.8989898989
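A quick way to check that relabelling the levels leaves the factor's structure intact is to compare the integer codes before and after. The sketch below uses a shortened version of the data, and fmt_range is a hypothetical helper name introduced here for illustration; it wraps the same prettyNum call.

```r
# Shortened sample data (assumption: same shape as the question's vec).
content <- c("0 - 100", "150.22 - 170.33", "1000 - 2000", "1000 - 2000", "0 - 100")
vec <- factor(content, levels = unique(content))
codes_before <- as.integer(vec)

# Hypothetical helper: format both ends of each "a - b" range.
fmt_range <- function(x) {
  parts <- strsplit(x, " - ", fixed = TRUE)
  vapply(parts, function(p) {
    paste(prettyNum(p, big.mark = ",", preserve.width = "none"), collapse = " - ")
  }, character(1))
}

levels(vec) <- fmt_range(levels(vec))

# The underlying codes, and hence the element and level order, are unchanged.
stopifnot(identical(as.integer(vec), codes_before))
levels(vec)
# [1] "0 - 100"         "150.22 - 170.33" "1,000 - 2,000"
```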
Answer 2:
Use a regex based on a positive lookahead:
content <- c("0 - 100", "0 - 100", "0 - 100", "0 - 100",
"1000 - 2000","1000 - 2000", "1000 - 2000", "1000 - 2000",
"7000 - 10000", "7000 - 10000", "7000 - 10000", "7000 - 10000",
"7000 - 10000", "1000000 - 22000000", "1000000 - 22000000",
"1000000 - 22000000")
gsub("(\\d)(?=(?:\\d{3})+\\b)", "\\1,", content, perl = TRUE)
# [1] "0 - 100" "0 - 100" "0 - 100"
# [4] "0 - 100" "1,000 - 2,000" "1,000 - 2,000"
# [7] "1,000 - 2,000" "1,000 - 2,000" "7,000 - 10,000"
# [10] "7,000 - 10,000" "7,000 - 10,000" "7,000 - 10,000"
# [13] "7,000 - 10,000" "1,000,000 - 22,000,000" "1,000,000 - 22,000,000"
# [16] "1,000,000 - 22,000,000"
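Note that the sample data in this answer leaves out the decimal values from the question. With a long fractional part, the pattern also groups the digits after the decimal point. The variant below is an assumption, not part of the answer: it uses the PCRE verbs (*SKIP)(*FAIL) to step over each fractional part before the lookahead is tried. Applying it to levels(vec) rather than to the vector keeps the factor intact.

```r
x <- c("150.22 - 170.33", "44000000 - 66000000.8989898989")

# Pattern from the answer: the long fractional part gets commas too.
original <- gsub("(\\d)(?=(?:\\d{3})+\\b)", "\\1,", x, perl = TRUE)
original
# [1] "150.22 - 170.33"
# [2] "44,000,000 - 66,000,000.8,989,898,989"

# Variant (assumption): skip ".digits" entirely, then group the integer part.
pat <- "\\.\\d+(*SKIP)(*FAIL)|(\\d)(?=(?:\\d{3})+(?!\\d))"
with_commas <- gsub(pat, "\\1,", x, perl = TRUE)
with_commas
# [1] "150.22 - 170.33"
# [2] "44,000,000 - 66,000,000.8989898989"

# Relabel only the levels so the result is still a factor in the same order.
vec <- factor(x, levels = unique(x))
levels(vec) <- gsub(pat, "\\1,", levels(vec), perl = TRUE)
```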
Answer 3:
Maybe you can use formatC:
sapply(
  X = lapply(
    X = strsplit(x = content, split = " - "),
    FUN = function(x) {
      formatC(x = as.numeric(x), format = "f", flag = "#", big.mark = ",",
              decimal.mark = ".", digits = 2, drop0trailing = FALSE)
    }
  ),
  FUN = paste, collapse = " - "
)
# [1] "0.00 - 100.00" "0.00 - 100.00" "0.00 - 100.00"
# [4] "0.00 - 100.00" "150.22 - 170.33" "1,000.00 - 2,000.00"
# [7] "1,000.00 - 2,000.00" "1,000.00 - 2,000.00" "1,000.00 - 2,000.00"
# [10] "7,000.00 - 10,000.00" "7,000.00 - 10,000.00" "7,000.00 - 10,000.00"
# [13] "7,000.00 - 10,000.00" "7,000.00 - 10,000.00" "1,000,000.00 - 22,000,000.00"
# [16] "1,000,000.00 - 22,000,000.00" "1,000,000.00 - 22,000,000.00" "44,000,000.00 - 66,000,000.90"
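As written, this returns a character vector, and as.numeric followed by digits = 2 rounds the values (66000000.8989898989 becomes 66,000,000.90). If a factor is needed, the same formatting can be applied to the levels only. The following is a minimal sketch on shortened data; fmt2 is a hypothetical helper name, and it assumes no two levels round to the same string, since identical labels would be merged into one level.

```r
# Shortened sample data (assumption: same shape as the question's vec).
content <- c("0 - 100", "150.22 - 170.33", "1000 - 2000", "0 - 100")
vec <- factor(content, levels = unique(content))

# Hypothetical helper: two fixed decimals and a comma as the big mark.
fmt2 <- function(lv) {
  vapply(strsplit(lv, " - ", fixed = TRUE), function(p) {
    paste(formatC(as.numeric(p), format = "f", digits = 2, big.mark = ","),
          collapse = " - ")
  }, character(1))
}

levels(vec) <- fmt2(levels(vec))
as.character(vec)
# [1] "0.00 - 100.00"       "150.22 - 170.33"     "1,000.00 - 2,000.00"
# [4] "0.00 - 100.00"
```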