I have multiple columns, and I would like to find each row's percentage of one column among the rows where the other columns are the same. For example:
ST cd variable
1 1 23432
1 1 2345
1 2 908890
1 2 350435
1 2 2343432
2 1 9999
2 1 23432
What I'd like to do is:
if ST and cd are the same, find the percentage that variable in that row makes up of the total over all rows with the same ST and cd. In the end it would look like:
ST cd variable percentage
1 1 23432 90.90%
1 1 2345 9.10%
1 2 908890 25.23%
1 2 350435 9.73%
1 2 2343432 65.05%
2 1 9999 29.91%
2 1 23432 70.09%
How can I do this in R?
Thanks for all the help.
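You can create your own proportion-formatting function:
prop_format <- function(x, digits = 4) {
  # share of each value in the group total, rounded, then scaled to percent
  x <- round(x / sum(x), digits) * 100
  paste0(x, "%")
}
Then use ave to apply it within each ST/cd group:
ave(dt$variable, list(dt$ST, dt$cd), FUN = prop_format)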
[1] "90.9%" "9.1%" "25.23%" "9.73%" "65.05%" "29.91%" "70.09%"
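The call above assumes a data frame named dt already exists. As a minimal self-contained sketch, assuming the sample data from the question, the same grouping idea can keep the result numeric and add it as a column, with formatting done only for display. The column name pct is just illustrative and not part of the original answer.

```r
# Sample data from the question (the name dt matches the answer above)
dt <- data.frame(
  ST = c(1, 1, 1, 1, 1, 2, 2),
  cd = c(1, 1, 2, 2, 2, 1, 1),
  variable = c(23432, 2345, 908890, 350435, 2343432, 9999, 23432)
)

# Share of each row within its ST/cd group, as a numeric percentage
dt$pct <- ave(dt$variable, dt$ST, dt$cd, FUN = function(x) 100 * x / sum(x))

# Format for display only; the numeric column stays usable for further math
dt$percentage <- sprintf("%.2f%%", dt$pct)
```

Keeping pct numeric means it can still be sorted, summed, or plotted, which is not possible once the values are turned into strings with a percent sign.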
Using data.table:
library(data.table)
DT <- data.table(read.table(text = "ST cd variable
1 1 23432
1 1 2345
1 2 908890
1 2 350435
1 2 2343432
2 1 9999
2 1 23432 ", header = TRUE))
DT[, percentage := variable / sum(variable), by = list(ST, cd)]
DT[]  # := modifies DT by reference without printing, so print explicitly
## ST cd variable percentage
## 1: 1 1 23432 0.90902743
## 2: 1 1 2345 0.09097257
## 3: 1 2 908890 0.25227624
## 4: 1 2 350435 0.09726856
## 5: 1 2 2343432 0.65045519
## 6: 2 1 9999 0.29909366
## 7: 2 1 23432 0.70090634
Using dplyr:
library(dplyr)
df %>% group_by(ST, cd) %>% mutate(percentage = variable / sum(variable))
# ST cd variable percentage
#1 1 1 23432 0.90902743
#2 1 1 2345 0.09097257
#3 1 2 908890 0.25227624
#4 1 2 350435 0.09726856
#5 1 2 2343432 0.65045519
#6 2 1 9999 0.29909366
#7 2 1 23432 0.70090634
To get rounded percentages instead of proportions, multiply by 100 and round:
df %>% group_by(ST, cd) %>% mutate(percentage = round(variable / sum(variable) * 100, 2))
# ST cd variable percentage
#1 1 1 23432 90.90
#2 1 1 2345 9.10
#3 1 2 908890 25.23
#4 1 2 350435 9.73
#5 1 2 2343432 65.05
#6 2 1 9999 29.91
#7 2 1 23432 70.09
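One way to double-check a result like this is to confirm that the percentages add up to 100 within every ST/cd group. The following is only a sketch, assuming the sample data from the question and dplyr 1.0.0 or later (for the .groups argument); the names res, pct, and total are illustrative.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ST = c(1, 1, 1, 1, 1, 2, 2),
  cd = c(1, 1, 2, 2, 2, 1, 1),
  variable = c(23432, 2345, 908890, 350435, 2343432, 9999, 23432)
)

# Unrounded percentage per row, then drop the grouping so later
# operations on res are not silently done per group
res <- df %>%
  group_by(ST, cd) %>%
  mutate(pct = 100 * variable / sum(variable)) %>%
  ungroup()

# Sanity check: one row per ST/cd group, each total should be 100
check <- res %>%
  group_by(ST, cd) %>%
  summarise(total = sum(pct), .groups = "drop")
```

Rounding each row to two decimals first can make a group total differ slightly from 100, so the check is done on the unrounded values.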