I have an example dataframe that holds the columns id, count and usernames, with id and count being numbers and usernames being a string.
For every row of the dataframe I want to set the value of a new column called 'ratio', defined as
count / (number of rows whose usernames value equals the usernames value in this row)
Example from the provided data:
In every row where the username is 'Tom', the ratio would be count/4, because the user Tom is found four times in the data.
This is just a simplified version of my problem. A for-loop is not an option: my original dataframe has about 3.4 million rows, and my previous approach, which used for-loops to iterate over the unique values of e.g. 'usernames', takes forever.
dput of my dataframe:
structure(list(id = 1:20, count = c(140L, 89L, 17L, 114L, 129L,
86L, 21L, 50L, 197L, 160L, 8L, 14L, 78L, 208L, 155L, 55L, 63L,
20L, 189L, 79L), usernames = structure(c(4L, 3L, 5L, 5L, 2L,
3L, 1L, 1L, 3L, 1L, 3L, 2L, 5L, 5L, 4L, 4L, 2L, 2L, 2L, 3L), .Label = c("Jerry",
"Mark", "Phil", "Tina", "Tom"), class = "factor")), .Names = c("id",
"count", "usernames"), row.names = c(NA, 20L), class = "data.frame")
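To make the target concrete, the definition of 'ratio' can be written without an explicit loop, for instance with base R's ave(). This is only a minimal sketch, not necessarily the fastest option for 3.4 million rows. It assumes the data frame is stored under the hypothetical name df, which it rebuilds from the same values as the dput above; rows_per_user is likewise a name introduced just for illustration.

```r
# Hypothetical name `df`; same values as the dput above.
df <- data.frame(
  id = 1:20,
  count = c(140L, 89L, 17L, 114L, 129L, 86L, 21L, 50L, 197L, 160L,
            8L, 14L, 78L, 208L, 155L, 55L, 63L, 20L, 189L, 79L),
  usernames = factor(
    c(4L, 3L, 5L, 5L, 2L, 3L, 1L, 1L, 3L, 1L,
      3L, 2L, 5L, 5L, 4L, 4L, 2L, 2L, 2L, 3L),
    levels = 1:5,
    labels = c("Jerry", "Mark", "Phil", "Tina", "Tom")
  )
)

# ave() applies length() within each usernames group and returns a
# vector aligned with the original rows (the group size for each row).
rows_per_user <- ave(df$count, df$usernames, FUN = length)

# Vectorised division: each count divided by the size of its group.
df$ratio <- df$count / rows_per_user
```

With this data, every 'Tom' row is divided by 4, so the row with id 3 (count 17) gets a ratio of 4.25.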
I hope I have provided everything you need to understand and reproduce the problem. If something is missing, don't hesitate to mention it in the comments.