I have a data table with several social media users and their followers. The original data table has the following format:
X.USERID FOLLOWERS
1081 4053807021,2476584389,4713715543, ...
So each row contains a user ID together with that user's followers, stored as a single comma-separated string. In total I have 24,000 unique user IDs together with 160,000,000 unique followers. I wish to convert my original table into the following format:
X.USERID FOLLOWERS
1: 1081 4053807021
2: 1081 2476584389
3: 1081 4713715543
4: 1081 580410695
5: 1081 4827723557
6: 1081 704326016165142528
In order to get this data table I used the following line of code (assume that my original data table is called dt):
uf <- dt[,list(FOLLOWERS = unlist(strsplit(x = FOLLOWERS, split= ','))), by = X.USERID]
However when I run this code on the entire dataset I get the following error:
negative length vectors are not allowed
According to this post on Stack Overflow (Negative number of rows in data.table after incorrect use of set), it seems that I am bumping into the limit on the length of a column in data.table. As a workaround, I ran the code in smaller blocks (per 10,000) and this seemed to work.
My question is: if I change my code, can I prevent this error from occurring, or am I bumping into the limits of R?
PS. I have a machine with 140 GB of RAM at my disposal, so physical memory should not be the issue.
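For reference, the block-wise workaround looks roughly like the sketch below. It is a minimal illustration on toy data, not my exact code: the three-row `dt`, the names `chunk_size`, `block_id` and `pieces`, and the tiny chunk size are all made up for the example.

```r
library(data.table)

# Toy stand-in for the real table (illustrative values only)
dt <- data.table(
  X.USERID  = c(1081, 1082, 1083),
  FOLLOWERS = c("4053807021,2476584389,4713715543",
                "580410695,4827723557",
                "704326016165142528")
)

chunk_size <- 2L  # hypothetical; on the real data the blocks held 10,000 rows

# Assign each row to a block of at most chunk_size rows
block_id <- ceiling(seq_len(nrow(dt)) / chunk_size)

# Run the original split on one block at a time
pieces <- lapply(split(seq_len(nrow(dt)), block_id), function(idx) {
  dt[idx,
     list(FOLLOWERS = unlist(strsplit(FOLLOWERS, split = ",", fixed = TRUE))),
     by = X.USERID]
})

# On the toy data the blocks can simply be stacked back together
uf <- rbindlist(pieces)
```

Each element of `pieces` is a long-format table for one block of users. Stacking them with `rbindlist` is fine for the toy data; whether the full combined result fits in a single data.table is exactly what I am unsure about.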
> memory.limit()
[1] 147446