I would like to count islands along the rows of a .csv. By "islands" I mean runs of consecutive non-blank entries in a row of the .csv. If a row contains three or more consecutive non-blank entries, I would like that to be counted as 1 island. Anything less than three consecutive entries counts as 1 "non-island". I would then like to write the output to a dataframe. Example input:
Name,,,,,,,,,,,,,
Michael,,,1,1,1,,,,,,,,
Peter,,,,1,1,,,,,,,,,
John,,,,,1,,,,,,,,,
Desired dataframe output:
Name,island,nonisland,
Michael,1,0,
Peter,0,1,
John,0,1,
You could use rle like this:
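# Assumes the names are the row names of df and blank cells are NA
# Run rle on each row and count the runs of length 3 or more
output <- stack(sapply(apply(df, 1, rle), function(x) sum(x$lengths >= 3)))
names(output) <- c("island", "name")
# A row without an island counts as one non-island
output$nonisland <- 0
output$nonisland[output$island == 0] <- 1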
#  island    name nonisland
#1      1 Michael         0
#2      0   Peter         1
#3      0    John         1
Here you run rle across the rows of your data frame, which gives the length of every run of identical consecutive values in each row. Then, for each row, you count the runs that have a length of 3 or more.
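To make the rle step concrete, here is a small sketch of what it returns for a single row. The vector `michael` is a made-up stand-in for Michael's row, assuming the blanks were read in as NA.

```r
# Hypothetical stand-in for Michael's row, with blanks as NA
michael <- c(NA, NA, 1, 1, 1, NA, NA)

r <- rle(michael)
r$lengths  # run lengths; rle never merges NAs, so each NA is its own run of 1
r$values   # the value that each run is made of

# Number of runs of length 3 or more, i.e. the islands in this row
n_islands <- sum(r$lengths >= 3)
n_islands
```

Because each NA ends up as a run of length 1, a stretch of blanks is never counted as an island, however long it is.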
Note that this solution assumes all islands are made up of the same value (i.e. all 1's, as in your example), because rle only groups identical consecutive values. If that is not the case, you would first need to convert all the non-empty entries to the same value, with something like df[!is.na(df)] <- 1, before rle will be appropriate.
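As a rough end-to-end sketch of that case, the example below builds a small data frame with mixed non-blank values, normalises them, and then counts islands per row. The data, the column names V1 to V5, and the variable names are invented for illustration. It assumes, as above, that names are row names and blanks are NA.

```r
# Hypothetical data: names as row names, blanks as NA, mixed non-blank values
df <- data.frame(
  V1 = c(NA, NA, 4),
  V2 = c(2, NA, NA),
  V3 = c(7, 1, NA),
  V4 = c(1, 5, 3),
  V5 = c(NA, NA, 9),
  row.names = c("Michael", "Peter", "John")
)

# Make every non-blank entry identical so consecutive entries form a single run
df[!is.na(df)] <- 1

# For each row, count the runs of length 3 or more
island <- apply(df, 1, function(row) sum(rle(row)$lengths >= 3))

# A row with no island is counted as one non-island
output <- data.frame(
  name = names(island),
  island = unname(island),
  nonisland = as.integer(island == 0)
)
output
```

With this data, Michael's three consecutive entries (2, 7, 1) are counted as one island, while Peter and John each end up as one non-island.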