I used R to generate a toy data set:
data.frame(name = c("Tom", "Shane", "Daniel", "Akira", "Jack", "Zoe"), c1 = c(1,2,3,0,5,0), c2 = c(0, 3, 5, 0,4,0), c3 = c(0, 0,1,0,0,3), c4=c(0,0,0,1,0,0))
which is displayed below:
name c1 c2 c3 c4
Tom 1 0 0 0
Shane 2 3 0 0
Daniel 3 5 1 0
Akira 0 0 0 1
Jack 5 4 0 0
Zoe 0 0 3 0
I only care about the columns c1, c2, c3, and c4. If a row has more than one value greater than 0, it needs to be split into several rows so that each resulting row has exactly one value greater than 0, and the original row is then removed.
For instance, the second row has two values greater than 0 (c1: 2, c2: 3), so it has to be split into two rows, which look like this:
Shane 2 0 0 0
Shane 0 3 0 0
I am trying to build a SQL query to do this. However, I am not sure whether any SQL function can detect multiple non-zero values in a row without looking at the result first. If such a function exists, the final result should look like this:
name c1 c2 c3 c4
Tom 1 0 0 0
Shane 2 0 0 0
Shane 0 3 0 0
Daniel 3 0 0 0
Daniel 0 5 0 0
Daniel 0 0 1 0
Akira 0 0 0 1
Jack 5 0 0 0
Jack 0 4 0 0
Zoe 0 0 3 0
I also thought about using R to accomplish this. The only approach I know for duplicating rows is the do.call() function combined with the rbind() function. However, it is not working in my case. Could someone give me any hints? Many thanks :)
One more option is to use union all.

You can do this with a few tidyverse functions. First, we enter your sample data. Then we gather, filter, and spread to get the rows you want. By adding in a row id, we keep the different values on different rows.
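The code for this answer is not shown, so here is a minimal sketch of what a gather, filter, and spread pipeline could look like. The intermediate column names col, val, and id are assumptions made for illustration. Note that gather() and spread() still work but are superseded in tidyr by pivot_longer() and pivot_wider().

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  name = c("Tom", "Shane", "Daniel", "Akira", "Jack", "Zoe"),
  c1 = c(1, 2, 3, 0, 5, 0),
  c2 = c(0, 3, 5, 0, 4, 0),
  c3 = c(0, 0, 1, 0, 0, 3),
  c4 = c(0, 0, 0, 1, 0, 0)
)

result <- df %>%
  # Long format: one row per (name, column, value)
  gather(key = "col", value = "val", c1:c4) %>%
  # Keep only the values greater than 0
  filter(val > 0) %>%
  # A unique id per remaining value keeps each value on its own row
  mutate(id = row_number()) %>%
  # Back to wide format, padding the other columns with 0
  spread(key = col, value = val, fill = 0) %>%
  select(-id)

print(result)
```

Each row of result should then carry exactly one value greater than 0, with the other three columns filled with 0.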
Perhaps another option is to use a CROSS APPLY.
Example
Returns
Consider base R with by, which builds a zero-padded data frame for each distinct name and then row-binds all of the data frames into the final one, similar to a UNION in SQL:
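The code for this answer is not shown either. The following is a minimal sketch of how a by-based approach might look, not necessarily the answerer's exact code. The helper split_row and the vector value_cols are hypothetical names introduced for illustration.

```r
# Sample data from the question
df <- data.frame(
  name = c("Tom", "Shane", "Daniel", "Akira", "Jack", "Zoe"),
  c1 = c(1, 2, 3, 0, 5, 0),
  c2 = c(0, 3, 5, 0, 4, 0),
  c3 = c(0, 0, 1, 0, 0, 3),
  c4 = c(0, 0, 0, 1, 0, 0)
)

value_cols <- c("c1", "c2", "c3", "c4")

# Hypothetical helper: turn a one-row data frame into one row per
# non-zero value, with every other value column set to 0
split_row <- function(row) {
  nonzero <- value_cols[unlist(row[value_cols]) > 0]
  pieces <- lapply(nonzero, function(col) {
    out <- row
    out[setdiff(value_cols, col)] <- 0
    out
  })
  do.call(rbind, pieces)
}

# Build a zero-padded data frame for each distinct name
per_name <- by(df, df$name, function(sub) {
  do.call(rbind, lapply(seq_len(nrow(sub)), function(i) split_row(sub[i, ])))
})

# Row-bind all per-name data frames into the final one
result <- do.call(rbind, per_name)
rownames(result) <- NULL

print(result)
```

Because by groups on the name column, the rows of result come out grouped by name rather than in the original row order.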