Duplicate the rows based on some criteria in SQL o

Posted 2019-04-14 01:41

I used R to generate a toy data set:

data.frame(name = c("Tom", "Shane", "Daniel", "Akira", "Jack", "Zoe"), c1 = c(1,2,3,0,5,0), c2 = c(0, 3, 5, 0,4,0), c3 = c(0, 0,1,0,0,3), c4=c(0,0,0,1,0,0))

which is displayed below:

    name c1 c2 c3 c4
1    Tom  1  0  0  0
2  Shane  2  3  0  0
3 Daniel  3  5  1  0
4  Akira  0  0  0  1
5   Jack  5  4  0  0
6    Zoe  0  0  3  0

I only care about the columns c1, c2, c3 and c4. If a row has more than one value greater than 0, I need to duplicate that row so that each copy keeps exactly one value greater than 0 (with the other columns set to 0), and then remove the original row.

For instance, the second row has two values greater than 0 (c1: 2, c2: 3), so it has to be split into two rows, which look like this:

Shane 2 0 0 0

Shane 0 3 0 0

I am trying to build a SQL query to do this. However, I am not sure whether any SQL function can detect multiple non-zero values in a row without looking at the result first. In any case, if some magical SQL function exists, the final result should look like this:

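  name c1 c2 c3 c4
   Tom  1  0  0  0
 Shane  2  0  0  0
 Shane  0  3  0  0
Daniel  3  0  0  0
Daniel  0  5  0  0
Daniel  0  0  1  0
 Akira  0  0  0  1
  Jack  5  0  0  0
  Jack  0  4  0  0
   Zoe  0  0  3  0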

I also thought about using R for this. The only approach I know for duplicating rows is do.call() combined with rbind(), but that is not working in my case. Could someone give me some hints? Many thanks :)

5 answers
来,给爷笑一个
#2 · 2019-04-14 02:05

One more option, using UNION ALL:

select name,c1,0 as c2,0 as c3,0 as c4 from tbl where c1>0
union all
select name,0,c2,0,0 from tbl where c2>0
union all
select name,0,0,c3,0 from tbl where c3>0
union all
select name,0,0,0,c4 from tbl where c4>0
Animai°情兽
#3 · 2019-04-14 02:09
df1 = data.frame(name = c("Tom", "Shane", "Daniel", "Akira", "Jack", "Zoe"),
                 c1 = c(1,2,3,0,5,0),
                 c2 = c(0, 3, 5, 0,4,0),
                 c3 = c(0, 0,1,0,0,3),
                 c4=c(0,0,0,1,0,0))

df2 = df1[rep(1:NROW(df1), apply(df1, 1, function(x) sum(x[-(1)] > 0))),]
df3 = df2
df3[-1] = df3[-1] * 0
df3[ave(1:NROW(df2), df2$name, FUN = length) == 1,] = df2[ave(1:NROW(df2), df2$name, FUN = length) == 1,]
replace(x = df3,
        list = cbind(1:NROW(df3), 1+ave(1:NROW(df2), df2$name, FUN = seq_along)),
        values = df2[cbind(1:NROW(df3), 1+ave(1:NROW(df2), df2$name, FUN = seq_along))])
#      name c1 c2 c3 c4
#1      Tom  1  0  0  0
#2    Shane  2  0  0  0
#2.1  Shane  0  3  0  0
#3   Daniel  3  0  0  0
#3.1 Daniel  0  5  0  0
#3.2 Daniel  0  0  1  0
#4    Akira  0  0  0  1
#5     Jack  5  0  0  0
#5.1   Jack  0  4  0  0
#6      Zoe  0  0  3  0
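
The indexing above puts the k-th copy's value in the k-th value column, so for rows with several positive values it relies on those values sitting in the leading columns, which holds for this sample. A minimal base R sketch that does not depend on that, assuming the same sample data (the names vals, pos, out and result are only illustrative):

```r
df1 <- data.frame(name = c("Tom", "Shane", "Daniel", "Akira", "Jack", "Zoe"),
                  c1 = c(1, 2, 3, 0, 5, 0),
                  c2 = c(0, 3, 5, 0, 4, 0),
                  c3 = c(0, 0, 1, 0, 0, 3),
                  c4 = c(0, 0, 0, 1, 0, 0))

# Numeric matrix of the value columns only
vals <- as.matrix(df1[-1])

# One (row, col) pair per positive value, sorted by source row, then column
pos <- which(vals > 0, arr.ind = TRUE)
pos <- pos[order(pos[, "row"], pos[, "col"]), , drop = FALSE]

# All-zero output with one row per positive value, then fill in that value
out <- matrix(0, nrow = nrow(pos), ncol = ncol(vals),
              dimnames = list(NULL, colnames(vals)))
out[cbind(seq_len(nrow(pos)), pos[, "col"])] <- vals[pos]

# Re-attach the name of the source row
result <- data.frame(name = df1$name[pos[, "row"]], out)
result
```

Each output row corresponds to exactly one positive cell of the input, so rows whose values are all zero simply drop out.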
爷的心禁止访问
#4 · 2019-04-14 02:19

You can do this with a few tidyverse functions. First, we enter your sample data:

library(tidyverse)
dd <- tribble(~name, ~c1, ~c2, ~c3, ~c4,
        "Tom", 1, 0, 0, 0,
        "Shane", 2, 3, 0, 0,
        "Daniel", 3, 5, 1, 0,
        "Akira", 0, 0, 0 ,1,
        "Jack", 5, 4, 0, 0,
        "Zoe", 0, 0, 3, 0)

Then we gather, filter, and spread to get the rows you want. By adding in a row id, we keep the different values on different rows.

dd %>% 
  gather("var", "val", -name) %>% 
  rowid_to_column() %>% 
  filter(val>0) %>% 
  spread(var, val, fill=0) %>% 
  select(-rowid)
# A tibble: 10 x 5
#      name    c1    c2    c3    c4
#  *  <chr> <dbl> <dbl> <dbl> <dbl>
#  1    Tom     1     0     0     0
#  2  Shane     2     0     0     0
#  3 Daniel     3     0     0     0
#  4   Jack     5     0     0     0
#  5  Shane     0     3     0     0
#  6 Daniel     0     5     0     0
#  7   Jack     0     4     0     0
#  8 Daniel     0     0     1     0
#  9    Zoe     0     0     3     0
# 10  Akira     0     0     0     1
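
gather() and spread() are superseded in current tidyr releases in favor of pivot_longer() and pivot_wider(). A sketch of the same idea with the newer verbs, assuming dplyr and a reasonably recent tidyr (a scalar values_fill needs, as far as I know, version 1.1.0 or later); the names long and res are illustrative. The rows come out grouped by name rather than by column, because pivot_longer() stacks the values row by row:

```r
library(dplyr)
library(tidyr)

dd <- data.frame(name = c("Tom", "Shane", "Daniel", "Akira", "Jack", "Zoe"),
                 c1 = c(1, 2, 3, 0, 5, 0),
                 c2 = c(0, 3, 5, 0, 4, 0),
                 c3 = c(0, 0, 1, 0, 0, 3),
                 c4 = c(0, 0, 0, 1, 0, 0))

# Long format: one row per (name, column) cell
long <- dd %>%
  pivot_longer(-name, names_to = "var", values_to = "val") %>%
  mutate(rowid = row_number())   # unique id keeps each value on its own row

# Keep positive cells and spread back out, filling the gaps with 0
res <- long %>%
  filter(val > 0) %>%
  pivot_wider(names_from = var, values_from = val, values_fill = 0) %>%
  select(-rowid)
res
```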
放荡不羁爱自由
#5 · 2019-04-14 02:25

Perhaps another option using a CROSS APPLY

Example

Select A.Name
      ,B.*
 From  YourTable A
 Cross Apply ( values (C1,0,0,0)
                     ,(0,C2,0,0)
                     ,(0,0,C3,0)
                     ,(0,0,0,C4)
             ) B (C1,C2,C3,C4)
 Where B.C1+B.C2+B.C3+B.C4<>0

一纸荒年 Trace。
#6 · 2019-04-14 02:29

Consider base R's by(), which builds a zero-padded data frame for each distinct name and then row-binds all of those data frames into a final one, similar to a UNION in SQL:

# df is assumed to hold the question's data frame
df_list <- by(df, df$name, FUN = function(d){

  tmp <- data.frame(name = d$name[1],
             c1 = c(max(d$c1), rep(0, 3)),
             c2 = c(0, max(d$c2), rep(0, 2)),
             c3 = c(rep(0, 2), max(d$c3), 0),
             c4 = c(rep(0, 3), max(d$c4)))

  tmp <- tmp[rowSums(tmp[-1])!=0,]
  row.names(tmp) <- NULL
  tmp

})

final_df <- do.call(rbind, unname(df_list))
final_df
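
The UNION ALL idea also translates fairly directly to base R with the do.call(rbind, ...) pattern mentioned in the question. A rough sketch, assuming the question's data frame is stored as df1 and the value columns hold finite numbers (cols, parts and final are illustrative names):

```r
df1 <- data.frame(name = c("Tom", "Shane", "Daniel", "Akira", "Jack", "Zoe"),
                  c1 = c(1, 2, 3, 0, 5, 0),
                  c2 = c(0, 3, 5, 0, 4, 0),
                  c3 = c(0, 0, 1, 0, 0, 3),
                  c4 = c(0, 0, 0, 1, 0, 0))

cols <- c("c1", "c2", "c3", "c4")

# One block per column, like one SELECT per branch of the UNION ALL:
# keep the rows where that column is positive and zero out the other columns
parts <- lapply(cols, function(col) {
  block  <- df1[df1[[col]] > 0, ]
  others <- setdiff(cols, col)
  block[others] <- block[others] * 0   # multiplying by 0 assumes finite values
  block
})

# Stack the blocks and reset the row names
final <- do.call(rbind, parts)
row.names(final) <- NULL
final
```

The rows come out column by column (all c1 rows first, then c2, and so on), which matches the order the SQL branches are written in.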