How to label groups in postgresql when group belon

2020-05-06 14:18发布

I want, in a request, to fill all Null values by the last known value. When it's in a table and not in a request, it's easy:

If I define and fill my table as follows:

CREATE TABLE test_fill_null (
    date INTEGER,
    value INTEGER
);

INSERT INTO test_fill_null VALUES 
    (1,2),
    (2, NULL), 
    (3, 45), 
    (4,NULL), 
    (5, null);

SELECT * FROM test_fill_null ;
 date | value 
------+-------
    1 |     2
    2 |      
    3 |    45
    4 |      
    5 |      

Then I just have to fill like that:

UPDATE test_fill_null t1
SET value = (
    SELECT t2.value 
    FROM test_fill_null t2 
    WHERE t2.date <= t1.date AND value IS NOT NULL 
    ORDER BY t2.date DESC 
    LIMIT 1
);

SELECT * FROM test_fill_null;
 date | value 
------+-------
    1 |     2
    2 |     2
    3 |    45
    4 |    45
    5 |    45

But now, I'm in a request, like this one:

WITH
    pre_table AS(
        SELECT 
            id1,
            id2,
            tms,
            CASE 
                WHEN tms - lag(tms) over w < interval '5 minutes' THEN NULL
                ELSE id2
            END as group_id
        FROM
            table0 
        window w as (partition by id1 order by tms)
    )

Where the group_id is set to id2 when the previous point is distant from more than 5 minutes, null otherwise. By doing so, I want to end up with group of points that follow each other by less than 5 minutes, and gaps of more than 5 minutes between each groups.

Then I don't know how to proceed. I tried:

    SELECT distinct on (id1, id2)
        t0.id1,
        t0.id2,
        t0.tms,
        t1.group_id
    FROM
        pre_table t0
        LEFT JOIN (
            select
                id1,
                tms,
                group_id
            from pre_table t2
            where t2.group_id is not null
            order by tms desc
        ) t1
        ON 
            t1.tms <= t0.tms AND
            t1.id1 = t0.id1  
    WHERE 
        t0.id1 IS NOT NULL
    ORDER BY
        id1,
        id2,
        t1.tms DESC

But in the final result I have some group with two consecutive points which are distant from more than 5 minutes. Their should be two different groups in this case.

2条回答
男人必须洒脱
2楼-- · 2020-05-06 15:09

While editing my question I found a solution. It's pretty low though, much lower than my example within a table. Any suggestion to improve it ?

    SELECT
        t2.id1,
        t2.id2,
        t2.tms,
        (
            SELECT t1.group_id 
            FROM pre_table t1 
            WHERE 
                t1.tms <= t2.tms 
                AND t1.group_id IS NOT NULL 
                AND t2.id1 = t2.id1
            ORDER BY t1.tms DESC 
            LIMIT 1
        ) as group_id
    FROM
        pre_table t2
    ORDER BY
        t2.id1
        t2.id2
        t2.tms

So as I said, a select within a select

查看更多
\"骚年 ilove
3楼-- · 2020-05-06 15:11

A "select within a select" is more commonly called "subselect" or "subquery" In your particular case it's a correlated subquery. LATERAL joins (new in postgres 9.3) can largely replace correlated subqueries with more flexible solutions:

I don't think you need either here.

For your first case this query is probably faster and simpler, though:

SELECT date, max(value) OVER (PARTITION BY grp) AS value
FROM  (
   SELECT *, count(value) OVER (ORDER BY date) AS grp
   FROM   test_fill_null
   ) sub;

count() only counts non-null values, so grp is incremented with every non-null value, thereby forming groups as desired. It's trivial to pick the one non-null value per grp in the outer SELECT.


For your second case, I'll assume the initial order of rows is determined by (id1, id2, tms) as indicated by one of your queries.

SELECT id1, id2, tms
     , count(step) OVER (ORDER BY id1, id2, tms) AS group_id
FROM  (
   SELECT *, CASE WHEN lag(tms, 1, '-infinity') OVER (PARTITION BY id1 ORDER BY id2, tms)
                       < tms - interval '5 min'
                  THEN true END AS step
   FROM   table0
   ) sub
ORDER  BY id1, id2, tms;

Adapt to your actual order. One of these might cover it:

PARTITION BY id1 ORDER BY id2  -- ignore tms
PARTITION BY id1 ORDER BY tms  -- ignore id2

SQL Fiddle with an extended example.

Related:

查看更多
登录 后发表回答