I have a table in netezza (based on postgresql) like below. I need to create a new table with the NULL values for name
to be replaced with value of name
for the previous non-null row.
table1
id name time value
---------------------
1 john 11:00 324
2 NULL 12:00 645
3 NULL 13:00 324
4 bane 11:00 132
5 NULL 12:00 30
6 NULL 13:00 NULL
7 NULL 14:00 -1
8 zane 11:00 152
9 NULL 12:00 60
10 NULL 13:00 NULL
output table
name time value
---------------------
john 11:00 324
john 12:00 645
john 13:00 324
bane 11:00 132
bane 12:00 30
bane 13:00 NULL
bane 14:00 -1
zane 11:00 152
zane 12:00 60
zane 13:00 NULL
notes:
Cannot alter table1 due to permission restrictions, so a new table
is the way.
Need to run this in Netezza
(preferably) or MS Access
.
Code used to create the test data in Netezza is as below.
create temp table test (
id int
,name varchar(10)
,time time
,value int
)distribute on random;
insert into test (id, name, time, value) values(1, 'joe', '10:00', 324);
insert into test (id, name, time, value) values(2, null, '11:00', 645);
insert into test (id, name, time, value) values(3, null, '12:00', 324);
insert into test (id, name, time, value) values(4, 'bane', '10:00', 132);
insert into test (id, name, time, value) values(5, null, '11:00', 30);
insert into test (id, name, time, value) values(6, null, '12:00', null);
insert into test (id, name, time, value) values(7, null, '13:00', -1);
insert into test (id, name, time, value) values(8, 'zane', '10:00', 152);
insert into test (id, name, time, value) values(9, null, '11:00', 60);
insert into test (id, name, time, value) values(10, null, '12:00', null);
Try this recursive PostgreSQL query:
WITH RECURSIVE t(id, name, time, value) AS (
SELECT id, name, time, value FROM test WHERE id = (
SELECT MIN(id) FROM test
)
UNION
SELECT test.id, COALESCE(test.name, t.name), test.time, test.value
FROM test, t WHERE test.id = (
SELECT id FROM test WHERE id > t.id ORDER BY id LIMIT 1
)
) SELECT * FROM t ORDER BY id;
Notice though that this will probably issue a SELECT
on each row. If you don't want that, you can use a view solution:
CREATE VIEW test_view AS
SELECT id, LAG(id) OVER (ORDER BY id) lag_id, name, time, value FROM test;
WITH RECURSIVE t(id, name, time, value) AS (
SELECT id, name, time, value FROM test_view WHERE lag_id IS NULL
UNION ALL
SELECT test_view.id, COALESCE(test_view.name, t.name),
test_view.time, test_view.value
FROM test_view, t WHERE test_view.lag_id = t.id
) SELECT * FROM t ORDER BY id;
This should be much faster. The idea is from this article. SQLFiddle: http://sqlfiddle.com/#!15/63f7b/1/1.
This works for me in Access 2010:
SELECT
t1.id,
(
SELECT TOP 1 t2.name
FROM test t2
WHERE t2.id<=t1.id AND t2.name IS NOT NULL
ORDER BY t2.id DESC
) AS name,
t1.time,
t1.value
FROM test t1
It should work in other SQL dialects as well, although they may have a slightly different way of doing TOP 1
(e.g., LIMIT 1
is a common variant).
You can achieve this with COALESCE
function and subqueries:
SELECT t.id,
COALESCE(t.name, (SELECT s.name FROM table s WHERE s.name IS NOT NULL AND s.id < t.id ORDER BY s.id LIMIT 1)) AS name,
t.time,
COALESCE(t.value, (SELECT s.value FROM table s WHERE s.value IS NOT NULL AND s.id < t.id ORDER BY s.id LIMIT 1)) AS value
FROM table t ORDER BY t.id
MS Access based solution
SELECT (SELECT last(name)
FROM test AS temp
WHERE test.id >= temp.id AND temp.name IS NOT NULL) AS new_name, *
FROM test;
Try using the lead function. Not sure if that works with postgre, for Oracle it does. I think that can help.