Using sql function generate_series() in redshift

2019-01-07 22:08发布

问题:

I'd like to use the generate series function in redshift, but have not been successful.

The redshift documentation says it's not supported. The following code does work:

select *
from generate_series(1,10,1)

outputs:

1
2
3
...
10

I'd like to do the same with dates. I've tried a number of variations, including:

select *
from generate_series(date('2008-10-01'),date('2008-10-10 00:00:00'),1)

kicks out:

 ERROR: function generate_series(date, date, integer) does not exist
 Hint: No function matches the given name and argument types.
 You may need to add explicit type casts. [SQL State=42883]

Also tried:

select *
from generate_series('2008-10-01 00:00:00'::timestamp,
'2008-10-10 00:00:00'::timestamp,'1 day')

And tried:

select *
from generate_series(cast('2008-10-01 00:00:00' as datetime),
cast('2008-10-10 00:00:00' as datetime),'1 day')

both kick out:

ERROR: function generate_series(timestamp without time zone, timestamp without time zone, "unknown") does not exist
Hint: No function matches the given name and argument types.
You may need to add explicit type casts. [SQL State=42883]

If not looks like I'll use this code from another post:

SELECT to_char(DATE '2008-01-01'
+ (interval '1 month' * generate_series(0,57)), 'YYYY-MM-DD') AS ym

PostgreSQL generate_series() with SQL function as arguments

回答1:

Amazon Redshift seems to be based on PostgreSQL 8.0.2. The timestamp arguments to generate_series() were added in 8.4.

Something like this, which sidesteps that problem, might work in Redshift.

SELECT current_date + (n || ' days')::interval
from generate_series (1, 30) n

It works in PostgreSQL 8.3, which is the earliest version I can test. It's documented in 8.0.26.

Later . . .

It seems that generate_series() is unsupported in Redshift. But given that you've verified that select * from generate_series(1,10,1) does work, the syntax above at least gives you a fighting chance. (Although the interval data type is also documented as being unsupported on Redshift.)

Still later . . .

You could also create a table of integers.

create table integers (
  n integer primary key
);

Populate it however you like. You might be able to use generate_series() locally, dump the table, and load it on Redshift. (I don't know; I don't use Redshift.)

Anyway, you can do simple date arithmetic with that table without referring directly to generate_series() or to interval data types.

select (current_date + n)
from integers
where n < 31;

That works in 8.3, at least.



回答2:

Using Redshift today, you can generate a range of dates by using datetime functions and feeding in a number table.

select (getdate()::date - generate_series)::date from generate_series(1,30,1)

Generates this for me

date
2015-11-06
2015-11-05
2015-11-04
2015-11-03
2015-11-02
2015-11-01
2015-10-31
2015-10-30
2015-10-29
2015-10-28
2015-10-27
2015-10-26
2015-10-25
2015-10-24
2015-10-23
2015-10-22
2015-10-21
2015-10-20
2015-10-19
2015-10-18
2015-10-17
2015-10-16
2015-10-15
2015-10-14
2015-10-13
2015-10-12
2015-10-11
2015-10-10
2015-10-09
2015-10-08


回答3:

The generate_series() function is not fully supported by Redshift. See the Unsupported PostgreSQL functions section of the developer guide.

UPDATE

generate_series is working with Redshift now.

SELECT CURRENT_DATE::TIMESTAMP  - (i * interval '1 day') as date_datetime 
FROM generate_series(1,31) i 
ORDER BY 1

This will generate last 30 days date

Ref: generate_series function in Amazon Redshift



回答4:

I needed to do something similar, but with 5 minutes intervals over 7 days. So here's a CTE based hack (ugly but not too verbose)

INSERT INTO five_min_periods
WITH 
periods  AS (select 0 as num UNION select 1 as num UNION select 2 UNION select 3 UNION select 4 UNION select 5 UNION select 6 UNION select 7 UNION select 8 UNION select 9 UNION select 10 UNION select 11),
hours    AS (select num from periods UNION ALL select num + 12 from periods),
days     AS (select num from periods where num <= 6),
rightnow AS (select CAST( TO_CHAR(GETDATE(), 'yyyy-mm-dd hh24') || ':' || trim(TO_CHAR((ROUND((DATEPART (MINUTE, GETDATE()) / 5), 1) * 5 ),'09')) AS TIMESTAMP) as start)
select  
  ROW_NUMBER() OVER(ORDER BY d.num DESC, h.num DESC, p.num DESC) as idx
  , DATEADD(minutes, -p.num * 5, DATEADD( hours, -h.num, DATEADD( days, -d.num, n.start ) ) ) AS period_date
from days d, hours h, periods p, rightnow n

Should be able to extend this to other generation schemes. The trick here is using the Cartesian product join (i.e. no JOIN/WHERE clause) to multiply the hand-crafted CTE's to produce the necessary increments and apply to an anchor date.