Postgres where clause over two columns

2019-08-27 17:57发布

Database - I am working on in Postgres 9.6.5

I am analyzing the data from US Airport Authority (RITA) about the flights arrival and departures. This link (http://stat-computing.org/dataexpo/2009/the-data.html) lists all the columns in the table.

The table has following 29 columns

No Name Description

1 Year 1987-2008

2 Month 1-12

3 DayofMonth 1-31

4 DayOfWeek 1 (Monday) - 7 (Sunday)

5 DepTime actual departure time (local, hhmm)

6 CRSDepTime scheduled departure time (local, hhmm)

7 ArrTime actual arrival time (local, hhmm)

8 CRSArrTime scheduled arrival time (local, hhmm)

9 UniqueCarrier unique carrier code

10 FlightNum flight number

11 TailNum plane tail number

12 ActualElapsedTime in minutes

13 CRSElapsedTime in minutes

14 AirTime in minutes

15 ArrDelay arrival delay, in minutes

16 DepDelay departure delay, in minutes

17 Origin origin IATA airport code

18 Dest destination IATA airport code

19 Distance in miles

20 TaxiIn taxi in time, in minutes

21 TaxiOut taxi out time in minutes

22 Cancelled was the flight cancelled?

23 CancellationCode reason for cancellation (A = carrier, B = weather, C = NAS, D = security)

24 Diverted 1 = yes, 0 = no

25 CarrierDelay in minutes

26 WeatherDelay in minutes

27 NASDelay in minutes

28 SecurityDelay in minutes

29 LateAircraftDelay in minutes

There are about a million rows for each year.

I am trying to find out a count the most busy airports when delay is more than 15minutes. column DepDelay - has the delay time. origin - is the origin code for the airport.

All the data has been loaded into a table called 'ontime'

I am forming the query as follows in stages.

  1. select airports where delay is more than 15 minutes

    select origin,year,count(*) as depdelay_count from ontime where depdelay > 15
    group by year,origin order by depdelay_count desc )

  2. Now I wish to pull out only the top 10 airports per year - which I am doing as follows

    select x.origin,x.year from (with subquery as ( select origin,year,count(*) as depdelay_count from ontime where depdelay > 15 group by year,origin order by depdelay_count desc ) select origin,year,rank() over (partition by year order by depdelay_count desc) as rank from subquery) x where x.rank <= 10;

  3. Now that I have the top 10 airports by depdelay - I wish to get a count of the total flights out of these airports.

    select origin,count() from ontime where origin in (select x.origin from (with subquery as ( select origin,year,count() as depdelay_count from ontime where depdelay > 15 group by year,origin order by depdelay_count desc ) select origin,year,rank() over (partition by year order by depdelay_count desc) as rank from subquery) x where x.rank <= 2) group by origin order by origin;

If I modify the Step 3 query by adding the year in the year clause

---- will be any value from (1987 to 2008)

select origin,count(*) from ontime where year = (<YEAR>) origin in  
(select x.origin from (with subquery as (
    select origin,year,count(*) as depdelay_count from ontime 
    where 
    depdelay > 15
    group by year,origin 
    order by depdelay_count desc 
    )
    select origin,year,rank() over (partition by year order by depdelay_count desc) as rank from subquery) x where x.rank <= 2)
    group by origin
    order by origin;

But I have to do this manually for all years from 1987 to 2008 which I want to avoid.

Please can you help refine the query so that I can select the data for all the years without having to select each year manually.

1条回答
Anthone
2楼-- · 2019-08-27 18:38

I find CTEs int he middle of queries to e confusing. You can basically do this with one CTE/subquery:

with oy as (
      select origin, year, count(*) as numflights,
             sum( (depdelay > 15)::int ) as depdelay_count,
             row_number() over (partition by year order by sum( (depdelay > 15)::int ) desc) as seqnum
      from ontime
      group by origin, year
     ) 
select oy.*
from oy
where seqnum <= 10;

Note the use of conditional aggregation and using window functions with aggregation functions.

查看更多
登录 后发表回答