Grouping records and getting standard deviation in

Posted 2020-02-07 10:48

Question:

I have the SQL below, which computes the average interval between values of the timestamp column, grouped by icao_address, flight_number and flight_date. I'm trying to do the same for the standard deviation, and although I get a figure, it is wrong: the standard deviation I get back is 14.06 (see the image below), while it should be around 1.8.

This is what I'm using for the standard deviation calculation:

STDDEV_POP(UNIX_SECONDS(timestamp)) as standard_deviation

Below is my full SQL:

#standardSQL

select DATE(timestamp) as flight_date,
  safe_divide(timestamp_diff(max(timestamp), min(timestamp), SECOND), (COUNT(DISTINCT(timestamp)) - 1)) as avg_interval_message,
  STDDEV_POP(UNIX_SECONDS(timestamp)) as standard_deviation,
  icao_address, flight_number,
  min(timestamp) as firstrecord, max(timestamp) as lastrecord, count(timestamp) as target_updates
from `ais-data-analysis._analytics._aoi_table`
group by icao_address, flight_number, flight_date
having avg_interval_message is not null and flight_number is not null and icao_address = '4B8E41' 
order by flight_date, avg_interval_message ASC


I'm trying to get the standard deviation of the intervals between the values in the timestamp column (not of the timestamps themselves); there are 10 records.
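A likely cause of the mismatch: STDDEV_POP(UNIX_SECONDS(timestamp)) measures how spread out the timestamps themselves are, which grows with the length of the flight, whereas the target is the spread of the gaps between consecutive timestamps. The Python sketch below illustrates the difference for a single (icao_address, flight_number, flight_date) group. The sample timestamps are made up and are not the asker's data, and interval_stddev is a hypothetical helper. In BigQuery the analogous (untested) approach would be to compute each gap in a subquery with TIMESTAMP_DIFF(timestamp, LAG(timestamp) OVER (PARTITION BY icao_address, flight_number, DATE(timestamp) ORDER BY timestamp), SECOND) and then apply STDDEV_POP to that gap column in the outer GROUP BY query.

```python
import statistics


def interval_stddev(timestamps):
    """Population standard deviation of the gaps between consecutive timestamps.

    timestamps: Unix seconds (ints or floats) for ONE group, in any order.
    Returns None when fewer than two timestamps are given (no interval exists).
    """
    ordered = sorted(timestamps)
    # Gap between each timestamp and the previous one (what LAG would supply in SQL).
    intervals = [b - a for a, b in zip(ordered, ordered[1:])]
    if not intervals:
        return None
    # pstdev divides by N, the same population estimator as STDDEV_POP.
    return statistics.pstdev(intervals)


# Hypothetical sample: 10 messages arriving roughly every 5 seconds.
ts = [0, 5, 9, 15, 20, 24, 30, 35, 39, 45]

print(statistics.pstdev(ts))  # spread of the raw timestamps (what the query computes)
print(interval_stddev(ts))    # spread of the gaps (what the question asks for)
```

Because the raw-timestamp figure scales with the time span the records cover, it can be large even when messages arrive at a nearly regular rate; only the interval version describes how regular the update rate is.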

Answer 1:

You can use STDDEV_POP(<FLOAT>) to calculate the standard deviation, as described in its documentation:

Description

Returns the population (biased) standard deviation of the values. The return result is between 0 and +Inf.

This function ignores any NULL inputs. If all inputs are ignored, this function returns NULL.

If this function receives a single non-NULL input, it returns 0.

Supported Input Types

FLOAT64

Optional Clauses

The clauses are applied in the following order:

OVER: Specifies a window. See Analytic Functions. This clause is currently incompatible with all other clauses within STDDEV_POP().
DISTINCT: Each distinct value of expression is aggregated only once into the result.

Return Data Type

FLOAT64
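As a rough illustration of the rules quoted above, here is a plain-Python sketch of what STDDEV_POP computes. stddev_pop is a hypothetical helper written for this example, not BigQuery's implementation, and None stands in for SQL NULL.

```python
import math
from typing import Iterable, Optional


def stddev_pop(values: Iterable[Optional[float]]) -> Optional[float]:
    """Mimic the quoted STDDEV_POP rules (illustration only)."""
    xs = [float(v) for v in values if v is not None]  # NULL inputs are ignored
    if not xs:
        return None  # every input ignored -> NULL
    mean = sum(xs) / len(xs)
    # Population (biased) variance: divide by N, not N - 1.
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.sqrt(variance)


print(stddev_pop([None, None]))      # None
print(stddev_pop([42.0]))            # 0.0 (single non-NULL input)
print(stddev_pop([2.0, 4.0, None]))  # 1.0
```

Dividing by N rather than N - 1 is what makes the population estimate "biased"; BigQuery's STDDEV_SAMP is the N - 1 counterpart.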

I hope it helps.