How to do aggregate functions for multiple columns

2019-08-03 01:11发布

I have 2 nodes created in graph database, origin airport and destination airport. It is related by a property named 'delayed_by'.

   MATCH (origin:origin_airport {name: row.ORIGIN}), 
   (destination:dest_airport {name: row.DEST})
   CREATE (origin)-[:delayed_by {carr_delay: row.carr_delay}]->(destination)
   CREATE (origin)-[:delayed_by {weather_delay: row.weather_delay}]->    
   (destination)
   CREATE (origin)-[:delayed_by {nas_delay: row.nas_delay}]->(destination)

delayed_by holds the value delays caused due to carrier delay, weather delay and nas delay. Here I need to group_by origin and destination and calculate the average value of sum of all 3 delays using cypher query in Neo4j. Representing in tabular format, my input file will of the format as described below:

  ORIGIN    DEST    carr_delay  weather_delay   nas_delay
   ABE        ATL      492         56             56    
   ABE        DTW      412         0              47    
   ABQ        ATL      181         0              218   

I am expecting the result in the below format.

  ORIGIN    DEST     Avg_delay  
   ABE        ATL      201.33          
   ABE        DTW      153         
   ABQ        ATL      133         

I am using the below query:

     MATCH (oa:origin_airport)-[d:delayed_by]->(da:dest_airport)
     RETURN oa.name  AS Origin, da.name AS Destination,            
     AVG((toFloat(d.carr_delay))+(toFloat(d.weather_delay))+ 
     (toFloat(d.nas_delay))) As avg_delay
      ORDER BY avg_delay DESC
      LIMIT 10

But getting null values for Avg_delay.

    ORIGIN  DEST     Avg_delay  
     ABE     ATL       NULL
     ABE     DTW       NULL
     ABQ     ATL       NULL

标签: neo4j cypher
2条回答
成全新的幸福
2楼-- · 2019-08-03 01:26

1) I recommend to change the model: for the delay relationship add the property type of delay, and the delay value save in the property value.

MATCH (origin:origin_airport {name: row.ORIGIN}), 
      (destination:dest_airport {name: row.DEST})
   CREATE (origin)
          -[:delayed_by {type: "carr_delay", value: row.carr_delay}]->
          (destination)
   CREATE (origin)
          -[:delayed_by {type: "weather_delay", value: row.weather_delay}]->
          (destination)
   CREATE (origin)
          -[:delayed_by {type: "nas_delay", value: row.nas_delay}]->
          (destination)

2) And the desired query:

MATCH (oa:origin_airport)-[d:delayed_by]->(da:dest_airport)
RETURN oa.name  AS Origin, 
       da.name AS Destination,            
       AVG( toFloat(d.value) ) As avg_delay
ORDER BY avg_delay DESC
LIMIT 10
查看更多
疯言疯语
3楼-- · 2019-08-03 01:29

The problem here is that not for every relationships you have all properties( carr_delay, weather_delay, nas_delay) set.

A non existing property evaluates to NULL - which is different from 0. AVG(1+NULL) evaluate to NULL. So you need to map NULL to 0. To fix that consider using coalesce:

MATCH (oa:origin_airport)-[d:delayed_by]->(da:dest_airport)
RETURN oa.name  AS Origin, da.name AS Destination,            
AVG(
  coalesce(toFloat(d.carr_delay),0.0))+
  coalesce(toFloat(d.weather_delay),0.0))+ 
  coalesce(toFloat(d.nas_delay),0.0))
) As avg_delay
ORDER BY avg_delay DESC
LIMIT 10
查看更多
登录 后发表回答