Sum for multiple date ranges in a single call?

2019-04-07 01:12发布

问题:

I have the following query:

SELECT 
   SUM("balance_transactions"."fee") AS sum_id 
   FROM "balance_transactions" 
   JOIN charges ON balance_transactions.source = charges.balance_id 
   WHERE "balance_transactions"."account_id" = 6 
      AND (balance_transactions.type = 'charge' 
      AND charges.refunded = false 
      AND charges.invoice IS NOT NULL) 
      AND ("balance_transactions"."created" BETWEEN '2013-12-20' AND '2014-01-19');

What that does is adds up all the "fees" that occurred between those two dates. Great. Works fine.

The problem is that I almost always need those fees for hundreds of date ranges at a time, which amounts to me running that same query hundreds of times. Not efficient.

But is there some way to condense this into a single query for all the date ranges?

For instance, I'd be calling SUM for a series of ranges like this:

2013-12-20 to 2014-01-19
2013-12-21 to 2014-01-20
2013-12-22 to 2014-01-21
2013-12-23 to 2014-01-22
2013-12-24 to 2014-01-23
...so on and so on

I need to output the sum of fees collected in each date range (and ultimately need that in an array).

So, any ideas on a way to handle that and reduce database transactions?

FWIW, this is on Postgres inside a Rails app.

回答1:

Assuming I understand your request correctly I think what you need is something along these lines:

SELECT "periods"."start_date", 
       "periods"."end_date", 
       SUM(CASE WHEN "balance_transactions"."created" BETWEEN "periods"."start_date" AND "periods"."end_date" THEN "balance_transactions"."fee" ELSE 0.00 END) AS period_sum
  FROM "balance_transactions" 
  JOIN charges ON balance_transactions.source = charges.balance_id 
  JOIN ( SELECT '2013-12-20'::date as start_date, '2014-01-19'::date as end_date UNION ALL
         SELECT '2013-12-21'::date as start_date, '2014-01-20'::date as end_date UNION ALL
         SELECT '2013-12-22'::date as start_date, '2014-01-21'::date as end_date UNION ALL
         SELECT '2013-12-23'::date as start_date, '2014-01-22'::date as end_date UNION ALL
         SELECT '2013-12-24'::date as start_date, '2014-01-23'::date as end_date
         ) as periods
    ON "balance_transactions"."created" BETWEEN "periods"."start_date" AND "periods"."end_date"
 WHERE "balance_transactions"."account_id" = 6 
   AND "balance_transactions"."type" = 'charge' 
   AND "charges"."refunded" = false 
   AND "charges"."invoice" IS NOT NULL
 GROUP BY "periods"."start_date", "periods"."end_date"

This should return you all the periods you're interested in in one single resultset. Since the query is 'generated' on the fly in your front-end you can add as many rows to the periods part as you want.

Edit: with some trial and error I managed to get it working [in sqlFiddle][1] and updated the syntax above accordingly.



回答2:

Well coming from a SQL Server background I would change your where clause to

...
AND (
      "balance_transactions"."created" BETWEEN '2013-12-20' AND '2014-01-19'
      OR
      "balance_transactions"."created" BETWEEN '2013-12-21' AND '2014-01-20'
      OR
      "balance_transactions"."created" BETWEEN '2013-12-23' AND '2014-01-22'
      OR
      "balance_transactions"."created" BETWEEN '2013-12-24' AND '2014-01-23'
    );

Just be sure you have a good index on those dates! :)



回答3:

Here's a untested procedure you can used.

CREATE OR REPLACE PROCEDURE sum_fees(v_start IN Date, v_end in Date) IS

BEGIN
  SELECT 
   SUM("balance_transactions"."fee") AS sum_id 
   FROM "balance_transactions" 
       JOIN charges ON balance_transactions.source = charges.balance_id 
   WHERE "balance_transactions"."account_id" = 6 
      AND (balance_transactions.type = 'charge' 
      AND charges.refunded = false 
      AND charges.invoice IS NOT NULL) 
      AND ("balance_transactions"."created" BETWEEN v_start AND v_end);
END;

Then call the procedure with your range date.



回答4:

It hasn't been mentioned yet and would not be as efficient as a single select BUT as a solution you could use a UNION and group by your own static identifier (date range group) such as:

SELECT 
   'a' AS DateRangeIdentifier,
   SUM("balance_transactions"."fee") AS sum_id 
   FROM "balance_transactions" 
   JOIN charges ON balance_transactions.source = charges.balance_id 
   WHERE "balance_transactions"."account_id" = 6 
      AND (balance_transactions.type = 'charge' 
      AND charges.refunded = false 
      AND charges.invoice IS NOT NULL) 
      AND ("balance_transactions"."created" BETWEEN '2013-01-01' AND '2013-01-31')

UNION 

SELECT 
   'b' AS DateRangeIdentifier,
   SUM("balance_transactions"."fee") AS sum_id 
   FROM "balance_transactions" 
   JOIN charges ON balance_transactions.source = charges.balance_id 
   WHERE "balance_transactions"."account_id" = 6 
      AND (balance_transactions.type = 'charge' 
      AND charges.refunded = false 
      AND charges.invoice IS NOT NULL) 
      AND ("balance_transactions"."created" BETWEEN '2013-07-01' AND '2013-07-31')

UNION 

SELECT 
   'c' AS DateRangeIdentifier,
   SUM("balance_transactions"."fee") AS sum_id 
   FROM "balance_transactions" 
   JOIN charges ON balance_transactions.source = charges.balance_id 
   WHERE "balance_transactions"."account_id" = 6 
      AND (balance_transactions.type = 'charge' 
      AND charges.refunded = false 
      AND charges.invoice IS NOT NULL) 
      AND ("balance_transactions"."created" BETWEEN '2013-12-20' AND '2014-01-19')

GROUP BY DateRangeIdentifier, sum_id

This way would at least yield a single database request and not a loop in your code.

See updated Fiddle here: http://sqlfiddle.com/#!15/9ce0f/5



回答5:

if i understand well you want to reutilize the date query. For this the part of the query that can be reutilized is the daily part. I mean:

SELECT 
   SUM("balance_transactions"."fee") AS sum_id 
   FROM "balance_transactions" 
   JOIN charges ON balance_transactions.source = charges.balance_id 
   WHERE "balance_transactions"."account_id" = 6 
      AND (balance_transactions.type = 'charge' 
      AND charges.refunded = false 
      AND charges.invoice IS NOT NULL) 
      AND ("balance_transactions"."created" = 'yyyy-mm-dd');

Assuming that your "created" field is just date and not timestamp, and if the data of past days doesn't change, you can dump this query to a table:

insert into sum_table
SELECT 
   "balance_transactions"."created" balance_created
   SUM("balance_transactions"."fee") AS balance_fee 
   FROM "balance_transactions" 
   JOIN charges ON balance_transactions.source = charges.balance_id 
   WHERE "balance_transactions"."account_id" = 6 
      AND (balance_transactions.type = 'charge' 
      AND charges.refunded = false 
      AND charges.invoice IS NOT NULL) 
   group by "balance_transactions"."created"
;

and then change your main query to:

SELECT 
   SUM(balance_fee) AS sum_id 
   FROM sum_table where balance_created between ('2013-12-20' AND '2014-01-19');

Another optimization is to eliminate the between because usually it does not uses indexes, and if you have lots of different dates it can be slow.

Better this way:

SELECT 
   SUM(balance_fee) AS sum_id 
   FROM sum_table where balance_created in ('2013-12-20', '2013-12-21', '2013-12-22' ... '2014-01-19');

But for this you have to create the SQL directly in the client application (ej. DAO)

Hope this helps.



回答6:

This will return all the month ranges' sum between a given date and today.

Tuples output - SQL Fiddle

select distinct on (s.d)
    s.d as "date",
    sum(bt.fee) over (
        partition by daterange(s.d, (s.d + interval '1 month')::date, '[]')
    ) as sum_id
from
    balance_transactions bt
    inner join
    charges on bt.source = charges.balance_id
    right join
    (
        select d::date as d
        from generate_series (
            '2013-12-20'::date,
            current_date,
            interval '1 day'
        ) s(d)
    ) s on s.d = bt.created
where
    bt.account_id = 6
    and bt.type = 'charge'
    and charges.refunded = false
    and charges.invoice is not null
order by s.d

Array output. Does not work on SQL Fiddle but works in my desktop.

select array_agg(("date", sum_id)) as arr_sum_id
from (
    select distinct on (s.d)
        s.d as "date",
        sum(bt.fee) over (
            partition by daterange(s.d, (s.d + interval '1 month')::date, '[]')
        ) as sum_id
    from
        balance_transactions bt
        inner join
        charges on bt.source = charges.balance_id
        right join
        (
            select d::date as d
            from generate_series (
                '2013-12-20'::date,
                current_date,
                interval '1 day'
            ) s(d)
        ) s on s.d = bt.created
    where
        bt.account_id = 6
        and bt.type = 'charge'
        and charges.refunded = false
        and charges.invoice is not null
    order by s.d
) s


回答7:

Try this:

create table timeframes (
    start_dt date,
    end_dt date
);

insert into timeframes values ('2013-12-20', '2014-01-19');
insert into timeframes values ('2013-12-21', '2014-01-20');
insert into timeframes values ('2013-12-22', '2014-01-21');
insert into timeframes values ('2013-12-23', '2014-01-22');
insert into timeframes values ('2013-12-24', '2014-01-23');

SELECT 
    tf.start_date, 
    tf.end_date, 
    SUM(CASE 
        WHEN t.created BETWEEN tf.start_date AND tf.end_date THEN t.fee
        ELSE 0.00 
    END) as transaction_sum
FROM 
    balance_transactions t
INNER JOIN 
    charges c
ON 
    t.source = c.balance_id 
INNER JOIN 
    timeframes tf
ON 
    t.created BETWEEN tf.start_date AND tf.end_date
WHERE 
    t.account_id = 6
AND 
    (
    t.type = 'charge' 
        AND 
    c.refunded = false 
        AND 
    c.invoice IS NOT NULL
    ) 
GROUP BY
    tf.start_date, 
    tf.end_date


回答8:

SELECT periods.start_date, 
     periods.end_date, 
     SUM(fee) AS Period_Sum
FROM "balance_transactions" 
JOIN charges ON balance_transactions.source = charges.balance_id 
JOIN
(SELECT CAST('2013-12-20' AS DATE) AS start_date, CAST('2014-01-19' AS DATE) AS end_date UNION     ALL
 SELECT  CAST('2013-12-21' AS DATE),CAST('2014-01-20'  AS DATE) UNION ALL
 SELECT  CAST('2013-12-22' AS DATE),  CAST('2014-01-21' AS DATE) UNION ALL
 SELECT CAST('2013-12-23' AS DATE),  CAST('2014-01-22' AS DATE) UNION ALL
 SELECT  CAST('2013-12-24' AS DATE), CAST('2014-01-23' AS DATE)) as periods
ON "balance_transactions"."created" BETWEEN periods.start_date AND periods.end_date
WHERE "balance_transactions"."account_id" = 6 
AND (balance_transactions.type = 'charge' 
AND charges.refunded = false 
AND charges.invoice IS NOT NULL) 
GROUP BY periods.start_date, periods.end_date

Here is link to SQL Fiddle Where I tested it: http://sqlfiddle.com/#!10/535ac/11/0



回答9:

I worked on the following code. It uses an XML file. The file or string contains the date ranges you need to sumarize. The Stored Procedure will return a table with the totals per specific range.

/*****************CREATES/POPULATES FAKE TABLES WITH A SIMILAR STRUCTURE TO THE ONE YOU ARE USING************/
DECLARE @balance_transactions TABLE(fee FLOAT,
                                    source INT,
                                    account_id INT,
                                    [type] VARCHAR(25),
                                    created DATETIME)

INSERT INTO @balance_transactions
SELECT 12.5, 1, 6, 'charge', '01/15/2012'
UNION
SELECT 70, 2, 6, 'charge', '01/16/2012'
UNION
SELECT 136.89, 3, 6, 'charge', '01/17/2012'
UNION
SELECT 29.16, 4, 6, 'charge', '01/18/2012'
UNION
SELECT 1369.54, 5, 6, 'charge', '02/21/2012'
UNION
SELECT 468.85, 6, 6, 'charge', '02/22/2012'
UNION
SELECT 65.8, 7, 6, 'charge', '02/22/2012'
UNION
SELECT 1236.87, 8, 6, 'charge', '02/22/2012'

DECLARE @charges TABLE(balance_id INT,
                       refunded BIT,
                       invoice INT)

INSERT INTO @charges
SELECT 1, 0, 7
UNION
SELECT 2, 0, 8
UNION
SELECT 3, 0, 9
UNION
SELECT 4, 0, 10
UNION
SELECT 5, 0, 11
UNION
SELECT 6, 0, 12
UNION
SELECT 7, 0, null
UNION
SELECT 8, 0, null
/*******************************************************/

/*
You can use the code below for creating an Stored Procedure.
The SP will return a table with the SUM of all those values indicating the Date Range.

spGetTotalsPerDateRange 'your xml goes here'

results:

fromDate                |toDate                     |total
2012-01-15 00:00:00.000 |2012-01-30 00:00:00.000    |248.55
2012-02-15 00:00:00.000 |2012-02-28 00:00:00.000    |3141.06
*/
SET DATEFORMAT MDY

DECLARE @XmlDocumentHandle int
DECLARE @XmlDocument nvarchar(4000)
SET @XmlDocument = 
N'<dates>
    <range>
       <fromDate>01/15/2012</fromDate>
       <toDate>01/30/2012</toDate>
    </range>
    <range>   
       <fromDate>02/15/2012</fromDate>
       <toDate>02/28/2012</toDate>
    </range>
</dates>'

EXEC sp_xml_preparedocument @XmlDocumentHandle OUTPUT, @XmlDocument

DECLARE @feeTotal TABLE(fromDate DATETIME,
                        toDate DATETIME,
                        total FLOAT)


DECLARE @fromDate DATETIME
DECLARE @toDate DATETIME
DECLARE ranges_cur CURSOR FOR

SELECT fromDate, toDate
FROM OPENXML (@XmlDocumentHandle, '/dates/range',2)
     WITH (fromDate  DATETIME,
           toDate DATETIME);

OPEN ranges_cur;
FETCH NEXT FROM ranges_cur INTO @fromDate, @toDate;
WHILE @@FETCH_STATUS = 0
   BEGIN
      INSERT INTO @feeTotal
      SELECT @fromDate, 
             @toDate,
             SUM(bt.fee)
      FROM @balance_transactions bt
      INNER JOIN @charges c ON bt.source = c.balance_id 
      WHERE bt.account_id = 6 
      AND (bt.type = 'charge' 
      AND c.refunded = 0 
      AND c.invoice IS NOT NULL)
      AND (bt.created >= @fromDate AND bt.created <= @toDate);

      FETCH NEXT FROM ranges_cur INTO @fromDate, @toDate;
   END;
CLOSE ranges_cur;
DEALLOCATE ranges_cur;

SELECT fromDate,
       toDate,
       total
FROM @feeTotal

EXEC sp_xml_removedocument @XmlDocumentHandle

GO

The parametrized query will look like the following one.

CREATE PROCEDURE spGetTotalsPerDateRange(@XmlDocument NVARCHAR(4000),
                                         @type VARCHAR(50) = 'charge',
                                         @refunded BIT = 0)
AS
BEGIN
    SET DATEFORMAT MDY

    DECLARE @XmlDocumentHandle INT
    EXEC sp_xml_preparedocument @XmlDocumentHandle OUTPUT, @XmlDocument

    DECLARE @feeTotal TABLE(fromDate DATETIME,
                            toDate DATETIME,
                            total FLOAT)


    DECLARE @fromDate DATETIME
    DECLARE @toDate DATETIME
    DECLARE ranges_cur CURSOR FOR

    SELECT fromDate, toDate
    FROM OPENXML (@XmlDocumentHandle, '/dates/range',2)
         WITH (fromDate  DATETIME,
               toDate DATETIME);

    OPEN ranges_cur;
    FETCH NEXT FROM ranges_cur INTO @fromDate, @toDate;
    WHILE @@FETCH_STATUS = 0
       BEGIN
          INSERT INTO @feeTotal
          SELECT @fromDate, 
                 @toDate,
                 SUM(bt.fee)
          FROM balance_transactions bt
          INNER JOIN charges c ON bt.source = c.balance_id 
          WHERE bt.account_id = 6 
          AND (bt.type = 'charge' 
          AND c.refunded = 0 
          AND c.invoice IS NOT NULL)
          AND (bt.created >= @fromDate AND bt.created <= @toDate);

          FETCH NEXT FROM ranges_cur INTO @fromDate, @toDate;
       END;
    CLOSE ranges_cur;
    DEALLOCATE ranges_cur;

    SELECT fromDate,
           toDate,
           total
    FROM @feeTotal

    EXEC sp_xml_removedocument @XmlDocumentHandle

END
GO

The first code was tested using fake data and it works properly. You need to make the required adjustments to types and names to the SP columns and variable types in accordance to your tables definition.

The idea of this approach is being able to report any information in the way you need it. You can also pass additional parameters via XML attributes. Check more info of OPEN XML on MSDN website

Hope it helps



回答10:

I am coming from oracle PL/ SQL back ground. In my opinion you can use below query to use it. My understanding is that if any transaction is happening on 19-NOV-2013 1.00 AM is always going to be in 19 Nov to 20 Nov bucket so We should not worry about range I have created below query. I hope so it will help you.

SELECT DISTINCT account_id,
  TRUNC(created)
  ||' to '
  || (TRUNC(created)+1) period,
  SUM(FEE) over (partition BY account_id,TRUNC(created) ) sumid
FROM balance_transactions a,
  charges b
WHERE a.source =balance_id
AND b.refunded =0
AND b.invoice IS NOT NULL
AND a.type     ='charge'
ORDER BY TRUNC(created)
  ||' to '
  || (TRUNC(created)+1);