How to use Group By and self-join to return min, m

2020-03-29 04:42发布

问题:

SOLVED

All hail StackOverlow!

While I was gone, people left 2 solutions (thanks guys--what is the protocol for handing out karma for two working solutions?)

Here is the solution that I came back to post. it is derived from yet ANOTHER StackOver solution:

How to fetch the first and last record of a grouped record in a MySQL query with aggregate functions?

...and my adaptation is:

SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`PRICE`) AS min_price,
MAX(`PRICE`) AS max_price,
SUBSTRING_INDEX( 
    GROUP_CONCAT(
        PRICE
        ORDER BY DTE ASC
        )
, ',', 1 ) AS opn_price,
SUBSTRING_INDEX( 
    GROUP_CONCAT(
        PRICE
        ORDER BY DTE DESC
        )
, ',', 1 ) AS cls_price 
FROM `CHART_DATA` 
GROUP BY trading_day
;

The data the "Q" above starts with is the data with which I am trying to end. Hopefully, this helps someone else since I suspect that my log is fairly common.

I am willing to bet that one of these three solutions has a performance advantage. If anyone happens to know the inner workings of MySQL and query optimization and cares to recommend the "preferred" solution, that will be useful to know in the future.

END SOLVED

UPDATE #2

Tryhing to come at it from yet another direction using this:

http://forums.mysql.com/read.php?65,363723,363723

I get:

SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`PRICE`) AS min_price,
MAX(`PRICE`) AS max_price,
(SELECT opn_price FROM 
    (SELECT 
    DATE_FORMAT(`DTE`, "%m/%d/%Y") AS a_day,    
    PRICE AS opn_price,
    MIN(DTE) AS opn
    FROM `CHART_DATA` 
    GROUP BY a_day
    ORDER BY opn ASC LIMIT 1) AS tblO) AS opnqt,
(SELECT cls_price FROM 
    (SELECT 
    DATE_FORMAT(`DTE`, "%m/%d/%Y") AS a_day,    
    PRICE AS cls_price,
    MIN(DTE) AS cls
    FROM `CHART_DATA` 
    GROUP BY a_day
    ORDER BY cls DESC LIMIT 1) AS tblC) AS clsqt    
FROM `CHART_DATA` cht
GROUP BY trading_day;

This suffers from a similar dysfunction as the query in the first UPDATE below; the 'clsqt' (cls_price) value returned is the last closing price found in the data. Blech.

Plus, we are starting to get into "hideously complex" query space again and that CANNOT be good for performance.

But if anyone sees the fix for the 'clsqt' value, I would accept it gladly and solve the performance issue at a later day. :)

END UPDATE #2

UPDATE

So close...here's where I am today:

SELECT 
DATE_FORMAT(`cht1`.`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`cht1`.`PRICE`) AS min_price,
MAX(`cht1`.`PRICE`) AS max_price,
    MIN(cht1.DTE) AS opn_date1,
    MIN(cht1.DTE) AS opn_date2,
    `cht2`.`PRICE` AS opn_price,
        MAX(cht1.DTE) AS cls_date1,
        MAX(cht3.DTE) AS cls_date3,
        `cht3`.`PRICE` AS cls_price
FROM `CHART_DATA` cht1
     LEFT JOIN `CHART_DATA` cht2 
     ON cht2.DTE = cht1.DTE
     LEFT JOIN `CHART_DATA` cht3 
     ON cht3.DTE = cht1.DTE
GROUP BY trading_day
HAVING opn_date1 = opn_date2
AND cls_date1 = cls_date3
;

This retuns everything correctly BUT the correct 'cls_price' (it is returnign the same value for 'cls_price' as 'opn_price').

However, 'cls_date1' and 'cls_date3' are the correct values, so I must be close.

Anyone see what I am not?

END UPDATE

I have been poring over SO with regard to left joins and self joins...and I must admit that I am not grokking.

I found this "Q" that seems very close to what I want: left join with condition for right table in mysql

What I am trying to do is retrieve open, close, min, and max price days from a single table (sample data below).

Min and Max are easy:

SELECT 
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`PRICE`) AS min_price,
MAX(`PRICE`) AS max_price
FROM `CHART_DATA` 
GROUP BY trading_day;

I want the results returned group by date, somthing like:

'trading_day' 'opn_price' 'min_price' 'max_price' 'cls_price'

Okay, so I try 'baby steps' with just one join...

SELECT 
DATE_FORMAT(`cht1`.`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`cht1`.`PRICE`) AS min_price,
MAX(`cht1`.`PRICE`) AS max_price,
`cht2`.`PRICE` AS opn_price
FROM `CHART_DATA` cht1
 LEFT JOIN `CHART_DATA` cht2 
 ON cht2.DTE = MIN(cht1.DTE)
GROUP BY trading_day;

...and I get the message "Invalid use of group function"

Of course, removing the "GROUP BY" is no help, since I need to return aggegate columns.

I have a really complex solution that gets the open and close results, but not the min and max--and they are in separate result sets. I get the feeling that I have made this more complex than is necessary and that fi I could just grasp what is going on with the self joins cited in the "Q" referenced above, that my overall coding would improvie immeasurably. But I have spent something like 12 hours on this during the past weekend and am more confusted than ever.

All insight and explantion and observation is welcome at this point...

/* SAMPLE TABLE AND DATA */

CREATE TABLE `CHART_DATA` (
  `ID` varchar(10) DEFAULT NULL,
  `DTE` datetime DEFAULT NULL,
  `PRICE` double DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

/*Data for the table `CHART_DATA` */

INSERT INTO `chart_data`
            (`id`,`dte`,`price`)
VALUES      ('1','2011-01-01 00:10:00',0.65),
            ('2','2011-01-01 06:10:00',0.92),
            ('3','2011-01-01 12:10:00',0.59),
            ('4','2011-01-01 18:10:00',0.16),
            ('5','2011-01-02 00:10:00',0.28),
            ('6','2011-01-02 06:10:00',0.12),
            ('7','2011-01-02 12:10:00',0.92),
            ('8','2011-01-02 18:10:00',0.1),
            ('9','2011-01-03 00:10:00',0.34),
            ('10','2011-01-03 06:10:00',0.79),
            ('11','2011-01-03 12:10:00',1.23),
            ('12','2011-01-03 18:10:00',1.24),
            ('13','2011-01-04 00:10:00',1.12),
            ('14','2011-01-04 06:10:00',0.8),
            ('15','2011-01-04 12:10:00',0.65),
            ('16','2011-01-04 18:10:00',0.78),
            ('17','2011-01-05 00:10:00',0.65),
            ('18','2011-01-05 06:10:00',1.19),
            ('19','2011-01-05 12:10:00',0.89),
            ('20','2011-01-05 18:10:00',1.05),
            ('21','2011-01-06 00:10:00',0.29),
            ('22','2011-01-06 06:10:00',0.43),
            ('23','2011-01-06 12:10:00',0.26),
            ('24','2011-01-06 18:10:00',0.34),
            ('25','2011-01-07 00:10:00',0.22),
            ('26','2011-01-07 06:10:00',0.37),
            ('27','2011-01-07 12:10:00',1.22),
            ('28','2011-01-07 18:10:00',1.16),
            ('29','2011-01-08 00:10:00',0.3),
            ('30','2011-01-08 06:10:00',1.17),
            ('31','2011-01-08 12:10:00',0.62),
            ('32','2011-01-08 18:10:00',0.86),
            ('33','2011-01-09 00:10:00',0.84),
            ('34','2011-01-09 06:10:00',1.11),
            ('35','2011-01-09 12:10:00',0.92),
            ('36','2011-01-09 18:10:00',1.03),
            ('37','2011-01-10 00:10:00',1.13),
            ('38','2011-01-10 06:10:00',0.58),
            ('39','2011-01-10 12:10:00',1.03),
            ('40','2011-01-10 18:10:00',0.21),
            ('41','2011-01-11 00:10:00',0.12),
            ('42','2011-01-11 06:10:00',1.01),
            ('43','2011-01-11 12:10:00',0.19),
            ('44','2011-01-11 18:10:00',1.14),
            ('45','2011-01-12 00:10:00',0.55),
            ('46','2011-01-12 06:10:00',0.75),
            ('47','2011-01-12 12:10:00',0.66),
            ('48','2011-01-12 18:10:00',1.1),
            ('49','2011-01-13 00:10:00',0.68),
            ('50','2011-01-13 06:10:00',0.3),
            ('51','2011-01-13 12:10:00',0.9),
            ('52','2011-01-13 18:10:00',0.88),
            ('53','2011-01-14 00:10:00',0.64),
            ('54','2011-01-14 06:10:00',1.06),
            ('55','2011-01-14 12:10:00',1.12),
            ('56','2011-01-14 18:10:00',0.76),
            ('57','2011-01-15 00:10:00',0.18),
            ('58','2011-01-15 06:10:00',1.08),
            ('59','2011-01-15 12:10:00',0.66),
            ('60','2011-01-15 18:10:00',0.38),
            ('61','2011-01-16 00:10:00',1),
            ('62','2011-01-16 06:10:00',1.18),
            ('63','2011-01-16 12:10:00',1.15),
            ('64','2011-01-16 18:10:00',0.58),
            ('65','2011-01-17 00:10:00',1.04),
            ('66','2011-01-17 06:10:00',0.81),
            ('67','2011-01-17 12:10:00',0.35),
            ('68','2011-01-17 18:10:00',0.91),
            ('69','2011-01-18 00:10:00',0.14),
            ('70','2011-01-18 06:10:00',0.13),
            ('71','2011-01-18 12:10:00',1.03),
            ('72','2011-01-18 18:10:00',0.16),
            ('73','2011-01-19 00:10:00',1.05),
            ('74','2011-01-19 06:10:00',1.13),
            ('75','2011-01-19 12:10:00',1.21),
            ('76','2011-01-19 18:10:00',0.34),
            ('77','2011-01-20 00:10:00',0.63),
            ('78','2011-01-20 06:10:00',0.62),
            ('79','2011-01-20 12:10:00',0.19),
            ('80','2011-01-20 18:10:00',1.21),
            ('81','2011-01-21 00:10:00',0.83),
            ('82','2011-01-21 06:10:00',0.99),
            ('83','2011-01-21 12:10:00',0.83),
            ('84','2011-01-21 18:10:00',0.21),
            ('85','2011-01-22 00:10:00',0.8),
            ('86','2011-01-22 06:10:00',0.69),
            ('87','2011-01-22 12:10:00',0.87);  

回答1:

So you want:

  • The alias C to correspond to all rows in the group for a given day, so you can use MAX() and MIN() over the rows in that group.
  • The alias C2 to correspond to the last row in a given day.
  • The alias C3 to correspond to a row later than C2 on the same day. If none is found, i.e. C3.* is NULL, then C2 is the latest on that day.

This is often labeled a greatest-n-per-group query, and it comes up frequently on Stack Overflow. Here's a solution that I tested for your test data, but you can follow the tag I added to your question for other solutions and discussion.

edit: I missed the requirement for both opening price and closing price. The following is edited.

SELECT DATE_FORMAT(C.`DTE`, '%m/%d/%Y') AS trading_day, 
  MIN(C.`PRICE`) AS min_price, 
  MAX(C.`PRICE`) AS max_price, 
  Copen.`PRICE` AS opening_price,
  Cclose.`PRICE` AS closing_price 
FROM `CHART_DATA` AS C 
INNER JOIN `CHART_DATA` AS Cclose 
  ON DAY(C.`DTE`) = DAY(Cclose.`DTE`) 
LEFT OUTER JOIN `CHART_DATA` AS Cclose_later 
  ON DAY(C.`DTE`) = DAY(Cclose_later.`DTE`) AND Cclose.`DTE` < Cclose_later.`DTE`
INNER JOIN `CHART_DATA` AS Copen 
  ON DAY(C.`DTE`) = DAY(Copen.`DTE`) 
LEFT OUTER JOIN `CHART_DATA` AS Copen_earlier 
  ON DAY(C.`DTE`) = DAY(Copen_earlier.`DTE`) AND Copen.`DTE` < Copen_earlier.`DTE`
WHERE Cclose_later.`DTE` IS NULL AND Copen_earlier .`DTE` IS NULL 
GROUP BY trading_day;


回答2:

SELECT
  a.trading_day, a.min_price, a.max_price, 
  b.price as opn_price, 
  c.price as cls_price
FROM
  (SELECT 
     DATE_FORMAT(`DTE`, "%m/%d/%Y") AS trading_day,
     MIN(`PRICE`) AS min_price,
     MAX(`PRICE`) AS max_price,
     MIN(`dte`) AS open_date,
     MAX(`dte`) AS close_date
   FROM `CHART_DATA`
   GROUP BY trading_day) a
LEFT JOIN
  `CHART_DATA` b ON b.dte = a.open_date
LEFT JOIN
  `CHART_DATA` c ON c.dte = a.close_date

Note: this solution may present some problems if your opening or closing entry has the exact same date/time value as another row (i.e. the transaction that came immediately after opening, or immediately before closing). To address this, I would suggest that you add a sequence number that is stored in a way that guarantees uniqueness, and increasing-with-respect-to-time. If you do this, then you would use the sequence value in the place of dte to replace the open_date and close_date I've used as join fields in my example.