How to make multiple LEFT JOINs with OR fully use

2020-05-09 23:15发布

问题:

It is for a system that calculates how the users scan their fingerprints when they enter/leave the workplace. I don't know how it is called in English. I need to determine if the user is late in the morning, and if the user leaves work early.

This tb_scan table contains date and time a user scans a fingerprint.

CREATE TABLE `tb_scan` (
  `scpercode` varchar(6) DEFAULT NULL,
  `scyear` varchar(4) DEFAULT NULL,
  `scmonth` varchar(2) DEFAULT NULL,
  `scday` varchar(2) DEFAULT NULL,
  `scscantime` datetime,
  KEY `all` (`scyear`,`scmonth`,`scday`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

It has 100,000+ rows, something like this

scpercode scyear scmonth scday     scdateandtime
000001    2010      10     10      2016-01-10 08:02:00
000001    2010      10     10      2016-01-02 17:33:00
000001    2010      10     11      2016-01-11 07:48:00
000001    2010      10     11      2016-01-11 17:29:00
000002    2010      10     10      2016-01-10 17:31:00
000002    2010      10     10      2016-01-02 17:28:00
000002    2010      10     11      2016-01-11 05:35:00
000002    2010      10     11      2016-01-11 05:29:00

And this tb_workday table contains each date

CREATE TABLE `tb_workday` (
  `wdpercode` varchar(6) DEFAULT NULL,
  `wdshift` varchar(1) DEFAULT NULL,
  `wddate` date DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

It has rows with date sequence like this:

wdpercode  wdshift wddate
000001     1       2010-10-10
000001     1       2010-10-11
000001     1       2010-10-12
000001     1       2010-10-13
000002     2       2010-10-10
000002     2       2010-10-11
000002     2       2010-10-12
000002     2       2010-10-13

There is another tb_shift table containing shift time

CREATE TABLE `tb_shift` (
  `shiftcode` varchar(1) DEFAULT NULL,
  `shiftbegin2` varchar(4) DEFAULT NULL,
  `shiftbegin` varchar(4) DEFAULT NULL,
  `shiftmid` varchar(4) DEFAULT NULL,
  `shiftend` varchar(4) DEFAULT NULL,
  `shiftend2` varchar(4) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

shiftcode   shiftbegin2  shiftbegin  shiftmid  shiftend  shiftend2
        1     04:00:00     08:00:00  12:00:00  17:30:00  21:30:00 
        2     12:00:00     17:30:00  21:00:00  05:30:00  09:30:00

I want to determine that in each day, is the employee comes to work late or leaves work early, and at what time.

SELECT wdpercode,wddate,shiftbegin,shiftend,time(tlate.scscantime) wdlate,time(tearly.scscantime) wdearly
FROM tb_workday
LEFT JOIN tb_shift
  ON wdshift=shiftcode
LEFT JOIN tb_scan tlate 
  ON wdpercode=tlate.scpercode
  AND tlate.scyear=year(wddate)
  AND tlate.scmonth=month(wddate)
  AND (tlate.scday=day(wddate)
    OR tlate.scday=day(wddate)+1)
  AND tlate.scscantime>=ADDDATE(CONCAT(wddate,' ',shiftbegin),INTERVAL IF(shiftbegin2>shiftbegin,1,0) DAY)
  AND tlate.scscantime<=ADDDATE(CONCAT(wddate,' ',shiftmid),INTERVAL IF(shiftbegin2>shiftmid,1,0) DAY)
LEFT JOIN tb_scan tearly 
  ON wdpercode=tearly.scpercode
  AND tearly.scyear=year(wddate)
  AND tearly.scmonth=month(wddate)
  AND (tearly.scday=day(wddate)
    OR tearly.scday=day(wddate)+1)
  AND tearly.scscantime>ADDDATE(CONCAT(wddate,' ',shiftmid),INTERVAL IF(shiftbegin2>shiftmid,1,0) DAY)
  AND tearly.scscantime<ADDDATE(CONCAT(wddate,' ',shiftend),INTERVAL IF(shiftbegin2>shiftend,1,0) DAY)

Here is the example of an output:

wdpercode wddate      shiftbegin  shiftend  wdlate    wdearly
000001    2016-01-10  08:00:00    17:30:00  08:02:00  (null)
000001    2016-01-11  08:00:00    17:30:00  (null)    17:29:00
000002    2016-01-11  17:30:00    05:30:00  17:31:00  (null)
000002    2016-01-11  17:30:00    05:30:00  (null)    05:29:00

this ADDDATE(CONCAT(wddate,' ',shiftbegin),INTERVAL IF(shiftbegin2>shiftbegin,1,0) DAY) is for employees who work on night shift, so it has to add 1 day into the shift time

The problem is if I create an index for scscantime, MySQL refuses to use it for comparison (>=,<=,>,<). Please see this thread Why does MySQL not use an index for a greater than comparison?

Because of this I created the scyear, scmonth, and scday fields and combine them in an index along with scpercode. And I have to make sure it calculates for workers working in night shift too so I have to add it with OR scday=day(wddate)+1 condition.

Before I added the OR condition, the EXPLAIN result was 52 rows. But when I added the OR scday=day(wddate)+1 condition, the EXPLAIN result became 364 rows, that means MySQL did not use scday part of the index. Is there any way to use the whole index, so the EXPLAIN result becomes more efficient like 52 rows? I also tried removing the +1 part and the result is also 52.

回答1:

First, better on posting this question from your other. The reason you were getting multiple records is the possibility of a person clocking in and out multiple times in a same day based on their shifts. Now, how to resolve this.

In MySQL, you can do inline variable declaration and assignments using "@" variables as part of the select FROM clause. What I am starting with is a simple join from the work day to the shift table (and think I understand this now), with some @variables.

For each person, joined to the shift, I am pre-computing where the middle of the shift occurs such as same day vs next day. Also, the begin2 and end2 appear to be outliers for a possible clock-in vs clock-out. Example: Person 1 is working shift 1. Shift 1 is defined for any given day of work as

shiftcode   shiftbegin2  shiftbegin  shiftmid  shiftend  shiftend2
        1     04:00:00     08:00:00  12:00:00  17:30:00  21:30:00 

So, I am interpreting this as if I work on June 28, Shift 1,

June 28 @ 4am Earliest allowed clock-in time
June 28 @ 8am Actual beginning of shift
June 28 @ 12pm (afternoon) is the middle of the work day
June 28 @ 5:30pm is the end of the work day
June 28 @ 9:30pm is the max expected clock-out recognized for the shift

Similarly, for shift 2 which will wrap an over-night

shiftcode   shiftbegin2  shiftbegin  shiftmid  shiftend  shiftend2
        2     12:00:00     17:30:00  21:00:00  05:30:00  09:30:00

June 28 @ 12pm (afternoon) Earliest allowed clock-in time
June 28 @ 5:30pm Actual beginning of shift
June 28 @ 9pm is the middle of the shift
June 29 @ 5:30am (day roll-over) is the end of the work day 
June 29 @ 9:30am (day roll-over) is the max expected clock-out for the shift

So, if this is all correct, my inner query is pre-determining all these ranges for each person, so I will only ever have 1 record per person per work day regardless of how many scans via below.

select 
      wd.wdpercode,
      wd.wdshift,
      wd.wddate,
      s.shiftbegin,
      s.shiftend,
      s.shiftbegin2,
      s.shiftmid,
      s.shiftend2,
      @midDay := if( s.shiftbegin < s.shiftmid, wd.wddate, date_add( wd.wddate, interval 1 day )) as NewMidDay,
      @endDay := if( s.shiftbegin < s.shiftend, wd.wddate, date_add( wd.wddate, interval 1 day )) as NewEndDay,
      cast( concat(wd.wddate, ' ', s.shiftbegin2 ) as DateTime ) as EarliestClockIn,
      cast( concat(wd.wddate, ' ', s.shiftbegin ) as DateTime ) as BeginShift,
      cast( concat(@midDay, ' ', s.shiftmid ) as DateTime ) as MidShift,
      cast( concat( @endDay, ' ', s.shiftend ) as DateTime ) as EndShift,
      cast( concat( @endDay, ' ', s.shiftend2 ) as DateTime ) as MaxClockOut
   from
      ( select 
              @endDay := '', 
              @midDay := '' ) sqlvars,
      tb_workday wd
         join tb_shift s
            on wd.wdshift = s.shiftcode

The inline computing of the @midDay and @endDay are so I don't have to worry about joining to the scanned time clock table and keep adding 1 day in the midst of all else being considered. So, at the end of this query, I would end up with something like... Notice between person 1 normal shift and person 2 night shift, the computed end date shows the roll-over dates too

wdpercode  wdshift  wddate      shiftbegin  shiftend  shiftbegin2  shiftmid  shiftend2  NewMidDay   NewEndDay   EarliestClockIn   BeginShift        MidShift          EndShift          MaxClockOut
000001     1        2010-10-10  08:00       17:30     04:00        12:00     21:30      2010-10-10  2010-10-10  2010-10-10 04:00  2010-10-10 08:00  2010-10-10 12:00  2010-10-10 17:30  2010-10-10 21:30:00
000001     1        2010-10-11  08:00       17:30     04:00        12:00     21:30      2010-10-11  2010-10-11  2010-10-11 04:00  2010-10-11 08:00  2010-10-11 12:00  2010-10-11 17:30  2010-10-11 21:30:00
000001     1        2010-10-12  08:00       17:30     04:00        12:00     21:30      2010-10-12  2010-10-12  2010-10-12 04:00  2010-10-12 08:00  2010-10-12 12:00  2010-10-12 17:30  2010-10-12 21:30:00
000001     1        2010-10-13  08:00       17:30     04:00        12:00     21:30      2010-10-13  2010-10-13  2010-10-13 04:00  2010-10-13 08:00  2010-10-13 12:00  2010-10-13 17:30  2010-10-13 21:30:00

000002     2        2010-10-10  17:30       05:30     12:00        21:00     09:30      2010-10-10  2010-10-11  2010-10-10 12:00  2010-10-10 17:30  2010-10-10 21:00  2010-10-11 05:30  2010-10-11 09:30:00
000002     2        2010-10-11  17:30       05:30     12:00        21:00     09:30      2010-10-11  2010-10-12  2010-10-11 12:00  2010-10-11 17:30  2010-10-11 21:00  2010-10-12 05:30  2010-10-12 09:30:00
000002     2        2010-10-12  17:30       05:30     12:00        21:00     09:30      2010-10-12  2010-10-13  2010-10-12 12:00  2010-10-12 17:30  2010-10-12 21:00  2010-10-13 05:30  2010-10-13 09:30:00
000002     2        2010-10-13  17:30       05:30     12:00        21:00     09:30      2010-10-13  2010-10-14  2010-10-13 12:00  2010-10-13 17:30  2010-10-13 21:00  2010-10-14 05:30  2010-10-14 09:30:00

You could remove the extra columns from this query, but I included all so you could see / confirm what the values are for consideration of each row and date of work scheduled. The abbreviated list I would still need is

select 
      wd.wdpercode,
      @midDay := if( s.shiftbegin < s.shiftmid, wd.wddate, date_add( wd.wddate, interval 1 day )) as NewMidDay,
      @endDay := if( s.shiftbegin < s.shiftend, wd.wddate, date_add( wd.wddate, interval 1 day )) as NewEndDay,
      cast( concat(wd.wddate, ' ', s.shiftbegin2 ) as DateTime ) as EarliestClockIn,
      cast( concat(wd.wddate, ' ', s.shiftbegin ) as DateTime ) as BeginShift,
      cast( concat(@midDay, ' ', s.shiftmid ) as DateTime ) as MidShift,
      cast( concat( @endDay, ' ', s.shiftend ) as DateTime ) as EndShift,
      cast( concat( @endDay, ' ', s.shiftend2 ) as DateTime ) as MaxClockOut

So, if the above is accurate, we now have to get the clock in and out for each person based on the MAXIMUM range computed from this query which COULD more than one record per date

wdpercode  EarliestClockIn    MidShift          MaxClockOut
000001     2010-10-10 04:00   2010-10-10 12:00  2010-10-10 21:30:00
000002     2010-10-10 12:00   2010-10-10 21:00  2010-10-11 09:30:00

So here, I am doing a join to the scan times for any dates within the earliest clock in and max clock out and using midshift as the basis of determining if they clocked in late vs leaving early. I added the extra MIN() and MAX() for the arrival and departure for a given person / shift just to confirm what you DO AND should be seeing.

The purpose of the MAX( IF() ) is to capture the late / early status ONLY IF they have happened. Since the group by is per shift, the first record (clock in) might be late and you want that time, but the second record for clocking out is not applicable via the mid-shift time and would thus be blank. Similarly for detecting early departure from a shift.

select
      perPerson.wdPerCode,
      perPerson.BeginShift,
      perPerson.EndShift,
      min( TS.scScanTime ) as Arrival,
      max( TS.scScanTime ) as Departure,
      max( IF( TS.scScanTime > perPerson.BeginShift         
           AND TS.scScanTime <= perPerson.MidShift, TS.scScanTime, "" )) as LateArrival,
      max( IF( TS.scScanTime > perPerson.MidShift
           AND TS.scScanTime < perPerson.EndShift, TS.scScanTime, "" )) as EarlyDepart
   from
      ( select
              wd.wdpercode,
              @midDay := if( s.shiftbegin < s.shiftmid, wd.wddate, 
                 date_add( wd.wddate, interval 1 day )) as NewMidDay,
              @endDay := if( s.shiftbegin < s.shiftend, wd.wddate, 
                 date_add( wd.wddate, interval 1 day )) as NewEndDay,
              cast( concat(wd.wddate, ' ', s.shiftbegin2 ) as DateTime ) as EarliestClockIn,
              cast( concat(wd.wddate, ' ', s.shiftbegin ) as DateTime ) as BeginShift,
              cast( concat(@midDay, ' ', s.shiftmid ) as DateTime ) as MidShift,
              cast( concat( @endDay, ' ', s.shiftend ) as DateTime ) as EndShift,
              cast( concat( @endDay, ' ', s.shiftend2 ) as DateTime ) as MaxClockOut
           from
              ( select
                      @endDay := '',
                      @midDay := '' ) sqlvars,
              tb_workday wd
                 join tb_shift s
                    on wd.wdshift = s.shiftcode ) perPerson
         JOIN tb_scan TS
            on perPerson.wdpercode = TS.scpercode
            AND TS.scScanTime >= perPerson.EarliestClockIn
            AND TS.scScanTime <= perPerson.MaxClockOut
   group by
      perPerson.wdPerCode,
      perPerson.BeginShift;

I created tables and sample data from what you provided via (of which some of your data did not match the sample dates and ranges, so I adjusted to do so).

CREATE TABLE `tb_scan` (
  `scpercode` varchar(6) DEFAULT NULL,
  `scscantime` datetime,
  KEY `all` (`scyear`,`scmonth`,`scday`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

insert into tb_scan 
( scpercode, scscantime ) 
values
( '000001', '2010-10-10 08:02:00' ),
( '000001', '2010-10-10 17:33:00' ),
( '000001', '2010-10-11 07:48:00' ),
( '000001', '2010-10-11 17:29:00' ),
( '000001', '2010-10-12 08:04:00' ),
( '000001', '2010-10-12 17:28:00' ),
( '000002', '2010-10-10 17:31:00' ),
( '000002', '2010-10-11 05:35:00' ),
( '000002', '2010-10-11 17:28:00' ),
( '000002', '2010-10-12 05:29:00' ),
( '000002', '2010-10-12 17:32:00' ),
( '000002', '2010-10-13 05:27:00' );

CREATE TABLE `tb_workday` (
  `wdpercode` varchar(6) DEFAULT NULL,
  `wdshift` varchar(1) DEFAULT NULL,
  `wddate` date DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

insert into tb_workday 
( wdpercode, wdshift, wddate )
values
( '000001', '1', '2010-10-10' ),
( '000001', '1', '2010-10-11' ),
( '000001', '1', '2010-10-12' ),
( '000001', '1', '2010-10-13' ),
( '000002', '2', '2010-10-10' ),
( '000002', '2', '2010-10-11' ),
( '000002', '2', '2010-10-12' ),
( '000002', '2', '2010-10-13' );


CREATE TABLE `tb_shift` (
  `shiftcode` varchar(1) DEFAULT NULL,
  `shiftbegin2` varchar(8) DEFAULT NULL,
  `shiftbegin` varchar(8) DEFAULT NULL,
  `shiftmid` varchar(8) DEFAULT NULL,
  `shiftend` varchar(8) DEFAULT NULL,
  `shiftend2` varchar(8) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

insert into tb_shift
( shiftcode, shiftbegin2, shiftbegin, shiftmid, shiftend, shiftend2 )
values
( '1', '04:00:00', '08:00:00', '12:00:00', '17:30:00', '21:30:00' ), 
( '2', '12:00:00', '17:30:00', '21:00:00', '05:30:00', '09:30:00' );

The sample data shows each person with an 1: arrive late, 2: depart early, 3: arrive late AND depart early.

wdPerCode  BeginShift         EndShift           Arrival            Departure          LateArrival        EarlyDepart
000001     2010-10-10 08:00   2010-10-10 17:30   2010-10-10 08:02   2010-10-10 17:33   2010-10-10 08:02
000001     2010-10-11 08:00   2010-10-11 17:30   2010-10-11 07:48   2010-10-11 17:29                      2010-10-11 17:29
000001     2010-10-12 08:00   2010-10-12 17:30   2010-10-12 08:04   2010-10-12 17:28   2010-10-12 08:04   2010-10-12 17:28

000002     2010-10-10 17:30   2010-10-11 05:30   2010-10-10 17:31   2010-10-11 05:35   2010-10-10 17:31
000002     2010-10-11 17:30   2010-10-12 05:30   2010-10-11 17:28   2010-10-12 05:29                      2010-10-12 05:29
000002     2010-10-12 17:30   2010-10-13 05:30   2010-10-12 17:32   2010-10-13 05:27   2010-10-12 17:32   2010-10-13 05:27

For optimizing the query, I would change your index on the scan table

CREATE TABLE `tb_scan` (
  `scpercode` varchar(6) DEFAULT NULL,
  `scscantime` datetime,
  KEY `personDate` (`scpercode`, `scscantime` )