It is for a system that calculates how the users scan their fingerprints when they enter/leave the workplace. I don't know how it is called in English. I need to determine if the user is late in the morning, and if the user leaves work early.
This tb_scan
table contains date and time a user scans a fingerprint.
CREATE TABLE `tb_scan` (
`scpercode` varchar(6) DEFAULT NULL,
`scyear` varchar(4) DEFAULT NULL,
`scmonth` varchar(2) DEFAULT NULL,
`scday` varchar(2) DEFAULT NULL,
`scscantime` datetime,
KEY `all` (`scyear`,`scmonth`,`scday`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
It has 100,000+ rows, something like this
scpercode scyear scmonth scday scdateandtime
000001 2010 10 10 2016-01-10 08:02:00
000001 2010 10 10 2016-01-02 17:33:00
000001 2010 10 11 2016-01-11 07:48:00
000001 2010 10 11 2016-01-11 17:29:00
000002 2010 10 10 2016-01-10 17:31:00
000002 2010 10 10 2016-01-02 17:28:00
000002 2010 10 11 2016-01-11 05:35:00
000002 2010 10 11 2016-01-11 05:29:00
And this tb_workday
table contains each date
CREATE TABLE `tb_workday` (
`wdpercode` varchar(6) DEFAULT NULL,
`wdshift` varchar(1) DEFAULT NULL,
`wddate` date DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
It has rows with date sequence like this:
wdpercode wdshift wddate
000001 1 2010-10-10
000001 1 2010-10-11
000001 1 2010-10-12
000001 1 2010-10-13
000002 2 2010-10-10
000002 2 2010-10-11
000002 2 2010-10-12
000002 2 2010-10-13
There is another tb_shift
table containing shift time
CREATE TABLE `tb_shift` (
`shiftcode` varchar(1) DEFAULT NULL,
`shiftbegin2` varchar(4) DEFAULT NULL,
`shiftbegin` varchar(4) DEFAULT NULL,
`shiftmid` varchar(4) DEFAULT NULL,
`shiftend` varchar(4) DEFAULT NULL,
`shiftend2` varchar(4) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
shiftcode shiftbegin2 shiftbegin shiftmid shiftend shiftend2
1 04:00:00 08:00:00 12:00:00 17:30:00 21:30:00
2 12:00:00 17:30:00 21:00:00 05:30:00 09:30:00
I want to determine that in each day, is the employee comes to work late or leaves work early, and at what time.
SELECT wdpercode,wddate,shiftbegin,shiftend,time(tlate.scscantime) wdlate,time(tearly.scscantime) wdearly
FROM tb_workday
LEFT JOIN tb_shift
ON wdshift=shiftcode
LEFT JOIN tb_scan tlate
ON wdpercode=tlate.scpercode
AND tlate.scyear=year(wddate)
AND tlate.scmonth=month(wddate)
AND (tlate.scday=day(wddate)
OR tlate.scday=day(wddate)+1)
AND tlate.scscantime>=ADDDATE(CONCAT(wddate,' ',shiftbegin),INTERVAL IF(shiftbegin2>shiftbegin,1,0) DAY)
AND tlate.scscantime<=ADDDATE(CONCAT(wddate,' ',shiftmid),INTERVAL IF(shiftbegin2>shiftmid,1,0) DAY)
LEFT JOIN tb_scan tearly
ON wdpercode=tearly.scpercode
AND tearly.scyear=year(wddate)
AND tearly.scmonth=month(wddate)
AND (tearly.scday=day(wddate)
OR tearly.scday=day(wddate)+1)
AND tearly.scscantime>ADDDATE(CONCAT(wddate,' ',shiftmid),INTERVAL IF(shiftbegin2>shiftmid,1,0) DAY)
AND tearly.scscantime<ADDDATE(CONCAT(wddate,' ',shiftend),INTERVAL IF(shiftbegin2>shiftend,1,0) DAY)
Here is the example of an output:
wdpercode wddate shiftbegin shiftend wdlate wdearly
000001 2016-01-10 08:00:00 17:30:00 08:02:00 (null)
000001 2016-01-11 08:00:00 17:30:00 (null) 17:29:00
000002 2016-01-11 17:30:00 05:30:00 17:31:00 (null)
000002 2016-01-11 17:30:00 05:30:00 (null) 05:29:00
this ADDDATE(CONCAT(wddate,' ',shiftbegin),INTERVAL IF(shiftbegin2>shiftbegin,1,0) DAY)
is for employees who work on night shift, so it has to add 1 day into the shift time
The problem is if I create an index for scscantime
, MySQL refuses to use it for comparison (>=
,<=
,>
,<
). Please see this thread Why does MySQL not use an index for a greater than comparison?
Because of this I created the scyear
, scmonth
, and scday
fields and combine them in an index along with scpercode
. And I have to make sure it calculates for workers working in night shift too so I have to add it with OR scday=day(wddate)+1
condition.
Before I added the OR condition, the EXPLAIN
result was 52 rows. But when I added the OR scday=day(wddate)+1
condition, the EXPLAIN
result became 364 rows, that means MySQL did not use scday part of the index. Is there any way to use the whole index, so the EXPLAIN
result becomes more efficient like 52 rows? I also tried removing the +1
part and the result is also 52.
First, better on posting this question from your other. The reason you were getting multiple records is the possibility of a person clocking in and out multiple times in a same day based on their shifts. Now, how to resolve this.
In MySQL, you can do inline variable declaration and assignments using "@" variables as part of the select FROM clause. What I am starting with is a simple join from the work day to the shift table (and think I understand this now), with some @variables.
For each person, joined to the shift, I am pre-computing where the middle of the shift occurs such as same day vs next day. Also, the begin2 and end2 appear to be outliers for a possible clock-in vs clock-out. Example: Person 1 is working shift 1. Shift 1 is defined for any given day of work as
So, I am interpreting this as if I work on June 28, Shift 1,
Similarly, for shift 2 which will wrap an over-night
So, if this is all correct, my inner query is pre-determining all these ranges for each person, so I will only ever have 1 record per person per work day regardless of how many scans via below.
The inline computing of the @midDay and @endDay are so I don't have to worry about joining to the scanned time clock table and keep adding 1 day in the midst of all else being considered. So, at the end of this query, I would end up with something like... Notice between person 1 normal shift and person 2 night shift, the computed end date shows the roll-over dates too
You could remove the extra columns from this query, but I included all so you could see / confirm what the values are for consideration of each row and date of work scheduled. The abbreviated list I would still need is
So, if the above is accurate, we now have to get the clock in and out for each person based on the MAXIMUM range computed from this query which COULD more than one record per date
So here, I am doing a join to the scan times for any dates within the earliest clock in and max clock out and using midshift as the basis of determining if they clocked in late vs leaving early. I added the extra MIN() and MAX() for the arrival and departure for a given person / shift just to confirm what you DO AND should be seeing.
The purpose of the MAX( IF() ) is to capture the late / early status ONLY IF they have happened. Since the group by is per shift, the first record (clock in) might be late and you want that time, but the second record for clocking out is not applicable via the mid-shift time and would thus be blank. Similarly for detecting early departure from a shift.
I created tables and sample data from what you provided via (of which some of your data did not match the sample dates and ranges, so I adjusted to do so).
The sample data shows each person with an 1: arrive late, 2: depart early, 3: arrive late AND depart early.
For optimizing the query, I would change your index on the scan table