The Scenario
There are certain SQL Agent Jobs that are scheduled to run every few minutes throughout the day.
There are legitimate times when it will miss its next schedule because it's still running from the previous schedule.
Every once and a while, a job might 'hang'. This doesn't produce a failure (since the job hasn't stopped yet). When this happens, the job can be manually stopped and works fine the next time it runs. It's designed to pick back up where it left off.
What's the most efficient way...?
I'd like a way to determine how long (in seconds) a SQL Agent Job named 'JobX' is currently running. If it isn't currently running, we can just return zero.
This way, I can stop the job if it has been running for an amount of time beyond a certain threshold.
I assume that a combination of xp_sqlagent_enum_jobs and sysjobhistory could be used, but I'm curious if there are better solutions out there... and can hopefully benefit from the obstacles the rest of you have already run into and worked around.
This solution would work:
SELECT DATEDIFF(SECOND,aj.start_execution_date,GetDate()) AS Seconds
FROM msdb..sysjobactivity aj
JOIN msdb..sysjobs sj on sj.job_id = aj.job_id
WHERE aj.stop_execution_date IS NULL -- job hasn't stopped running
AND aj.start_execution_date IS NOT NULL -- job is currently running
AND sj.name = 'JobX'
and not exists( -- make sure this is the most recent run
select 1
from msdb..sysjobactivity new
where new.job_id = aj.job_id
and new.start_execution_date > aj.start_execution_date
)
This a more general check dependent on system tables. If you'd prefer a custom route, you could have the job insert into a job log table you created instead.
/**** FOR CURRENTLY RUNNING JOBS ****/
SELECT j.name AS Job_Name,DATEDIFF(ss,a.start_execution_date ,GETDATE ())
FROM msdb.dbo.sysjobactivity a INNER JOIN msdb.dbo.sysjobs j
ON a.job_id =j.job_id
WHERE CONVERT(DATE,a.start_execution_date )=CONVERT(DATE,GETDATE ())
AND a.stop_execution_date IS NULL
/*This code will give u the Name of currently running jobs and for how much time it is running & after that u can add filters to it as u wish*/
/*Thanks in advance*/
What you're doing manually sounds like what's known as a "watch dog loop". Basically, an SQL job that starts and/or monitors agent jobs, killing them if need be.
The code below was taken from here, and should help, removing the need for you to manually monitor and kill jobs if they've been running for a lengthy period of time:
/****************************************************************
--This SQL will take a list of SQL Agent jobs (names must match),
--start them so they're all running together, and then
--monitor them, not quitting until all jobs have completed.
--
--In essence, it's an SQL "watchdog" loop to start and monitor SQL Agent Jobs
--
--Code from http://cc.davelozinski.com/code/sql-watchdog-loop-start-monitor-sql-agent-jobs
--
****************************************************************/
SET NOCOUNT ON
-------- BEGIN ITEMS THAT NEED TO BE CONFIGURED --------
--The amount of time to wait before checking again
--to see if the jobs are still running.
--Should be in hh:mm:ss format.
DECLARE @WaitDelay VARCHAR(8) = '00:00:20'
--Job timeout. Eg, if the jobs are running longer than this, kill them.
DECLARE @TimeoutMinutes INT = 240
DECLARE @JobsToRunTable TABLE
(
JobName NVARCHAR(128) NOT NULL,
JobID UNIQUEIDENTIFIER NULL,
Running INT NULL
)
--Insert the names of the SQL jobs here. Last two values should always be NULL at this point.
--Names need to match exactly, so best to copy/paste from the SQL Server Agent job name.
INSERT INTO @JobsToRunTable (JobName, JobID, Running) VALUES ('NameOfFirstSQLAgentJobToRun',NULL,NULL)
INSERT INTO @JobsToRunTable (JobName, JobID, Running) VALUES ('NameOfSecondSQLAgentJobToRun',NULL,NULL)
INSERT INTO @JobsToRunTable (JobName, JobID, Running) VALUES ('NameOfXSQLAgentJobToRun',NULL,NULL)
-------- NOTHING FROM HERE DOWN SHOULD NEED TO BE CONFIGURED --------
DECLARE @ExecutionStatusTable TABLE
(
JobID UNIQUEIDENTIFIER PRIMARY KEY, -- Job ID which will be a guid
LastRunDate INT, LastRunTime INT, -- Last run date and time
NextRunDate INT, NextRunTime INT, -- Next run date and time
NextRunScheduleID INT, -- an internal schedule id
RequestedToRun INT, RequestSource INT, RequestSourceID VARCHAR(128),
Running INT, -- 0 or 1, 1 means the job is executing
CurrentStep INT, -- which step is running
CurrentRetryAttempt INT, -- retry attempt
JobState INT -- 0 = Not idle or suspended, 1 = Executing, 2 = Waiting For Thread,
-- 3 = Between Retries, 4 = Idle, 5 = Suspended,
-- 6 = WaitingForStepToFinish, 7 = PerformingCompletionActions
)
DECLARE @JobNameToRun NVARCHAR(128) = NULL
DECLARE @IsJobRunning BIT = 1
DECLARE @AreJobsRunning BIT = 1
DECLARE @job_owner sysname = SUSER_SNAME()
DECLARE @JobID UNIQUEIDENTIFIER = null
DECLARE @StartDateTime DATETIME = GETDATE()
DECLARE @CurrentDateTime DATETIME = null
DECLARE @ExecutionStatus INT = 0
DECLARE @MaxTimeExceeded BIT = 0
--Loop through and start every job
DECLARE dbCursor CURSOR FOR SELECT JobName FROM @JobsToRunTable
OPEN dbCursor FETCH NEXT FROM dbCursor INTO @JobNameToRun
WHILE @@FETCH_STATUS = 0
BEGIN
EXEC [msdb].[dbo].sp_start_job @JobNameToRun
FETCH NEXT FROM dbCursor INTO @JobNameToRun
END
CLOSE dbCursor
DEALLOCATE dbCursor
print '*****************************************************************'
print 'Jobs started. ' + CAST(@StartDateTime as varchar)
print '*****************************************************************'
--Debug (if needed)
--SELECT * FROM @JobsToRunTable
WHILE 1=1 AND @AreJobsRunning = 1
BEGIN
--This has to be first with the delay to make sure the jobs
--have time to actually start up and are recognized as 'running'
WAITFOR DELAY @WaitDelay
--Reset for each loop iteration
SET @AreJobsRunning = 0
--Get the currently executing jobs by our user name
INSERT INTO @ExecutionStatusTable
EXECUTE [master].[dbo].xp_sqlagent_enum_jobs 1, @job_owner
--Debug (if needed)
--SELECT 'ExecutionStatusTable', * FROM @ExecutionStatusTable
--select every job to see if it's running
DECLARE dbCursor CURSOR FOR
SELECT x.[Running], x.[JobID], sj.name
FROM @ExecutionStatusTable x
INNER JOIN [msdb].[dbo].sysjobs sj ON sj.job_id = x.JobID
INNER JOIN @JobsToRunTable jtr on sj.name = jtr.JobName
OPEN dbCursor FETCH NEXT FROM dbCursor INTO @IsJobRunning, @JobID, @JobNameToRun
--Debug (if needed)
--SELECT x.[Running], x.[JobID], sj.name
-- FROM @ExecutionStatusTable x
-- INNER JOIN msdb.dbo.sysjobs sj ON sj.job_id = x.JobID
-- INNER JOIN @JobsToRunTable jtr on sj.name = jtr.JobName
WHILE @@FETCH_STATUS = 0
BEGIN
--bitwise operation to see if the loop should continue
SET @AreJobsRunning = @AreJobsRunning | @IsJobRunning
UPDATE @JobsToRunTable
SET Running = @IsJobRunning, JobID = @JobID
WHERE JobName = @JobNameToRun
--Debug (if needed)
--SELECT 'JobsToRun', * FROM @JobsToRunTable
SET @CurrentDateTime=GETDATE()
IF @IsJobRunning = 1
BEGIN -- Job is running or finishing (not idle)
IF DATEDIFF(mi, @StartDateTime, @CurrentDateTime) > @TimeoutMinutes
BEGIN
print '*****************************************************************'
print @JobNameToRun + ' exceeded timeout limit of ' + @TimeoutMinutes + ' minutes. Stopping.'
--Stop the job
EXEC [msdb].[dbo].sp_stop_job @job_name = @JobNameToRun
END
ELSE
BEGIN
print @JobNameToRun + ' running for ' + CONVERT(VARCHAR(25),DATEDIFF(mi, @StartDateTime, @CurrentDateTime)) + ' minute(s).'
END
END
IF @IsJobRunning = 0
BEGIN
--Job isn't running
print '*****************************************************************'
print @JobNameToRun + ' completed or did not run. ' + CAST(@CurrentDateTime as VARCHAR)
END
FETCH NEXT FROM dbCursor INTO @IsJobRunning, @JobID, @JobNameToRun
END -- WHILE @@FETCH_STATUS = 0
CLOSE dbCursor
DEALLOCATE dbCursor
--Clear out the table for the next loop iteration
DELETE FROM @ExecutionStatusTable
print '*****************************************************************'
END -- WHILE 1=1 AND @AreJobsRunning = 1
SET @CurrentDateTime = GETDATE()
print 'Finished at ' + CAST(@CurrentDateTime as varchar)
print CONVERT(VARCHAR(25),DATEDIFF(mi, @StartDateTime, @CurrentDateTime)) + ' minutes total run time.'