My project uses Azure Batch and adds jobs to a pool through an event-based design built on functions and queues. When a job is finished, its state is still "active", even though all of its tasks have completed.
A single function (running on an App Service plan) is triggered on a timer and reads X messages from the queue. The function:
- Creates a pool (if it does not exist)
- Creates a job
- Adds the tasks to that job
That works well. However, once the tasks have finished, the job status remains active. I want the jobs to terminate, clean up, or at least have their status set to "completed".
I also want my functions to be short-lived and stateless, so I am not using foreach (CloudTask task in job.CompletedTasks()) to wait for the tasks to finish.
Another approach is to use task dependencies, which require batchClient.Utilities.CreateTaskStateMonitor() and thus a stateful approach.
What is the best way to use Azure Batch in an event-based design? And specifically, how do I terminate or clean up the jobs once the tasks are finished?
You can have the job "auto complete" once all tasks under the job complete. There is a property called OnAllTasksComplete on the CloudJob object.
You will want to leave this property set to NoAction (the default) while you are adding tasks to the job. After you have added all the tasks, you can update the value to TerminateJob and then call Commit()/CommitAsync(). Note that if you retain the CloudJob that you initially submitted, you will need to call Refresh()/RefreshAsync() first, before modifying the properties and committing. Alternatively, you can call GetJob()/GetJobAsync(), modify the job, then commit.
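The GetJobAsync-then-commit variant described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: it assumes an already-authenticated BatchClient and a known job ID, and the class and method names (JobCompletion, EnableAutoTerminateAsync) are hypothetical.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Common;

// Hypothetical helper class, used for illustration only.
public static class JobCompletion
{
    // Call this once every task has been added to the job.
    public static async Task EnableAutoTerminateAsync(BatchClient batchClient, string jobId)
    {
        // Fetch a fresh, bound copy of the job from the Batch service.
        CloudJob job = await batchClient.JobOperations.GetJobAsync(jobId);

        // Ask the service to terminate the job once all of its tasks complete.
        job.OnAllTasksComplete = OnAllTasksComplete.TerminateJob;

        // Push the changed property back to the service.
        await job.CommitAsync();
    }
}
```

Because the job is fetched by ID, the function that adds the tasks does not need to keep the original CloudJob object around, which suits a short-lived, stateless function.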
For event-based designs, you can take a look at enabling Batch service analytics and see if that is appropriate for your scenario.
Final solution with code after fpark's answer:
using System.Threading.Tasks;
using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Common;

public class Orchestrator
{
    // _batchManager, _domain and _fileProcessingTasksFactory are fields set up elsewhere (not shown).
    public async Task ExecuteAsync()
    {
        // Create the Batch pool, which contains the compute nodes
        // that execute the tasks.
        var pool = await _batchManager.CreatePoolIfNotExistsAsync();

        // Create the job that runs the tasks.
        var job = await _batchManager.CreateJobIfNotExistsAsync(_domain, pool.Id);

        // Obtain the bound job from the Batch service.
        await job.RefreshAsync();

        // Create a collection of tasks and add them to the Batch job.
        var tasks = await _fileProcessingTasksFactory.CreateAsync(job.Id);

        // Add the tasks to the job; the tasks are automatically scheduled
        // for execution on the nodes by the Batch service.
        await job.AddTaskAsync(tasks);

        // Only now that all tasks have been added, let the service
        // terminate the job when they complete.
        job.OnAllTasksComplete = OnAllTasksComplete.TerminateJob;
        await job.CommitAsync();
    }
}
public class BatchManager
{
    public async Task<CloudPool> CreatePoolIfNotExistsAsync()
    {
        // Code to create and return a pool.
    }

    public async Task<CloudJob> CreateJobIfNotExistsAsync(string domain, string poolId)
    {
        // Job IDs cannot contain ':', so replace them with '-'.
        var jobId = $"{domain}-{DateTime.UtcNow:s}".Replace(":", "-");

        // Create an unbound job, assign it to the pool, and submit it.
        var job = _parameters.BatchClient.JobOperations.CreateJob();
        job.Id = jobId;
        job.PoolInformation = new PoolInformation { PoolId = poolId };
        await job.CommitAsync();

        return job;
    }
}
If you try to create a job with OnAllTasksComplete.TerminateJob directly, you will receive the following error:
Microsoft.Azure.Batch: This object is in an invalid state. Write access is not allowed.
2018-03-27 07:57:40.738 +02:00 [Error] "636577269909538505" - Failure while scheduling Azure Batch tasks.
System.InvalidOperationException: This object is in an invalid state. Write access is not allowed.
at Microsoft.Azure.Batch.PropertyAccessor`1.ThrowIfReadOnly(Boolean overrideReadOnly)
at Microsoft.Azure.Batch.PropertyAccessor`1.<>c__DisplayClass19_0.<SetValue>b__0()
at Microsoft.Azure.Batch.PropertyAccessController.WriteProperty(Action propertyWriteAction, BindingAccess allowedAccess, String propertyName)
at Microsoft.Azure.Batch.PropertyAccessor`1.SetValue(T value, Boolean overrideReadOnly, Boolean overrideAccessControl)
at Microsoft.Azure.Batch.CloudJob.set_OnAllTasksComplete(Nullable`1 value)
at BatchManager.CreateJobIfNotExist(String domain, String poolId) in C:\ProjectsGitHub\ProjectName\BatchManager.cs:line 107
at FileProcessingOrchestrator.<ExecuteAsync>d__6.MoveNext() in C:\ProjectsGitHub\ProjectName\FileProcessingOrchestrator.cs:line 48
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Nnip.Qrs.EdgarDataProcessing.Parallelization.FunctionApp.ScheduleAzureBatchTasks.<Run>d__0.MoveNext() in C:\ProjectsGitHub\ProjectName\FunctionApp\ScheduleAzureBatchTasks.cs:line 93
Microsoft.Azure.Batch: This object is in an invalid state. Write access is not allowed.
A ScriptHost error has occurred
Exception while executing function: ScheduleAzureBatchTasks. Microsoft.Azure.Batch: This object is in an invalid state. Write access is not allowed.
Exception while executing function: ScheduleAzureBatchTasks
Exception while executing function: ScheduleAzureBatchTasks. Microsoft.Azure.Batch: This object is in an invalid state. Write access is not allowed.
Function completed (Failure, Id=6173b9d2-5058-4a6d-9406-1cf00340774e, Duration=71076ms)
Executed 'ScheduleAzureBatchTasks' (Failed, Id=6173b9d2-5058-4a6d-9406-1cf00340774e)
System.Private.CoreLib: Exception while executing function: ScheduleAzureBatchTasks. Microsoft.Azure.Batch: This object is in an invalid state. Write access is not allowed.
Function had errors. See Azure WebJobs SDK dashboard for details. Instance ID is '6173b9d2-5058-4a6d-9406-1cf00340774e'
System.Private.CoreLib: Exception while executing function: ScheduleAzureBatchTasks. Microsoft.Azure.Batch: This object is in an invalid state. Write access is not allowed.
So set job.OnAllTasksComplete only after all tasks have been added.
It takes around two minutes (in my case) for the job to set its status to Completed after all the tasks are completed.