-->

Azure Batch Job Scheduling: Task doesn't run r

Posted 2019-08-26 14:00

Question:

My objective is to schedule an Azure Batch Task to run every 5 minutes from the moment it has been added, and I use the Python SDK to create/manage my Azure resources. I tried creating a Job-Schedule and it automatically created a new Job under the specified Pool.

    job_spec = batch.models.JobSpecification(
        pool_info=batch.models.PoolInformation(pool_id=pool_id)
    )
    schedule = batch.models.Schedule(
        start_window=datetime.timedelta(hours=1),
        recurrence_interval=datetime.timedelta(minutes=5)
    )
    setup = batch.models.JobScheduleAddParameter(
        'python_test_schedule',
        schedule,
        job_spec
    )
    batch_client.job_schedule.add(setup)

I then added a task to this new Job, but the task seems to run only once, as soon as it is added (like a normal task). Is there something more that I need to do to make the task run recurrently? There doesn't seem to be much documentation or many examples for JobSchedule either.

Thank you! Any help is appreciated.

Answer 1:

You are correct in that a JobSchedule will create a new job at the specified time interval. A task, however, cannot be made to "re-run" every 5 minutes once it has completed. You could do either of the following:

  • Have one task that runs a loop, performing the same action every 5 minutes.
  • Use a Job Manager to add a new task (that does the same thing) every 5 minutes.
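For the first option, a minimal sketch of such a loop might look like the following. The helper name `run_every` and its parameters are hypothetical (they are not part of the Batch SDK), and the sketch assumes the repeated work fits in a single Python callable.

```python
import time


def run_every(action, interval_seconds, max_runs,
              sleep=time.sleep, clock=time.monotonic):
    """Call action() max_runs times, starting each run interval_seconds apart."""
    results = []
    next_start = clock()
    for _ in range(max_runs):
        results.append(action())
        next_start += interval_seconds
        # Sleep only for what is left of the interval, so a slow action
        # does not push every later run back.
        remaining = next_start - clock()
        if remaining > 0:
            sleep(remaining)
    return results


if __name__ == "__main__":
    # Short interval so the demo finishes quickly; a real task would use 300.
    print(run_every(lambda: "hello world", interval_seconds=0.01, max_runs=3))
```

The `sleep` and `clock` parameters exist only so the timing can be checked without actually waiting; a task running on a compute node would leave them at their defaults.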

I would probably recommend the second option, as it gives you a little more flexibility to monitor the progress of the tasks and the job and to take action accordingly. An example client which creates the job might look a bit like this:

    from azure.batch import models

    # batch_client and AZ_BATCH_KEY are assumed to be defined already by the client.
    job_manager = models.JobManagerTask(
        id='job_manager',
        command_line="/bin/bash -c 'python ./job_manager.py'",
        # Only needed if the Job Manager authenticates with the account key.
        environment_settings=[
            models.EnvironmentSetting(name='AZ_BATCH_KEY', value=AZ_BATCH_KEY)],
        # Older SDK releases call these parameters blob_source and file_path.
        resource_files=[
            models.ResourceFile(http_url="https://url/to/job_manager.py", file_path="job_manager.py")],
        # Adds an AAD token, scoped to this job only, to the task's environment.
        authentication_token_settings=models.AuthenticationTokenSettings(
            access=[models.AccessScope.job]),
        kill_job_on_completion=True,  # This will mark the job as complete once the Job Manager has finished.
        run_exclusive=False)  # Whether the job manager needs a dedicated VM - this will depend on the nature of the other tasks running on the VM.


    new_job = models.JobAddParameter(
        id='my_job',
        job_manager_task=job_manager,
        pool_info=models.PoolInformation(pool_id='my_pool'))

    batch_client.job.add(new_job)

Now we need a script to run as the Job Manager on the compute node. In this case I will use Python, so you will need to add a StartTask to your pool (or a JobPreparationTask to the job) to install the azure-batch Python package.
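As a rough sketch of that install step, the StartTask could be built along these lines. The helper names `pip_install_command` and `make_start_task` are hypothetical, and the sketch assumes a Linux pool image that already has Python and pip, with a task user that is allowed to install packages (some images may need an elevated user identity, which is not shown here).

```python
def pip_install_command(packages):
    """Build a Linux shell command line that installs the given pip packages."""
    if not packages:
        raise ValueError("at least one package is required")
    return "/bin/bash -c 'python -m pip install {}'".format(" ".join(packages))


def make_start_task(packages):
    """Wrap the install command in a StartTask (requires azure-batch on the client)."""
    from azure.batch import models  # imported lazily; only the client needs it
    return models.StartTask(
        command_line=pip_install_command(packages),
        # Do not schedule tasks on the node until the install has succeeded.
        wait_for_success=True,
    )
```

The result of `make_start_task(["azure-batch"])` would then be passed as the `start_task` of the pool when it is created.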

Additionally, the Job Manager task will need to be able to authenticate against the Batch API. There are two ways of doing this, depending on the scope of activities that the Job Manager will perform. If you only need to add tasks, you can use the authentication_token_settings attribute, which adds an AAD token to the Job Manager task as an environment variable (AZ_BATCH_AUTHENTICATION_TOKEN) with permission to access ONLY the current job. If you need permission to do other things, like altering the pool or starting new jobs, you can pass an account key via an environment variable. Both options are shown above.
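A minimal sketch of choosing between the two methods inside the Job Manager might look like this. The helpers `choose_auth` and `make_credentials` are hypothetical, preferring the job-scoped token when both are present is just one possible policy (least privilege), and the use of msrest's `BasicTokenAuthentication` with an `access_token` entry is an assumption about how the token is handed to the client.

```python
import os


def choose_auth(env):
    """Return ("token", token) or ("shared_key", (account, key)) from env vars."""
    token = env.get("AZ_BATCH_AUTHENTICATION_TOKEN")
    if token:
        # Prefer the job-scoped token: it can only touch the current job.
        return "token", token
    key = env.get("AZ_BATCH_KEY")
    if key:
        return "shared_key", (env["AZ_BATCH_ACCOUNT_NAME"], key)
    raise RuntimeError("no Batch credentials found in the environment")


def make_credentials(env=os.environ):
    """Build a credentials object for BatchServiceClient (imports are lazy)."""
    kind, value = choose_auth(env)
    if kind == "token":
        # Assumption: the token is wrapped in a dict with an 'access_token' key.
        from msrest.authentication import BasicTokenAuthentication
        return BasicTokenAuthentication({"access_token": value})
    from azure.batch.batch_auth import SharedKeyCredentials
    return SharedKeyCredentials(*value)
```

If the Job Manager needs to alter the pool or start new jobs, the preference would have to be reversed so that the account key is used whenever it is present.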

The script you run on the Job Manager task could look something like this:

    import os
    import time

    from azure.batch import BatchServiceClient
    from azure.batch.batch_auth import SharedKeyCredentials
    from azure.batch import models

    # Batch account credentials. AZ_BATCH_ACCOUNT_NAME, AZ_BATCH_ACCOUNT_URL and
    # AZ_BATCH_JOB_ID are set by Batch on the compute node; AZ_BATCH_KEY is the
    # environment setting passed in by the client above.
    AZ_BATCH_ACCOUNT = os.environ['AZ_BATCH_ACCOUNT_NAME']
    AZ_BATCH_KEY = os.environ['AZ_BATCH_KEY']
    AZ_BATCH_ENDPOINT = os.environ['AZ_BATCH_ACCOUNT_URL']
    AZ_JOB = os.environ['AZ_BATCH_JOB_ID']  # The job this Job Manager belongs to.

    # If you're using the authentication_token_settings for authentication
    # you can use the AAD token in the environment variable AZ_BATCH_AUTHENTICATION_TOKEN.


    def main():
        # Batch Client (later azure-batch releases rename base_url to batch_url)
        creds = SharedKeyCredentials(AZ_BATCH_ACCOUNT, AZ_BATCH_KEY)
        batch_client = BatchServiceClient(creds, base_url=AZ_BATCH_ENDPOINT)

        # You can set up the conditions under which your Job Manager will continue to add tasks here.
        # It could be a timeout, max number of tasks, or you could monitor tasks to act on task status
        condition = True
        task_id = 0
        task_params = {
            "command_line": "/bin/bash -c 'echo hello world'",
            # Any other task parameters go here.
        }

        while condition:
            # Task IDs must be strings.
            new_task = models.TaskAddParameter(id=str(task_id), **task_params)
            batch_client.task.add(AZ_JOB, new_task)
            task_id += 1
            # Perform any additional logic here - for example:
            # - Check the status of the tasks, e.g. stdout, exit code etc
            # - Process any output files for the tasks
            # - Delete any completed tasks
            # - Error handling for tasks that have failed
            time.sleep(300)  # Wait for 5 minutes (300 seconds)

        # Job Manager task has completed - it will now exit and the job will be marked as complete.

    if __name__ == '__main__':
        main()
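The loop condition in the script above is left as a placeholder. One possible way to fill it in, combining a maximum number of tasks with a timeout, is sketched below. The function `should_continue` and its parameter names are hypothetical and the limits are arbitrary.

```python
import time


def should_continue(tasks_added, started_at, max_tasks, max_runtime_seconds, now=None):
    """Return True while both the task budget and the time budget have room left."""
    if now is None:
        now = time.monotonic()
    within_count = tasks_added < max_tasks
    within_time = (now - started_at) < max_runtime_seconds
    return within_count and within_time
```

In the Job Manager script this could replace `while condition:` with something like `while should_continue(task_id, started_at, 12, 3600):`, where `started_at = time.monotonic()` is recorded before the loop. Once it returns False, the Job Manager exits and, with kill_job_on_completion set, the job is marked as complete.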