The session for this agent already exists

Posted 2019-04-07 23:21

Question:

I am using TFS to execute a nightly build that includes several steps that use the TFS Test Agent. I am running the latest version of TFS/Test Agent (2015 - Update 3), and no other builds are running at that time. Often (maybe half the time), when the nightly job runs, the step "Visual Studio Test Agent Deployment" fails with the following error:

The job has been abandoned because agent Agent-XXX did not renew the lock. Ensure agent is running, not sleeping, and has not lost communication with the service.

This is due to the error found in the Test Agent's log file (under _diag):

The session for this agent already exists. Sleeping for 30 seconds before next retry.

Microsoft.TeamFoundation.DistributedTask.WebApi.TaskAgentSessionConflictException: The task agent Agent-XXX already has an active session for owner XXX.

This issue is directly referenced here, and indirectly talked about here.

The solution I've found to this issue is to restart the server that the test agent is running on. This clears any dead sessions, and after the server starts back up, the tests run just fine. I think this is effectively what is being done in the previously mentioned post: the result of resetting the configs is that the service is restarted.

Although the linked article presents this as a solution, it is only temporary. Even after the server has been restarted and the build runs successfully, the issue reappears the next day, and manual intervention is again needed to get the build to run.

I could schedule a task to reset the service or even restart the server directly before the nightly build is run, but it strikes me as a bandage rather than a fix. Has anyone experienced this issue before, and if so is there any way to prevent it from occurring in the first place?

Update 1

I simply set up a build that runs 5 minutes before my main tests and executes a Bat script to restart all the servers hosting my test agents. This is a workaround, but one that seems to resolve the issue. Hopefully someday someone can come up with a better solution than this, but for now, it's how I have to run automated testing in TFS.
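For anyone who wants to try the same workaround, here is a rough sketch of what such a pre-build restart step could look like, written in Python rather than as a Bat script. The host names are hypothetical placeholders, and the helper names `build_restart_command` and `restart_agents` are made up for illustration. It assumes the account running it has rights to reboot the target machines remotely and that the standard Windows `shutdown` command is available. It defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List, Sequence

# Hypothetical host names: replace with the servers hosting your test agents.
AGENT_HOSTS = ["test-agent-01", "test-agent-02"]


def build_restart_command(host: str, delay_seconds: int = 0) -> List[str]:
    """Build the Windows `shutdown` command line that reboots a remote host.

    /r restarts, /f forces running applications to close,
    /m targets a remote computer, /t sets the delay in seconds.
    """
    if not host:
        raise ValueError("host must be a non-empty name")
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")
    return ["shutdown", "/r", "/f", "/m", "\\\\" + host, "/t", str(delay_seconds)]


def restart_agents(hosts: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Restart every host in `hosts`; with dry_run=True only print the commands."""
    commands = [build_restart_command(host) for host in hosts]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # Raises CalledProcessError if the reboot request is rejected.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    restart_agents(AGENT_HOSTS, dry_run=True)
```

Run it as a build step a few minutes before the nightly tests, and switch `dry_run` to `False` once the printed commands look right. This is still the same bandage, just scripted: it clears dead sessions by rebooting, it does not prevent them.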

Update 2

I have three servers now, and all three exhibit the same issue, though it is hard to pin down exactly when it occurs. Scaling up the workaround without creating downtime is proving to be quite challenging.

Update 3

A better day came: I upgraded TFS to 2018 and the build agent to the latest version, and this issue no longer occurs. I think it's a bug in the old build agent. I still don't have a solution for the original version of the build agent...