Background: I'm using WF 4.0. My web app and workflow run on a farm of 4 machines. All references from the web app to the workflow use "http://localhost/..". Workflow persistence is kept in a single SQL Server 2005 database. Everything runs on Windows Server 2008.
Scenario: ServerA creates a new workflow, and processing runs until it reaches a Pick activity whose branches are Receive activities. Just before the Pick activity, an app-specific flag is set in the database to indicate that the workflow is ready to listen for the next request. A bookmark is created in the workflow persistence database. ServerB then uses correlation to resume the existing workflow, and processing continues until it reaches the next Pick activity. And so on.
The scenario above works fine in most cases, except when ServerB attempts to resume the workflow soon after the "app-specific flag" is set. This flag is set by a custom action that lets me notify the user that he can continue with the next operation. When the resume comes that quickly, however, it usually fails with an InstanceLockedException. The server shows that there were a number of retry attempts, after which it tries to redirect to yet another server before throwing another exception, a RedirectionException.
Currently my WCF configuration is set to the following:
<sqlWorkflowInstanceStore connectionString="[conn str]"
    instanceEncodingOption="None"
    instanceCompletionAction="DeleteNothing"
    instanceLockedExceptionAction="BasicRetry"
    hostLockRenewalPeriod="00:00:30"
    runnableInstancesDetectionPeriod="00:00:05" />
I'm not sure whether I should use AggressiveRetry, change the renewal and detection periods, or do something totally different. Your input is greatly appreciated.
Make sure you set timeToUnload and timeToPersist to 00:00:00 on the workflowIdle behavior. This is the recommended setting for load-balanced WF hosts.
http://msdn.microsoft.com/en-us/library/ff383824.aspx
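As a rough sketch of where those two settings go, assuming the instance store from the question is configured as a service behavior in web.config: the workflowIdle element sits next to sqlWorkflowInstanceStore inside the same behavior. The surrounding behaviors/serviceBehaviors/behavior wrapper is an assumption about how the question's config is laid out, and the other attribute values are copied from the question unchanged.

```xml
<system.serviceModel>
  <behaviors>
    <serviceBehaviors>
      <behavior>
        <!-- Instance store settings as posted in the question -->
        <sqlWorkflowInstanceStore connectionString="[conn str]"
            instanceEncodingOption="None"
            instanceCompletionAction="DeleteNothing"
            instanceLockedExceptionAction="BasicRetry"
            hostLockRenewalPeriod="00:00:30"
            runnableInstancesDetectionPeriod="00:00:05" />
        <!-- Persist and unload as soon as the workflow goes idle -->
        <workflowIdle timeToPersist="00:00:00" timeToUnload="00:00:00" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>
```

With both values at zero, the host should persist and unload the instance as soon as it goes idle at the Pick activity, which gives up that host's lock on the instance so another server in the farm can load it. If ServerB can still arrive before the unload finishes, an occasional InstanceLockedException may remain possible, so keeping a retry setting for instanceLockedExceptionAction seems reasonable.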