Folks, I've been having some trouble with a WF4 problem. I'm modeling a batch engine/work scheduler after Ron Jacobs' demo in his endpoint.tv webcast (http://archive.msdn.microsoft.com/wf4BatchJob). In his example, the "work branch" just counts inside a while loop. In my sample, the work branch instead sends a message to another workflow, SampleEngine.xamlx, which does the counting. On every count, that workflow calls back to the parent (JobScheduler.xamlx) and reports its progress. This works perfectly under normal conditions - I can schedule a job and watch it progress both in the AppFabric dashboard and by calling QueryProgress (a send/receive pair in my scheduler workflow).
The problem is that when I do an iisreset to simulate a server failure or other problem, both workflows (SampleEngine.xamlx and JobScheduler.xamlx) come back online and show as "In Progress", but neither of them tracks any more events - they seem to be stalled somewhere. Furthermore, neither of them responds to service messages; the behavior is the same as what I normally see from a WF service when there's no Receive activity scheduled for the workflow's current state.
I've added Persist activities everywhere and made sure that I persist after each of my SendReply activities, but that hasn't made a difference.
As I said, the only change I've made to Ron's sample is that my work branch is not inside a while loop - it sends a single message to the child workflow (SampleEngine.xamlx) to start it. Correlation between the workflows works properly as long as the server doesn't go down during execution.
Any thoughts on what I need to do to have the workflows pick up where they left off after an iisreset would be greatly appreciated. Ron's sample kept on counting after a reset, whereas mine simply stops.