I'm currently building a ray marcher to look at things like the mandelbox, etc. It works great. However, with my current program, it uses each worker as a ray projected from the eye. This means that there is a large amount of execution per worker. So when looking at an incredibly complex object or trying to render with large enough precision it causes my display drivers to crash because the kernel was taking too long to execute on a single worker. I'm trying to avoid changing my registry values to make the timeout longer as I want this application to work on multiple computers.
Is there any way to remedy this? As it stands the executions of each work-item are completely independent of the work items nearby. I've contemplated subscribing a buffer to the GPU that would store the current progress on that ray and only execute a small amount of iterations. Then, I would just call the program over and over and the result would hopefully refine a bit more. The problem with this is that I am unsure how to deal with branching rays (eg. reflecting and refraction) unless I have a max number of each to anticipate.
Anyone have any pointers on what I should do to remedy this problem? I'm quite the greenhorn to OpenCL and have been having this issue for quite some time. I feel as though I'm doing something wrong or misusing OpenCL principally since my single workitems have a lot of logic behind them, but I don't know how to split the task as it is just a series of steps and checks and adjustments.