Problem: I'm trying to render a dynamic Julia fractal in real time. Because the fractal is constantly changing, I need to be able to render at least 20 frames per second, preferably more. What you need to know about a Julia fractal is that every pixel can be calculated independently, so the task is easily parallelizable.
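To make the "every pixel is independent" point concrete, here is a minimal CPU sketch of the escape-time calculation. The function names, the view window of [-2, 2] x [-2, 2] and the escape radius of 2 are assumptions chosen for illustration; they are not taken from my actual code.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Escape-time iteration count for one starting point z under z -> z*z + c.
// max_iter and the escape radius 2 are illustrative choices.
int julia_iterations(std::complex<double> z, std::complex<double> c, int max_iter) {
    int i = 0;
    while (i < max_iter && std::norm(z) < 4.0) {  // std::norm(z) is |z|^2
        z = z * z + c;
        ++i;
    }
    return i;
}

// Fill a width x height image with iteration counts. Each pixel depends only
// on its own coordinates, so the loops can be split across threads or GPU
// work-items in any order.
std::vector<int> render_julia(int width, int height, std::complex<double> c, int max_iter) {
    std::vector<int> image(static_cast<std::size_t>(width) * height);
    for (int py = 0; py < height; ++py) {
        for (int px = 0; px < width; ++px) {
            // Map the pixel centre to the assumed view window [-2, 2] x [-2, 2].
            double re = -2.0 + 4.0 * (px + 0.5) / width;
            double im = -2.0 + 4.0 * (py + 0.5) / height;
            image[static_cast<std::size_t>(py) * width + px] =
                julia_iterations({re, im}, c, max_iter);
        }
    }
    return image;
}
```

Since no pixel reads another pixel's result, the body of the inner loop is what would become one GPU thread or one fragment shader invocation.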
First approach: Because I'm already used to Monogame in C#, I tried writing a shader in HLSL that would do the job, but the compiler kept complaining because I used up more than the allowable 64 arithmetic slots (I need at least a thousand).
Second approach: Using the CPU, it took, as could be expected, about two minutes to generate one frame.
Third approach: I started learning the basics of OpenCL using a wrapper called Cloo. I actually got a quick, nice result by calculating the image data using OpenCL, then reading the data back from the GPU, storing it in a Texture2D and drawing the texture to the screen. For a 1000x1000 image I get about 13 frames per second. This is still not quite what I had hoped for, as the image should be 1920x1080 to fill my screen, and the low frame rate is quite noticeable. I realised that I'm generating the image on the GPU, sending the data to the CPU and then sending it back to the GPU, which seems like an unnecessary round trip that, if it could be removed, would probably solve my problem. I read on some fora that OpenGL is able to do this, but I haven't been able to find specific information.
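To put a rough number on the GPU to CPU to GPU traffic, here is a small back-of-the-envelope sketch. It assumes 4 bytes per pixel (for example RGBA with 8 bits per channel), which is an assumption about the texture format, and the function names are made up for illustration.

```cpp
#include <cstdint>

// Bytes in one frame of width x height pixels at bytes_per_pixel each.
constexpr std::uint64_t frame_bytes(std::uint64_t width, std::uint64_t height,
                                    std::uint64_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// Bytes moved per second when every frame is read back to the CPU and then
// uploaded again as a texture, i.e. two copies per frame.
constexpr std::uint64_t round_trip_bytes_per_second(std::uint64_t width, std::uint64_t height,
                                                    std::uint64_t bytes_per_pixel,
                                                    std::uint64_t fps) {
    return 2 * frame_bytes(width, height, bytes_per_pixel) * fps;
}
```

Under that assumption, a 1000x1000 frame is 4,000,000 bytes, a 1920x1080 frame is 8,294,400 bytes, and the round trip at 20 frames per second would move 331,776,000 bytes every second. The raw volume is only part of the cost, since the readback may also make the CPU wait for the GPU to finish each frame.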
Questions: Firstly, is there a simple way to draw the data generated by OpenCL directly without involving the CPU (preferably compatible with Monogame)? If this isn't the case, is it possible to implement it using OpenGL and afterwards combine it with Monogame? Secondly, why isn't this possible with a simple HLSL shader? As HLSL and OpenCL both use the GPU, why is HLSL so much more limited when it comes to doing many arithmetic operations?
Edit
I found this site that does roughly what I want, but using a GLSL shader. This again makes me question my faith in HLSL. Unfortunately, as Monogame doesn't support GLSL (yet), my questions remain unanswered.
Sorry, I do not use OpenCL or C#, but you can do this fully inside shaders using GLSL (although you might have precision problems, because for Julia-like fractals even a 64-bit double is sometimes not enough). Anyway, here is a simple example of the Mandelbrot set I did some years back.
CPU-side app code (C++/OpenGL/GLSL/VCL):
//---------------------------------------------------------------------------
#include <vcl.h>
#pragma hdrstop
#include "Unit1.h" // VCL window header
#include "gl\\OpenGL3D_double.cpp" // my GL engine
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
OpenGLscreen scr;
GLSLprogram shd;
float mx=0.0,my=0.0,mx0=0.0,my0=0.0,mx1=0.0,my1=0.0;
TShiftState sh0,sh1;
int xs=1,ys=1;
int txrmap=-1;
float zoom=1.000;
unsigned int queryID[2];
//---------------------------------------------------------------------------
void gl_draw()
{
float x,y,dx,dy;
scr.cls();
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
// matrix for old GL rendering
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glMatrixMode(GL_TEXTURE);
glLoadIdentity();
// GLSL uniforms
shd.bind();
shd.set1i("txrmap",0); // texture unit
shd.set2f("p0",mx,my); // pan position
shd.set1f("zoom",zoom); // zoom
// issue the first query
// Records the time only after all previous
// commands have been completed
glQueryCounter(queryID[0], GL_TIMESTAMP);
// QUAD covering screen
scr.txrs.bind(txrmap);
glColor3f(1.0,1.0,1.0);
glBegin(GL_QUADS);
glTexCoord2f(0.0,0.0); glVertex2f(-1.0,+1.0);
glTexCoord2f(0.0,1.0); glVertex2f(-1.0,-1.0);
glTexCoord2f(1.0,1.0); glVertex2f(+1.0,-1.0);
glTexCoord2f(1.0,0.0); glVertex2f(+1.0,+1.0);
glEnd();
shd.unbind();
scr.txrs.unbind();
// issue the second query
// records the time when the sequence of OpenGL
// commands has been fully executed
glQueryCounter(queryID[1], GL_TIMESTAMP);
// GL driver info and GLSL log
scr.text_init_pix(1.0);
glColor4f(0.0,0.2,1.0,0.8);
scr.text(glGetAnsiString(GL_VENDOR));
scr.text(glGetAnsiString(GL_RENDERER));
scr.text("OpenGL ver: "+glGetAnsiString(GL_VERSION));
glColor4f(0.4,0.7,0.8,0.8);
for (int i=1;i<=shd.log.Length();) scr.text(str_load_lin(shd.log,i,true));
scr.text_exit();
scr.exe();
scr.rfs();
// wait until the results are available
int e;
unsigned __int64 t0,t1;
for (e=0;!e;) glGetQueryObjectiv(queryID[0],GL_QUERY_RESULT_AVAILABLE,&e);
for (e=0;!e;) glGetQueryObjectiv(queryID[1],GL_QUERY_RESULT_AVAILABLE,&e);
glGetQueryObjectui64v(queryID[0], GL_QUERY_RESULT, &t0);
glGetQueryObjectui64v(queryID[1], GL_QUERY_RESULT, &t1);
Form1->Caption=AnsiString().sprintf("Time spent on the GPU: %f ms\n", (t1-t0)/1000000.0);
}
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent* Owner):TForm(Owner)
{
scr.init(this);
OpenGLtexture txr;
txr.load ("gradient.jpg");
txrmap=scr.txrs.add(txr);
shd.set_source_file("","","","Mandelbrot_set.glsl_vert","Mandelbrot_set.glsl_frag");
glGenQueries(2, queryID);
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormDestroy(TObject *Sender)
{
scr.exit();
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormResize(TObject *Sender)
{
scr.resize();
xs=ClientWidth;
ys=ClientHeight;
gl_draw();
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormPaint(TObject *Sender)
{
gl_draw();
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormMouseMove(TObject *Sender, TShiftState Shift, int X,int Y)
{
bool q0,q1;
mx1=1.0-divide(X+X,xs-1);
my1=divide(Y+Y,ys-1)-1.0;
sh1=Shift;
q0=sh0.Contains(ssLeft);
q1=sh1.Contains(ssLeft);
if (q1)
{
mx-=(mx1-mx0)*zoom;
my-=(my1-my0)*zoom;
}
mx0=mx1; my0=my1; sh0=sh1;
gl_draw();
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormMouseDown(TObject *Sender, TMouseButton Button,TShiftState Shift, int X, int Y)
{
FormMouseMove(Sender,Shift,X,Y);
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormMouseUp(TObject *Sender, TMouseButton Button,TShiftState Shift, int X, int Y)
{
FormMouseMove(Sender,Shift,X,Y);
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormMouseWheelDown(TObject *Sender, TShiftState Shift, TPoint &MousePos, bool &Handled)
{
zoom*=1.2;
gl_draw();
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormMouseWheelUp(TObject *Sender, TShiftState Shift, TPoint &MousePos, bool &Handled)
{
zoom/=1.2;
gl_draw();
}
//---------------------------------------------------------------------------
You can ignore most of the code. The important part is gl_draw(), which renders a single QUAD covering the whole screen and passes the zoom and pan position to the shader. This code uses old-style glBegin/glEnd and the default nVidia locations, so it may not work on other vendors' gfx drivers. The mesh should be in a VAO/VBO so that the layout locations match; to see how to do that, take a look at the link at the end of this answer, or port the shaders to the compatibility profile.
Vertex:
// Vertex
#version 420 core
layout(location=0) in vec2 pos; // glVertex2f <-1,+1>
out smooth vec2 p; // interpolated screen position <-1,+1>
void main()
{
p=pos;
gl_Position=vec4(pos,0.0,1.0);
}
Fragment:
// Fragment
#version 420 core
uniform sampler2D txrmap; // texture unit of the gradient texture
uniform vec2 p0=vec2(0.0,0.0); // mouse position <-1,+1>
uniform float zoom=1.000; // zoom [-]
in smooth vec2 p;
out vec4 col;
void main()
{
int i,n;
vec2 pp;
float x,y,q,xx,yy;
pp=(p*zoom)-p0; // y (-1, 1)
pp.x=(1.75*pp.x)-0.75; // x (-2.5, 1)
for (x=0.0,y=0.0,xx=0.0,yy=0.0,i=0,n=200;(i<n)&&(xx+yy<4.0);i++)
{
q=xx-yy+pp.x;
y=(2.0*x*y)+pp.y;
x=q;
xx=x*x;
yy=y*y;
}
q=float(i)/float(n);
col=texture(txrmap,vec2(q,0.5)); // texture2D() is not available in the core profile
// col=vec4(q,q,q,1.0);
}
This texture is used as the gradient:
Here is the resulting screenshot:
In case you need to get started with GLSL (to replace my gl engine stuff) see:
- simple complete GL+VAO/VBO+GLSL+shaders example in C++
but I am sure there must be plenty of tutorials for this in C# too, so search for them.
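The fragment shader above draws the Mandelbrot set, while the question asks about Julia sets. The only difference is the starting condition of the same loop, which the following CPU-side sketch tries to show. It mirrors the shader loop with the same variable names; the function names and the Julia constant (cx, cy) are assumptions added for illustration and are not part of my engine.

```cpp
// Mirrors the loop in the fragment shader, using the same variable names.
// (zx0, zy0) is the starting value of z and (cx, cy) is the constant that is
// added in every step; n is the iteration limit.
int escape_count(float zx0, float zy0, float cx, float cy, int n) {
    float x = zx0, y = zy0, xx = x * x, yy = y * y, q;
    int i;
    for (i = 0; (i < n) && (xx + yy < 4.0f); i++) {
        q = xx - yy + cx;          // real part of z*z + c
        y = (2.0f * x * y) + cy;   // imaginary part of z*z + c
        x = q;
        xx = x * x;
        yy = y * y;
    }
    return i;
}

// Mandelbrot: z starts at 0 and the pixel position pp is the constant.
int mandelbrot_count(float ppx, float ppy, int n) {
    return escape_count(0.0f, 0.0f, ppx, ppy, n);
}

// Julia: z starts at the pixel position pp and the constant is the same for
// every pixel of the frame.
int julia_count(float ppx, float ppy, float cx, float cy, int n) {
    return escape_count(ppx, ppy, cx, cy, n);
}
```

In the shader this would mean initialising x,y from pp instead of 0.0 and passing the constant in as one more uniform (hypothetical, it does not exist in the code above). Changing that uniform every frame should then give the dynamic Julia fractal without any data leaving the GPU.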
To cover the questions: Question 1: yes, OpenCL can paint, but Monogame apparently doesn't wrap OpenCL, so the answer is no. Question 2 is the right question: maybe, see the suggestions below. Question 3: the limit comes from the pixel shader profile your HLSL is compiled for, not from the GPU itself. 64 arithmetic slots is the ceiling of the old ps_2_0 profile from the Dx9 era, and later shader models raised those limits considerably, so you want Dx12 support or GLSL/OpenGL.
Since you are close to your performance expectations using Cloo, why not try OpenCL.Net and/or OpenTK to bind the Julia calculations more closely to the Monogame API? If you have to go GPU-CPU-GPU, at least make that as wide a pipeline as possible.
Alternatively, a slightly sideways solution to your parallelization and frame rate problem might be integrating a GP-GPU wrapper such as QuantAlea's Alea with your Monogame solution. I'd suggest looking at Cudafy, but Alea is more robust and has cross-vendor GPU support.
The build process will decide which portion of the Julia code will calculate on GPU via Alea, and the Monogame portions will receive the pixel-field for rendering. The sticking points will be library "play-nice" compatibility, and ultimately, frame-rate if you get it working.
Bottom line: you're stuck, by choice, in HLSL (read: Microsoft Dx9) and Monogame doesn't support GLSL/Dx12, so you will have to maneuver creatively to get unstuck.