I am trying to run the CUDA particles sample on a remote Ubuntu machine from a host Ubuntu machine. I followed this tutorial: http://devblogs.nvidia.com/parallelforall/remote-application-development-nvidia-nsight-eclipse-edition/ and it runs on my host, but not on the remote machine.
I get the following output in Nsight:
CUDA Particles Simulation Starting...
grid: 64 x 64 x 64 = 262144 cells
particles: 16384
No protocol specified
freeglut (/users/path/particles/Debug/particles): failed to open display ':0'
logout
If I run the program from the terminal I get:
CUDA Particles Simulation Starting...
grid: 64 x 64 x 64 = 262144 cells
particles: 16384
CUDA error at ../src/particleSystem_cuda.cu:85 code=79(cudaErrorInvalidGraphicsContext) "cudaGraphicsGLRegisterBuffer(cuda_vbo_resource, vbo, cudaGraphicsMapFlagsNone)"
Is it possible to display the particle simulation on my host machine while the calculation is done on the remote system?
Is that achieved through X11 forwarding, or is this error something completely different?
I'm going to provide a lengthy answer because I just worked through this. However, before proceeding down this path, I'd encourage you to try a solution like NoMachine NX, which should already have some of this capability built in. It may meet your needs.
You can get access to a remote workstation, even a headless workstation, for graphical desktop access, and also have access to CUDA and OpenGL acceleration, using a combination of VirtualGL and TurboVNC. The following instructions are fairly specific to a particular machine configuration (e.g. Linux OS version), so if you make changes they will likely break, and you will have to figure out what is different. For CentOS 6.x this should be a pretty good recipe; for other setups it is just a guide. This recipe mostly assumes just a single Tesla/CUDA GPU being added. If you have multiple GPUs to configure, it should be possible, but only one would be used for the OpenGL acceleration, and only that one should need to be configured in xorg.conf.

This setup can be done with a Tesla card that has no display output (a K40c, for example; K20 variants are a special case, however -- see the note below *). In that case, it assumes you have another display card for the initial remote workstation setup steps. This other display card can be any card, and the machine can be converted to "headless" use after setup. If you are using a different display GPU for setup, you can initially leave your Tesla or CUDA GPU out of the system.
1. Install your Linux OS. I used CentOS 6.2 for this test. If you use a CentOS/RHEL 6.x OS, things should probably work for you pretty much as described here. If you use a different OS, anything may be different, and these instructions are then just a guide, not a recipe. During the install of CentOS 6.2, select the "Software Development Workstation" option to get most of the graphical and development bits we'll need. You should be prompted during setup to create an ordinary username (apart from root). We'll call this user myuser.

2. Disable Nouveau. On CentOS 6.x, a couple of commands run as root will do the trick, as sketched below.
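The exact commands from the original recipe aren't shown above; the usual approach on CentOS 6.x (my assumption here) is to blacklist the nouveau module and rebuild the initramfs, as root:

    # blacklist the in-kernel nouveau driver so the NVIDIA driver can load
    echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/disable-nouveau.conf
    # rebuild the initramfs so the blacklist takes effect at boot
    dracut --force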
3. Reboot, so that the Nouveau blacklist takes effect. The display should come back up as before on the GPU you are using for setup.

4. Power down, and install the Tesla or other CUDA GPU(s) that you want to use for CUDA and/or OpenGL acceleration. Power up the machine again. Hopefully the Linux display will still appear on the same display you were using in step 3. There shouldn't be any problem if you are using a Tesla (i.e. non-display) card, but if you are using some other CUDA display-capable GPU (GeForce/Quadro), it's possible that the X display has moved at this point to the GPU(s) you just installed.
5. Install CUDA 7. I used the runfile installer method, selected yes for all questions (including the install of the OpenGL libraries), and accepted all default paths; the procedure is covered in the CUDA Getting Started Guide for Linux. If you use some other CUDA version or some other install method, your results may vary.
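For reference, a runfile install is a single command run as root (the filename below is just an example; use whatever CUDA 7 runfile you downloaded):

    # answer yes to the driver, toolkit, samples, and OpenGL library prompts,
    # and accept the default paths
    sh cuda_7.0.28_linux.run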
6. Install VirtualGL and TurboVNC (see the sketch below for a typical install). I don't think there's anything special about the particular versions used, but if you use different versions, your results may vary.
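Both projects ship RPM packages for RHEL/CentOS; a typical install from the downloaded files might look like this (the filenames are illustrative, not specific versions I'm recommending):

    # as root, from the directory holding the downloaded packages
    rpm -Uvh VirtualGL*.x86_64.rpm
    rpm -Uvh turbovnc*.x86_64.rpm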
7. Run nvidia-xconfig (from a terminal session) as root to establish an initial /etc/X11/xorg.conf file, then modify the "Device" section to add a BusID line whose PCI address matches your GPU (confirm the address with lspci or nvidia-smi -a). For headless operation, you may optionally want to add one more line to the "Screen" section, but I believe it is not necessary (even for headless operation). Both additions are sketched below, and a complete sample xorg.conf appears at the end of this answer.
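Roughly, the additions look like this (the BusID value is only an example; use the address your system reports, and the Screen option shown is the commonly used no-display setting):

    # added to the "Device" section (example address; match your GPU):
        BusID          "PCI:2:0:0"
    # optionally added to the "Screen" section for headless operation:
        Option         "UseDisplayDevice" "none"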
8. As myuser, edit ~/.vnc/xstartup.turbovnc and add one new line immediately after the existing line indicated in the sketch below.
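A common choice (assumed here; adjust it to your own xstartup.turbovnc contents) is to unset the D-Bus session address right after the session manager variable is unset, so the GNOME desktop starts cleanly under the VNC session on RHEL/CentOS 6:

    # in ~/.vnc/xstartup.turbovnc
    unset SESSION_MANAGER
    unset DBUS_SESSION_BUS_ADDRESS    # the added line (assumed)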
9. As myuser, add a startup application, using the GNOME desktop utility (System...Preferences...Startup Applications), that launches the vncserver on display :5, as sketched below. The :5 here is somewhat arbitrary; you can use other numbers like :2 if you wish, but do not use :0. The remainder of this recipe assumes you have selected :5. You will also want to run this same line once, again as myuser, from a terminal session, in order to setup/configure the vncserver for use. The first time you run it, it will probably prompt you for a password. Remember this password - you will need it for client access later.
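Assuming a default TurboVNC install location, the startup command is typically:

    # start a TurboVNC server on display :5
    /opt/TurboVNC/bin/vncserver :5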
10. For headless/unattended use, there are two possible options. The first option is to create an autologin for myuser. As root, edit /etc/gdm/custom.conf to create/add the entries sketched below. I acknowledge that some may consider an auto-login a security risk; if that is the case, you should use the alternate ("preferred") method instead.
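On the GDM shipped with CentOS/RHEL 6, the usual entries (using the myuser account from step 1) are:

    # /etc/gdm/custom.conf
    [daemon]
    AutomaticLoginEnable=true
    AutomaticLogin=myuser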
The alternate method is to run vglserver_config as root and answer n to all 3 questions. (This will work; however, there are security issues here as well. You may wish to consult the VirtualGL documentation, and explore managing the necessary user groups and experimenting with other choices besides n to all 3 questions.)

11. Without an autologin, you will need a method to start the vncserver for remote access, since the startup application we added in step 9 won't take effect. One approach would be to simply log in via an SSH connection to the machine after startup and, as myuser, start the vncserver as in step 9 (see the sketch below). Or you can explore various methods for running applications automatically at machine start-up; there are so many possible approaches here that it's best if you just do a Google search pertinent to your OS.
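Using the example address from later in this answer, that manual start looks something like:

    ssh myuser@192.168.1.104         # from the client machine
    /opt/TurboVNC/bin/vncserver :5   # run on the remote workstation after logging in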
12. If you have not already done so, you may want to build the CUDA samples. The method is covered in the getting started guide document linked in step 5 above. You may need to make sure you have a suitable GLUT provider installed, such as freeglut, for some of the CUDA graphical samples, such as simpleGL.
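On CentOS 6.x that boils down to something like the following (the samples path assumes the default CUDA 7 runfile install):

    yum install freeglut-devel       # as root: GLUT headers/libs for the graphical samples
    cd ~/NVIDIA_CUDA-7.0_Samples     # default samples location for the CUDA 7 runfile install
    make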
13. Before powering down, you will probably also want to modify the remote workstation firewall. For my purposes, I just disabled it (System...Administration...Firewall...Disable). TurboVNC uses specific ports that will be blocked by the firewall by default. If you wish to keep the firewall but open up these ports, it should be possible, but that is outside the scope of this recipe.
14. Your remote workstation is now configured. If you made all the changes above, you can switch to "headless" operation, and if you added the optional no-display line to xorg.conf in step 7, you may in fact switch to "headless" on the next reboot. Before you reboot, you may want to make a note of the remote workstation's IP address. If you are going "headless", it may be convenient to configure it with a static IP. Let's assume that you observed that the IP address of the remote workstation was 192.168.1.104. Now is the time to reboot your remote workstation.
15. On a client machine, install the TurboVNC client appropriate for your OS. Run the TurboVNC client "Viewer" app, and provide the remote workstation's IP address with :5 appended as the machine to connect to, as shown below.
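With the example address above, the connection string entered in the viewer would be:

    192.168.1.104:5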
16. Once you connect, you will be prompted for the password you provided in step 9 above. At this point, a graphical desktop associated with myuser should open up on your client machine. This graphical desktop does not yet have full 3D graphical acceleration associated with it. In order to make use of the GPU for OpenGL (and CUDA/OpenGL interop), it's necessary to run such applications prepended with vglrun, as in the sketch below. You specify :0 there because that is the actual X display that the GPU graphical acceleration is associated with (for the logged-in myuser). If you built the CUDA samples, you can try a CUDA/OpenGL interop app the same way. In both cases, if you have configured the vglserver using vglserver_config in step 10, then you should be able to omit the -d :0 switch from the vglrun command.
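Typical invocations look like this (glxgears is just a convenient generic OpenGL test; the particles sample is run from its own directory in the samples tree):

    vglrun -d :0 glxgears       # any OpenGL app, routed to the GPU on display :0
    vglrun -d :0 ./particles    # a CUDA/OpenGL interop sample
    vglrun ./particles          # -d :0 can be omitted if vglserver_config was run (step 10)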
For reference purposes only (you most likely cannot use it verbatim), here is roughly what a complete xorg.conf for this setup looks like, without any modifications from vglserver_config:
Troubleshooting:
I don't intend to respond to detailed troubleshooting questions, since I won't be able to test every configuration. However, if your client cannot connect at all to the remote workstation, it likely means that the vncserver is not properly started, or the firewall is blocking things. For other types of troubleshooting, /var/log/Xorg.0.log may give some clues. Use nvidia-smi to establish that your CUDA drivers are properly installed. And in general, headless operation is difficult to troubleshoot, so if you can arrange for a display-capable CUDA GPU for initial setup and testing, it may be easier. You can switch to a non-display GPU later.

Note: * K20m and K20Xm require proper setting of the graphics operation mode using the nvidia-smi utility. The K20c cannot be used for this purpose; it is compute-only. AFAIK, most other NVIDIA CUDA-capable GPUs should be usable for this purpose. GPUs with compute capability prior to cc2.0 cannot be used with the CUDA 7 drivers described in this writeup, however.

As an additional reference, this NVIDIA whitepaper will be useful.