I need to modify PTX code and compile it directly. I want certain instructions to appear right after each other, and it is difficult to write CUDA code that produces my target PTX, so I need to edit the PTX itself. The problem is that I can compile it to fatbin and cubin, but I don't know how to compile those (.fatbin and .cubin) files to an "X.o" file.
This sequence of nvcc commands seems to do the trick. Please see here for more details.

Create your ptx files to modify.

Link the ptx files into an object file. I did this on Windows, so it popped out `a_dlink.obj`. As the documentation points out, host code has been discarded by this point.

Run nvcc again to create object files. They will be `.obj` for Windows or `.o` for Linux.

Then create a library output file, and run nvcc on it, which will pop out an executable: `a.exe` on Windows or `a.out` on Linux.

This procedure works for `cubin` and `fatbin` files too. Just substitute those names in place of `ptx`.
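The commands themselves did not survive in this copy of the answer, so here is a hedged sketch of what such a sequence might look like. The file names, `myprogram.lib`, and the `NVCC` and `OBJ` variables are my own placeholders, and the exact flag combination is an assumption based on the nvcc options for relocatable device code (`-rdc=true`), PTX output (`--ptx`), device linking (`-dlink`), compile-only (`--compile`) and library creation (`-lib`). Check it against the nvcc manual for your toolkit version before relying on it.

```shell
#!/bin/sh
NVCC=${NVCC:-nvcc}   # set NVCC="echo nvcc" to print the commands instead of running them
OBJ=${OBJ:-o}        # object suffix: "o" on Linux, "obj" on Windows

# Step 1: emit relocatable PTX for each source file (then hand-edit the .ptx files).
emit_ptx() {
    $NVCC -rdc=true --ptx "$@"
}

# Steps 2-5: device-link the edited PTX, compile the objects,
# bundle everything into a library, and link the executable.
build_from_ptx() {
    ptx=""; obj=""
    for src in "$@"; do
        base=${src%.cu}
        ptx="$ptx $base.ptx"
        obj="$obj $base.$OBJ"
    done
    $NVCC $ptx -dlink -o "a_dlink.$OBJ" &&
    $NVCC -rdc=true --compile "$@" &&
    $NVCC $obj "a_dlink.$OBJ" -lib -o myprogram.lib &&
    $NVCC myprogram.lib
}

# Usage (hypothetical file names):
#   emit_ptx file1.cu file2.cu file3.cu
#   ... edit file1.ptx, file2.ptx, file3.ptx ...
#   build_from_ptx file1.cu file2.cu file3.cu
```

Running with `NVCC="echo nvcc"` first is a cheap way to review the four commands before anything is compiled.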
Usually, when handling cubin or ptx files, one uses the CUDA Driver API and not the Runtime API; doing so, you load the ptx or cubin file manually at runtime with `cuModuleLoadDataEx`. If you want to stick with the Runtime API, you need to manually mimic what NVCC does, but this is not (entirely) documented. I only found this Nvidia forum entry on how to do this.

You can load cubin or fatbin at runtime using the `cuModuleLoad*` functions in CUDA (see the Driver API reference).
You can use it to include PTX in your build, though the method is somewhat convoluted. For instance, suricata compiles its .cu files into PTX files for different architectures, converts them into an .h file that contains the PTX code as a C array, and then includes that header from one of the source files during the build.
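As a rough illustration of the "PTX as a C array" idea, the conversion step could be done with nothing more than `od` and `sed`. This is only a sketch of the general technique, not suricata's actual build script; the function name `ptx_to_header` and the array name are made up for the example. A trailing `0x00` is appended so the array is a NUL-terminated string, which is what a loader such as `cuModuleLoadDataEx` expects for PTX text.

```shell
#!/bin/sh
# ptx_to_header FILE ARRAY_NAME
# Prints a C header fragment that embeds FILE byte-for-byte in an array.
ptx_to_header() {
    printf 'static const unsigned char %s[] = {\n' "$2"
    # od dumps every byte as two hex digits; sed normalises the spacing
    # and turns each byte into a C literal followed by a comma.
    od -An -v -tx1 "$1" |
        sed 's/  */ /g; s/^ //; s/ $//; s/\([0-9a-f][0-9a-f]\)/0x\1,/g'
    printf '0x00 };\n'
}

# Usage (hypothetical names):
#   ptx_to_header kernel.ptx kernel_ptx > kernel_ptx.h
```

The host code would then `#include "kernel_ptx.h"` and hand `kernel_ptx` to the Driver API at runtime.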
I am rather late but GPU Lynx does exactly that: take a CUDA fat binary, parse the PTX, and modify it before emitting the result to the driver for execution on a GPU. You can optionally print out the modified PTX as well.
There may be a way to do this with an orderly sequence of nvcc commands, but I'm not aware of it and haven't discovered it.

One possible approach, however, albeit messy, is to interrupt and restart the CUDA compilation sequence, and edit the ptx file in the interim (before the restart). This is based on information provided in the nvcc manual, and I would not consider this a standard methodology, so your mileage may vary. There may be any number of scenarios that I haven't considered where this doesn't work or isn't feasible.
To explain this, I shall use a small example code whose kernel, `mykernel`, adds 1 to a `data` variable. For this purpose, I am dispensing with CUDA error checking and other niceties, in favor of brevity.

Ordinarily we might compile this code with a plain nvcc invocation (assuming the source file is named t266.cu).
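The listing is missing from this copy, so the following is a minimal stand-in rather than the answer's exact code. Only the file name `t266.cu`, the kernel name `mykernel` and the "add 1 to `data`" behaviour come from the text; the host code, the initial value of 0, the `NVCC` variable and the `build_t266` helper are assumptions for illustration.

```shell
#!/bin/sh
NVCC=${NVCC:-nvcc}   # set NVCC="echo nvcc" to print the command instead of running it

# Write a small CUDA source file (no error checking, for brevity).
cat > t266.cu <<'EOF'
#include <cstdio>

// Kernel named in the text: adds 1 to the value that data points to.
__global__ void mykernel(int *data) {
    (*data)++;
}

int main() {
    int h_data = 0;          // assumed initial value
    int *d_data = nullptr;
    cudaMalloc(&d_data, sizeof(int));
    cudaMemcpy(d_data, &h_data, sizeof(int), cudaMemcpyHostToDevice);
    mykernel<<<1, 1>>>(d_data);
    cudaMemcpy(&h_data, d_data, sizeof(int), cudaMemcpyDeviceToHost);
    printf("data = %d\n", h_data);
    cudaFree(d_data);
    return 0;
}
EOF

# Ordinary build; any extra nvcc flags are passed through unchanged.
build_t266() {
    $NVCC "$@" -o t266 t266.cu
}

# Usage:
#   build_t266
```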
Instead, based on the reference manual, we'll compile in a way that builds the executable but keeps all intermediate files, including `t266.ptx` (which contains the ptx code for `mykernel`).

If we simply ran the executable at this point, the output would reflect the kernel adding 1 to `data`.
The next step is to edit the ptx file to make whatever changes we want. In this case, we'll have the kernel add 2 to the `data` variable instead of adding 1. The relevant line is the one that performs the addition.

Now comes the messy part. The next step is to capture all the intermediate compile commands, so we can rerun some of them.
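Both steps can be scripted. The sketch below assumes the increment was emitted as a PTX line of the form `add.s32 %r2, %r1, 1;` (register names vary between compiler versions, so inspect your own `t266.ptx` first), and that the dry-run listing is captured with nvcc's `-dryrun` and `-keep` options. The helper names and the `NVCC` variable are mine.

```shell
#!/bin/sh
NVCC=${NVCC:-nvcc}   # set NVCC="echo nvcc" to print the command instead of running it

# Change the immediate operand 1 -> 2, on add.s32 lines only.
# Assumed input form: "add.s32 %r2, %r1, 1;" (illustrative registers).
patch_increment() {
    sed '/add\.s32/s/, 1;/, 2;/' "$1" > "$1.new" && mv "$1.new" "$1"
}

# Record the compile commands without executing them. nvcc writes the
# dry-run listing to stderr, so it is redirected into dryrun.out.
capture_dryrun() {
    $NVCC -dryrun -keep -o t266 t266.cu 2> dryrun.out
}

# Usage:
#   patch_increment t266.ptx
#   capture_dryrun
```

If the kernel contained other `add.s32 ..., 1;` lines they would be rewritten too, so a quick diff of the ptx file after patching is worthwhile.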
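(This uses linux redirection of `stderr` into a file, `dryrun.out`.) We then want to edit that `dryrun.out` file so that:

- it retains only the commands that come after the creation of the ptx file (the command that creates it is the one specifying `-o "t266.ptx"`)
- the leading `#$` that each line begins with is stripped out, so in effect we are creating a script.

When I perform the above 2 steps, I end up with a script that can be run directly.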
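If you would rather not do those two edits by hand, something like the following awk filter should get close. It assumes every command line in the listing starts with `#$ ` and that the ptx-generating command can be recognised by a literal marker string; `make_rerun_script` and `rerun.sh` are hypothetical names.

```shell
#!/bin/sh
# make_rerun_script LOG MARKER
# Prints every line that follows the first line containing MARKER,
# with the leading "#$ " removed.
make_rerun_script() {
    awk -v marker="$2" '
        found { sub(/^#\$ /, ""); print; next }   # after the marker: strip prefix, keep
        index($0, marker) { found = 1 }           # the marker line itself is dropped
    ' "$1"
}

# Usage:
#   make_rerun_script dryrun.out '-o "t266.ptx"' > rerun.sh
#   chmod +x rerun.sh && ./rerun.sh
```

Writing to a separate file keeps the original `dryrun.out` around in case the marker did not match what you expected.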
Finally, execute the above script. (In linux you can make this script file executable using `chmod +x dryrun.out` or similar.) If you haven't made any mistakes while editing the `.ptx` file, the commands should all complete successfully and create a new `t266` executable file.

When we run that file, the output reflects the kernel adding 2 instead of 1, indicating that our changes were successful.