TensorFlow fails to compile

Posted 2020-07-14 12:21

Question:

I am getting the following errors while trying to compile TensorFlow from source. Any thoughts would be helpful.

bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cublasGemmEx@libcublas.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cublasZhpmv_v2@libcublas.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cufftExecD2Z@libcufft.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cublasSrotg_v2@libcublas.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cufftExecR2C@libcufft.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cublasSsyrk_v2@libcublas.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cublasDgemm_v2@libcublas.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cufftSetWorkArea@libcufft.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cublasChemm_v2@libcublas.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cublasZher2k_v2@libcublas.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cufftExecC2C@libcufft.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `curandSetStream@libcurand.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cublasDrotm_v2@libcublas.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `curandSetPseudoRandomGeneratorSeed@libcurand.so.9.0'

Answer 1:

There seems to be a bug in our build; I was able to reproduce the same error on my machine. It looks like the value of LD_LIBRARY_PATH does not always get properly propagated during bazel build. In my case, the build succeeded when I used this command:

bazel build --config=opt --config=cuda tensorflow/tools/pip_package:build_pip_package --action_env="LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
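
Forwarding LD_LIBRARY_PATH only helps if the variable actually holds the CUDA library directory when bazel is invoked. As a rough sketch (tf_build_command is a hypothetical helper name, not part of TensorFlow or bazel), the function below prints the command from this answer for a given path value and refuses to do so when the value is empty:

```shell
# Hypothetical helper: print (do not run) the bazel command from the answer,
# with the given value forwarded through --action_env.
# $1: the value to forward as LD_LIBRARY_PATH
tf_build_command() {
    if [ -z "$1" ]; then
        # Forwarding an empty value would not help the linker find CUDA.
        echo "LD_LIBRARY_PATH is empty; add the CUDA library directory first" >&2
        return 1
    fi
    printf 'bazel build --config=opt --config=cuda tensorflow/tools/pip_package:build_pip_package --action_env="LD_LIBRARY_PATH=%s"\n' "$1"
}
```

Calling tf_build_command "$LD_LIBRARY_PATH" shows exactly what would be passed to bazel, which makes an empty or unexpected value easy to spot before starting a long build.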


Answer 2:

I ran into the same error just yesterday while trying to build tensorflow from source against an apparently valid cuda 9.0. In my case, no combinations of git clean and action_env helped - ld via bazel would consistently refuse to acknowledge the cuda libs.

I ended up following the instructions in this thread: As root, create a file /etc/ld.so.conf.d/cuda.conf with the one line

/usr/local/cuda/lib64

(Assuming your /usr/local/cuda/ is linked to your concrete cuda directory, e.g., /usr/local/cuda-9.0/.)

Then issue sudo ldconfig. With that, the build ran through, and tensorflow is using my GPU.
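
To confirm that the linker cache picked up the new directory, the output of ldconfig -p can be checked for the library named in the error. This is only a sketch: lib_in_ldcache is a hypothetical helper name, and it assumes the usual glibc output format in which each entry starts with the library name.

```shell
# Hypothetical helper: read `ldconfig -p` output on stdin and succeed only if
# an entry for the exact library name given as $1 is present.
lib_in_ldcache() {
    # The first whitespace-separated field of each cache entry is the
    # library name, e.g. "libcublas.so.9.0 (libc6,x86-64) => /path/...".
    awk -v lib="$1" '$1 == lib { found = 1 } END { exit !found }'
}
```

For example, ldconfig -p | lib_in_ldcache libcublas.so.9.0 && echo "libcublas is in the cache" should print the message once sudo ldconfig has been run with the cuda.conf file in place.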



Answer 3:

In an attempt to make this problem easier to search for: The error message I got also included at the top:

libcublas.so.9.0, needed by bazel-out/[...]/libtensorflow_framework.so, not found (try using -rpath or -rpath-link)

and so on for libcudnn etc.

When I encountered this problem, I first added /usr/local/cuda/lib64 and /usr/local/cuda/extras/CUPTI/lib64 to my LD_LIBRARY_PATH and tried to rebuild (without --action_env). Didn't work.

I then did a clean reconfigure and build, again without --action_env, and it worked. I cleaned my repository by way of git clean -xdf, which, caution, will nuke all files in your repository that are not known to git. :)

Maybe --action_env would have obviated the need to do a clean rebuild, I dunno. But if the libraries were in your LD_LIBRARY_PATH before doing the very first build, I expect that you wouldn't need --action_env.
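
For reference, adding the two directories to LD_LIBRARY_PATH before the first configure and build could look roughly like this. prepend_dir is a hypothetical helper name; it only avoids adding a directory that is already listed.

```shell
# Hypothetical helper: print a colon-separated path with a directory
# prepended, leaving the path unchanged if the directory is already in it.
# $1: current path (may be empty), $2: directory to add
prepend_dir() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;      # already present
        "::")     printf '%s\n' "$2" ;;      # path was empty
        *)        printf '%s\n' "$2:$1" ;;   # put the new directory first
    esac
}
```

It would be used as LD_LIBRARY_PATH=$(prepend_dir "$LD_LIBRARY_PATH" /usr/local/cuda/lib64), once per directory, followed by export LD_LIBRARY_PATH. Before a git clean -xdf, running git clean -xdn lists what would be deleted without removing anything.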



Answer 4:

I appended /usr/local/cuda/lib64 to LD_LIBRARY_PATH AFTER the errors occurred. It didn't work. Then I modified .tf_configure.bazelrc, adding "build --action_env LD_LIBRARY_PATH=..." again. I recompiled the project and it passed!
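
A small sketch of that edit, assuming the value is written in double quotes after LD_LIBRARY_PATH= (an assumption about the line's exact form, since the answer elides it). add_action_env is a hypothetical helper name; it appends the line only if the file does not already contain it.

```shell
# Hypothetical helper: append a --action_env line for LD_LIBRARY_PATH to a
# bazelrc file unless the identical line is already there.
# $1: bazelrc file (e.g. .tf_configure.bazelrc), $2: value for LD_LIBRARY_PATH
add_action_env() {
    line="build --action_env LD_LIBRARY_PATH=\"$2\""
    # -x matches whole lines only, -F treats the line as a fixed string.
    if [ -f "$1" ] && grep -qxF -- "$line" "$1"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$1"
}
```

For example, add_action_env .tf_configure.bazelrc "$LD_LIBRARY_PATH" run from the TensorFlow source directory. Note that rerunning ./configure may regenerate the file, in which case the line would need to be added again.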