I have access to a Tesla K20c and am running ResNet50 on the CIFAR10 dataset...
Then I get the following error:
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu line=265 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "main.py", line 109, in <module>
    train(loader_train, model, criterion, optimizer)
  File "main.py", line 54, in train
    optimizer.step()
  File "/usr/local/anaconda35/lib/python3.6/site-packages/torch/optim/sgd.py", line 93, in step
    d_p.add_(weight_decay, p.data)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:265
How can I resolve this error?
Answer 1:
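In general, when encountering CUDA runtime errors, it is advisable to run your program again with the CUDA_LAUNCH_BLOCKING=1 environment variable set. CUDA kernels are launched asynchronously, so the Python line named in the traceback (here optimizer.step()) is often not the one that actually failed; forcing synchronous launches gives an accurate stack trace.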
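One way to apply the advice above is to set the variable from inside the script itself. This is only a sketch: it assumes the assignment sits at the very top of main.py, before torch is imported, because the variable has to be in place before CUDA is initialized.

```python
import os

# Set this before CUDA is initialized; placing it ahead of `import torch`
# is the safest spot (assumption: this goes at the very top of main.py).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch and start training only after the line above
```

Setting it on the command line, as in CUDA_LAUNCH_BLOCKING=1 python main.py, has the same effect without touching the code.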
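In your specific case, the target labels in your data were outside the valid range for the specified number of classes: with N classes, PyTorch classification losses such as nn.CrossEntropyLoss expect every target to be an integer between 0 and N - 1.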
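A quick way to confirm this diagnosis is to scan the labels before they reach the loss. The helper below is a minimal sketch in plain Python; find_invalid_targets is a hypothetical name, and the sample labels are made up for illustration. A label tensor can be passed in after converting it with targets.tolist().

```python
def find_invalid_targets(targets, num_classes):
    """Return the sorted distinct labels that fall outside [0, num_classes - 1]."""
    return sorted({int(t) for t in targets if not 0 <= int(t) < num_classes})


# Hypothetical batch of labels for a 10-class problem such as CIFAR10.
bad = find_invalid_targets([0, 3, 9, 10, -1], num_classes=10)
print(bad)  # [-1, 10]
```

An empty result means every label is a valid class index; anything else points at the values that trigger the device-side assert.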
Answer 2:
I have encountered this problem several times, and each time it turned out to be an indexing issue. For example, if your ground-truth labels start at 1, e.g. target = [1, 2, 3, 4, 5], you should subtract 1 from every label so that they become [0, 1, 2, 3, 4]. This has solved the problem for me every time.
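The shift described in this answer can be sketched as follows. shift_one_based_labels is a hypothetical helper, written in plain Python and assuming the labels really are 1-based; with a PyTorch tensor, target - 1 performs the same element-wise subtraction.

```python
def shift_one_based_labels(targets):
    """Convert 1-based labels (1..C) to 0-based labels (0..C-1)."""
    targets = list(targets)
    # Guard against shifting labels that are already 0-based (or negative).
    if targets and min(targets) < 1:
        raise ValueError("labels do not look 1-based; refusing to shift")
    return [t - 1 for t in targets]


print(shift_one_based_labels([1, 2, 3, 4, 5]))  # [0, 1, 2, 3, 4]
```

The guard is a design choice rather than part of the original answer: it keeps the shift from being applied twice, which would produce a -1 label and the same assert.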