PyTorch Tutorial Error Training a Classifier

2019-03-05 10:45发布

问题:

I just started the PyTorch-Tutorial Deep Learning with PyTorch: A 60 Minute Blitz and I should add, that I haven't programmed any python (but other languages like Java) before.

Right now, my Code looks like

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np


print("\n-------------------Backpropagation-------------------\n")
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,download=True, transform=transform)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

dataiter = iter(trainloader)
images, labels = dataiter.next()


def imshow(img):
    img = img / 2 + 0.5
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))


imshow(torchvision.utils.make_grid(images))

print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

which should be consistent with the tutorial. If I execute this, I'll get the following error:

"C:\Program Files\Anaconda3\python.exe" C:/MA/pytorch/deepLearningWithPytorchTutorial/trainingClassifier.py

-------------------Backpropagation-------------------

Files already downloaded and verified
Files already downloaded and verified

-------------------Backpropagation-------------------

Files already downloaded and verified
Files already downloaded and verified
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Program Files\Anaconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Program Files\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Program Files\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\MA\pytorch\deepLearningWithPytorchTutorial\trainingClassifier.py", line 23, in <module>
    dataiter = iter(trainloader)
  File "C:\Program Files\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 451, in __iter__
    return _DataLoaderIter(self)
  File "C:\Program Files\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 239, in __init__
    w.start()
  File "C:\Program Files\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):
  File "C:/MA/pytorch/deepLearningWithPytorchTutorial/trainingClassifier.py", line 23, in <module>
    dataiter = iter(trainloader)
  File "C:\Program Files\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 451, in __iter__
    return _DataLoaderIter(self)
  File "C:\Program Files\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 239, in __init__
    w.start()
  File "C:\Program Files\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

Process finished with exit code 1

I already downloaded the *.py and *.ipynb. Running the *.ipynb with jupyter works fine (but I don't want to programm in the juniper web-interface, I prefer pyCharm) while the *.py in the console (Anaconda prompt and cmd) fails with the same error.

Does anyone know how to fix this? (I'm using Python 3.6.5 (from Anaconda) and pyCharm, OS: Win10 64-bit)

Thanks! Bene

Update: If it is relevant, I just set num_workers=2 to num_workers=0 (both) and then it'll work.. .

回答1:

Check out the documentation for multiprocessing: programming guidelines for windows. You should wrap all operations in functions and then call them inside an if __name__ == '__main__' clause:

# required imports

def load_datasets(...):
    # Code to load the datasets with multiple workers

def train(...):
    # Code to train the model

if __name__ == '__main__':
    load_datasets()
    train()

In short, the the idea here is to wrap the example code inside an if __name__ == '__main__' statement.



回答2:

Because of different implementation of multiprocessing in Windows, you need to wrap your main code with this block:

if __name__ == '__main__':

For more info, you can check the official PyTorch Windows notes.