IPython engines returning different results

Posted 2019-08-29 04:33

Question:

Hopefully someone can enlighten me without me having to post a lot of confusing code.

I am using IPython.parallel to process neural networks. In an attempt to find a bug, I decided to send the same network out to each client with the same input data. I would expect each client to return the same answer, and most of the time it does. Sometimes, however, I get vastly different results from each client.

Here is a sample from running the code 5 different times. Each time the code is run a new network is built, so I would expect a different solution between runs. Within a single run, however, I am sending the same network to each client...

I am using apply_async to send the job out to the different clients (all of which are on a single local machine).

No random numbers are being generated during the processing, and the only math functions I am using are the built-in pow() and numpy.tanh().

Any ideas on the best way to track down what's going on?

%> ./ld_cluster.py
Available workers:  5
importing sys on engine(s)
0 ) b0ca598b-8cc8-4de7-8e6e-f62c1e6eba58 :: 2.020202
0 ) b0ca598b-8cc8-4de7-8e6e-f62c1e6eba58 :: 2.020202
0 ) b0ca598b-8cc8-4de7-8e6e-f62c1e6eba58 :: 2.020202
0 ) b0ca598b-8cc8-4de7-8e6e-f62c1e6eba58 :: 2.020202
0 ) b0ca598b-8cc8-4de7-8e6e-f62c1e6eba58 :: 2.020202
%> ./ld_cluster.py
Available workers:  5
importing sys on engine(s)
0 ) ff0ac798-3eb9-43cd-940a-6bc77447a3b4 :: 1.846979
0 ) ff0ac798-3eb9-43cd-940a-6bc77447a3b4 :: 1.846979
0 ) ff0ac798-3eb9-43cd-940a-6bc77447a3b4 :: 1.846979
0 ) ff0ac798-3eb9-43cd-940a-6bc77447a3b4 :: 1.846979
0 ) ff0ac798-3eb9-43cd-940a-6bc77447a3b4 :: 1.846979
%> ./ld_cluster.py
Available workers:  5
importing sys on engine(s)
0 ) f679d9c3-9e00-4b32-84b7-72fcf9fb5da0 :: 2.021491
0 ) f679d9c3-9e00-4b32-84b7-72fcf9fb5da0 :: 2.021491
0 ) f679d9c3-9e00-4b32-84b7-72fcf9fb5da0 :: 2.021491
0 ) f679d9c3-9e00-4b32-84b7-72fcf9fb5da0 :: 2.021491
0 ) f679d9c3-9e00-4b32-84b7-72fcf9fb5da0 :: 2.021491
%> ./ld_cluster.py
Available workers:  5
importing sys on engine(s)
0 ) f28617ef-72e7-4de3-a0a7-a98057efaa2e :: 1.979795
0 ) f28617ef-72e7-4de3-a0a7-a98057efaa2e :: 1.979795
0 ) f28617ef-72e7-4de3-a0a7-a98057efaa2e :: 1.979795
0 ) f28617ef-72e7-4de3-a0a7-a98057efaa2e :: 1.979795
0 ) f28617ef-72e7-4de3-a0a7-a98057efaa2e :: 1.979795
%> ./ld_cluster.py
Available workers:  5
importing sys on engine(s)
0 ) dd635626-4881-4470-909f-6a2fbe73b06d :: 2.020196
0 ) dd635626-4881-4470-909f-6a2fbe73b06d :: 1.991076
0 ) dd635626-4881-4470-909f-6a2fbe73b06d :: 1.952310
0 ) dd635626-4881-4470-909f-6a2fbe73b06d :: 1.887462
0 ) dd635626-4881-4470-909f-6a2fbe73b06d :: 2.019929
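To make the difference between the runs above easier to see, here is a minimal sketch that checks whether the values returned within one run agree. The values are copied from the first and fifth runs above; the helper names `spread` and `is_consistent` and the tolerance are assumptions made for illustration, not part of the original script.

```python
def spread(results):
    """Return the gap between the largest and smallest value in one run."""
    return max(results) - min(results)


def is_consistent(results, tol=1e-9):
    """True if every engine returned (nearly) the same value.

    `tol` is an assumed tolerance; pick one that suits your network's output.
    """
    return spread(results) <= tol


# Values copied from the console output above (first and fifth runs).
run_1 = [2.020202, 2.020202, 2.020202, 2.020202, 2.020202]
run_5 = [2.020196, 1.991076, 1.952310, 1.887462, 2.019929]

for name, run in (("run 1", run_1), ("run 5", run_5)):
    status = "consistent" if is_consistent(run) else "diverged"
    print("%s: %s (spread=%.6f)" % (name, status, spread(run)))
```

A check like this could run automatically after collecting the results, so that a diverging run is flagged as soon as it happens rather than spotted by eye.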

Answer 1:

Thought I should show the answer (if it can be called that) here as well...

Please see THIS THREAD.
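Independently of that thread, one general way to narrow down a problem like this is to confirm that every engine really starts from identical data. The sketch below is only an illustration under stated assumptions: `fingerprint`, `group_by_fingerprint`, and the sample `weights` list are hypothetical names and values, and the dictionary of reported fingerprints is a local stand-in for what each engine would send back. In a real setup, the function submitted with apply_async could return a fingerprint of its inputs alongside its result.

```python
import hashlib
import struct


def fingerprint(values):
    """Hash a flat sequence of floats by their exact 64-bit representation."""
    digest = hashlib.sha256()
    for value in values:
        # "<d" packs a little-endian IEEE 754 double, so even a tiny
        # difference in one value changes the hash.
        digest.update(struct.pack("<d", float(value)))
    return digest.hexdigest()


def group_by_fingerprint(reported):
    """Map each distinct fingerprint to the engine ids that reported it."""
    groups = {}
    for engine_id, fp in reported.items():
        groups.setdefault(fp, []).append(engine_id)
    return groups


# Hypothetical stand-in data: engines 0 and 1 see the same weights,
# engine 2 sees a slightly different value.
weights = [0.1, -0.25, 0.5]
reported = {
    0: fingerprint(weights),
    1: fingerprint(weights),
    2: fingerprint([0.1, -0.25, 0.5000001]),
}

groups = group_by_fingerprint(reported)
for fp, engine_ids in groups.items():
    print(fp[:12], "reported by engines", sorted(engine_ids))
```

If all engines report the same fingerprint for the inputs but still return different results, the difference is arising during the computation itself rather than in the data that was sent.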