What's an appropriate way to profile parallel code in Julia? When I run
@profile foo(...)
where foo is my function, I get
julia> Profile.print()
1234 task.jl; anonymous; line: 23
 4 multi.jl; remotecall_fetch; line: 695
  2 multi.jl; send_msg_; line: 172
   2 serialize.jl; serialize; line: 74
    2 serialize.jl; serialize; line: 299
     2 serialize.jl; serialize; line: 130
      2 serialize.jl; serialize; line: 299
       1 dict.jl; serialize; line: 369
        1 serialize.jl; serialize_type; line: 278
       1 serialize.jl; serialize; line: 199
        1 serialize.jl; serialize; line: 227
         1 serialize.jl; serialize; line: 160
          1 serialize.jl; serialize; line: 160
           1 serialize.jl; serialize; line: 299
            1 serialize.jl; serialize; line: 294
             1 io.jl; write; line: 47
              1 ./iobuffer.jl; write; line: 234
               1 ./iobuffer.jl; ensureroom; line: 151
                1 ./array.jl; resize!; line: 503
  2 multi.jl; send_msg_; line: 178
   2 stream.jl; write; line: 724
 1230 multi.jl; remotecall_fetch; line: 696
  1230 ./multi.jl; wait_full; line: 595
   1230 ./task.jl; wait; line: 189
    1229 ./task.jl; wait; line: 269
     1229 ./stream.jl; process_events; line: 529
    1 ./task.jl; wait; line: 282
     1 ./stream.jl; process_events; line: 529
402 task.jl; anonymous; line: 95
 402 REPL.jl; eval_user_input; line: 53
  401 profile.jl; anonymous; line: 14
   401 ...mba/src/model/mcmc.jl; mcmc; line: 314
    401 ./task.jl; sync_end; line: 306
     401 task.jl; wait; line: 48
      401 ./task.jl; wait; line: 189
       401 ./task.jl; wait; line: 269
        401 ./stream.jl; process_events; line: 529
  1 profile.jl; anonymous; line: 16
217 task.jl; task_done_hook; line: 83
 217 ./task.jl; wait; line: 269
  217 ./stream.jl; process_events; line: 529
You may try VTune Amplifier (https://software.intel.com/en-us/intel-vtune-amplifier-xe) to profile Julia code at the function level, as described at https://software.intel.com/en-us/blogs/2013/10/10/profiling-julia-code-with-intel-vtune-amplifier. You may also need to apply a patch to LLVM (https://gist.github.com/ArchRobison/d3601433d160b05ed5ee) to work around a bug in the source-level performance information; without it, the data reported per source line may be incorrect.
I don't know if it will work, but you might consider having the workers call a function that looks something like this:
pdata should contain the profiling data from that worker. You should be able to view it in ProfileView.
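A minimal sketch of such a worker-side wrapper, written for a recent Julia with the Distributed and Profile standard libraries. The names profiled_call and busy, the worker count, and the workload size are hypothetical stand-ins (busy plays the role of foo). It is an assumption of this sketch that each worker formats its own report and returns it as text, rather than shipping the raw profile data to the master:

```julia
using Distributed
addprocs(2)                    # hypothetical: two local workers
@everywhere using Profile

# Hypothetical stand-in for foo: a CPU-bound loop long enough to collect samples.
@everywhere function busy(n)
    s = 0.0
    for i in 1:n
        s += sin(i)
    end
    return s
end

# Runs f(args...) under the profiler on whichever process calls it,
# then returns that process's profile as a formatted text report.
@everywhere function profiled_call(f, args...)
    Profile.clear()                    # drop samples from earlier runs
    @profile f(args...)
    pdata = Profile.retrieve()         # (data, lidict) for this process
    data, lidict = pdata
    buf = IOBuffer()
    Profile.print(buf, data, lidict)   # render the tree into the buffer
    return String(take!(buf))
end

# Collect one report per worker on the master.
for w in workers()
    report = remotecall_fetch(profiled_call, w, busy, 10^8)
    println("Profile from worker ", w, ":")
    println(report)
end
```

Because the samples are taken on the worker, the report shows where the worker spends its time, not the master's wait and process_events frames seen in the question. Returning text sidesteps any question about serializing the raw data between processes; for graphical viewing, pdata itself would have to be returned instead.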