Multi-GPU profiling (Several CPUs, MPI/CUDA Hybrid)

I had a quick look around the forum and I don't think this question has been asked yet.

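I currently have an MPI/CUDA hybrid code, written by someone else during their PhD. Each CPU has its own GPU. My task is to gather data by running the (already working) code and to implement additional features. Turning this code into a single-CPU/multi-GPU one is not an option at the moment (later, possibly).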

I would like to use performance profiling tools to analyze the whole thing.

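For now the idea is to have each CPU launch nvvp for its own GPU and collect data, while another profiling tool takes care of the general CPU/MPI part (I plan to use TAU, as I usually do).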

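The problem is that launching the nvvp interface 8 times at once (when running on 8 CPUs/GPUs) is very tedious. I would like to skip the interface and instead find a command line that writes the data directly to a file, which I can later load into the nvvp interface and analyze.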

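I would like a command line that each CPU executes and that produces one file per process, containing data about its own GPU: 8 GPUs/CPUs = 8 files. I then plan to load these files into nvvp one by one, analyze each separately, and compare the data manually.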

Any ideas?

Thanks!

Answer 1:

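Take a look at nvprof, which is part of the CUDA 5.0 Toolkit (currently available as a release candidate). It has some limitations: it can only collect a limited number of counters in a given pass, and it cannot collect metrics (so for now you have to script multiple launches if you want more than a few events). You can find more information in nvvp's built-in help, including an example MPI launch script (copied here, but I suggest checking the nvvp help for an up-to-date version if you have anything newer than 5.0 RC).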

#!/bin/sh
#
# Script to launch nvprof on an MPI process.  This script will
# create unique output file names based on the rank of the 
# process.  Examples:
#   mpirun -np 4 nvprof-script a.out 
#   mpirun -np 4 nvprof-script -o outfile a.out
#   mpirun -np 4 nvprof-script test/a.out -g -j
# In the case you want to pass a -o or -h flag to the a.out, you
# can do this.
#   mpirun -np 4 nvprof-script -c a.out -h -o
# You can also pass in arguments to nvprof
#   mpirun -np 4 nvprof-script --print-api-trace a.out
#

usage () {
 echo "nvprof-script [nvprof options] [-h] [-o outfile] a.out [a.out options]";
 echo "or"
 echo "nvprof-script [nvprof options] [-h] [-o outfile] -c a.out [a.out options]";
}

nvprof_args=""
while [ $# -gt 0 ];
do
    case "$1" in
        (-o) shift; outfile="$1";;
        (-c) shift; break;;
        (-h) usage; exit 1;;
        (*) nvprof_args="$nvprof_args $1";;
    esac
    shift
done

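# If the user did not provide an output filename, derive one from the
# application name.  That name is only in $1 when -c was used; otherwise
# the application ended up in nvprof_args, so fall back to a fixed name.
if [ -z "$outfile" ] ; then
    if [ $# -gt 0 ] ; then
        outfile="$(basename "$1").nvprof-out"
    else
        outfile="nvprof-out"
    fi
fi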

# Find the rank of the process from the MPI rank environment variable
# to ensure unique output filenames.  The script handles Open MPI
# and MVAPICH.  If your implementation is different, you will need to
# make a change here.

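# Open MPI
if [ -n "${OMPI_COMM_WORLD_RANK}" ] ; then
    rank=${OMPI_COMM_WORLD_RANK}
fi
# MVAPICH
if [ -n "${MV2_COMM_WORLD_RANK}" ] ; then
    rank=${MV2_COMM_WORLD_RANK}
fi
# Fallback: use the process ID so that files written by different
# processes still do not overwrite each other.
if [ -z "$rank" ] ; then
    rank=$$
fi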

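# Run the application under nvprof.  "$@" preserves the quoting of the
# application's arguments; $nvprof_args is deliberately left unquoted so
# that it splits back into separate nvprof options.
exec nvprof --output-profile "$outfile.$rank" $nvprof_args "$@"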

# If you want to limit which ranks get profiled, replace the exec line
# above with something like the following.  You have to use the -c
# switch to get the right behavior.
# mpirun -np 2 nvprof-script --print-api-trace -c a.out -q
# if [ "$rank" -le 0 ]; then
#     exec nvprof --output-profile "$outfile.$rank" $nvprof_args "$@"
# else
#     exec "$@"
# fi
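
Since a single nvprof pass can only collect a limited number of events, gathering more means launching the application several times, as noted above. A minimal sketch of that idea, assuming nvprof is on PATH; the helper name profile_event_passes, the <app>.pass<N>.nvprof file naming and the NVPROF override are hypothetical, and the event names in the example are placeholders (nvprof --query-events lists the ones your GPU actually exposes). Under mpirun every rank would write the same file names, so the rank would still have to be added to the name as in the script above.

```shell
#!/bin/sh
# Hypothetical helper (not part of nvprof): run one application several times
# under nvprof, collecting one event group per run, because a single nvprof
# pass can only gather a limited number of hardware events.
#   $1      path to the application (no arguments, to keep the sketch short)
#   $2 ...  comma-separated event lists, one list per pass
# Setting NVPROF to another command (e.g. a stub) dry-runs the command lines.
profile_event_passes() {
    app="$1"
    shift
    pass=0
    for events in "$@"; do
        pass=$((pass + 1))
        ${NVPROF:-nvprof} --events "$events" \
            --output-profile "$(basename "$app").pass$pass.nvprof" \
            "$app" || return 1
    done
}

# Example (event names are placeholders, check nvprof --query-events):
# profile_event_passes ./a.out branch,divergent_branch gld_request,gst_request
```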

Answer 2:

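Another option: since you are already using TAU to profile the CPU side of the application, you can also use TAU to collect the GPU performance data. TAU supports multi-GPU execution together with MPI; see http://www.nic.uoregon.edu/tau-wiki/Guide:TAUGPU for instructions on getting started with TAU's GPU profiling capabilities. Under the hood TAU uses CUPTI (the CUDA Profiling Tools Interface), so the data you can collect with TAU will be very similar to what you can collect with NVIDIA's Visual Profiler.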

Answer 3:

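Things have changed since CUDA 5.0: instead of a wrapper script, you can now simply put %h, %p and %q{ENV} in the output file name passed to nvprof's -o (--output-profile) option. nvprof replaces %h with the host name, %p with the process ID and %q{ENV} with the value of the environment variable ENV, so using the MPI rank variable (OMPI_COMM_WORLD_RANK for Open MPI) gives one file per rank: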

$ mpirun -np 2 -host c0-0,c0-1 nvprof -o output.%h.%p.%q{OMPI_COMM_WORLD_RANK} ./my_mpi_app
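
To inspect each GPU's data separately from the command line, as the question intended, nvprof's --import-profile option can read a saved profile file back and print its report. A minimal sketch, assuming the output.* naming from the command above; the helper name summarize_profiles, the <file>.txt convention and the NVPROF override are hypothetical. The per-rank files can of course still be opened one by one in nvvp instead.

```shell
#!/bin/sh
# Hypothetical helper: turn each per-rank nvprof file into a text summary
# (<file>.txt) so the GPUs can be compared side by side, e.g. with diff.
# nvprof --import-profile reads a saved profile and prints its report;
# nvprof normally writes that report to stderr, so both streams are captured.
# Setting NVPROF to another command (e.g. a stub) dry-runs the loop.
summarize_profiles() {
    for f in "$@"; do
        case "$f" in
            *.txt) continue ;;        # skip summaries from a previous run
        esac
        [ -f "$f" ] || continue       # skip an unmatched glob pattern
        ${NVPROF:-nvprof} --import-profile "$f" > "$f.txt" 2>&1 || return 1
    done
}

# Example, after the mpirun command above:
# summarize_profiles output.*
```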

Source: Multi-GPU profiling (Several CPUs, MPI/CUDA Hybrid)