Using SPMD to run a series of jobs while updating

Posted 2019-09-03 04:56

I am currently trying to run very time-consuming experiments in parallel using MATLAB 2013b.

One strategy to speed things up is to use the results from one experiment to "warm start" the next experiment. In my case, this is a little complicated because each experiment has one of n_types types, and I can only use an experiment of type k to speed up another experiment of type k.

Unfortunately, I cannot implement this strategy with a parfor loop because it would require each job to update a common variable (which stores the warm start information). That said, I have heard that it might be possible to do this using the spmd framework.

I am wondering if someone could help me 'translate' the following block of generic (non-working) parfor code into something that will work with spmd.

n_cores = %provided by user (# of workers that are available)
inputs  = %provided by user (n_jobs x 1 cell array of structs)
types   = %provided by user (n_jobs x 1 array of integer values in 1..n_types)
n_jobs  = length(inputs)
n_types = length(unique(types))

outputs     = cell(n_jobs,1) %cell array to store job output
warm_starts = cell(0,n_types) %empty 0 x n_type cell array to store warm start data

matlabpool('open',n_cores)

parfor i = 1:n_jobs

   %run myfun in parallel
   outputs{i} = myfun(inputs{i},warm_starts(:,types(i)));

   %update warm start data for experiments of this type with data from current experiment
   warm_starts{end+1,types(i)} = get_warm_start(outputs{i});

end

1 answer
看我几分像从前
#2 · 2019-09-03 05:26

It's not quite clear to me how many different warm_starts you might want to store for each type. I'm going to assume you want to store just 1. Here's how you might do that:

jobs  = rand(1,97); % note prime number of jobs
types = randi([1, 5], size(jobs));
n_jobs = numel(jobs);
n_types = numel(unique(types));
warm_starts = cell(1, n_types);

spmd
    jobs_per_lab = ceil(n_jobs / numlabs);
    outputs = cell(jobs_per_lab, 1);
    for idx = 1:jobs_per_lab
        job_idx = idx + ((labindex-1)*jobs_per_lab);
        if job_idx > n_jobs
            % Off the end of 'jobs', no work to do
            this_warm_start = NaN;
            this_type       = NaN;
        else
            this_type = types(job_idx);
            if ~isempty(warm_starts{this_type})
                this_warm_start = warm_starts{this_type};
            else
                this_warm_start = 0;
            end
            outputs{idx} = this_warm_start + types(job_idx) * jobs(job_idx); % some function goes here
            this_warm_start = rand();
        end
        % All-to-all communication to exchange 'this_warm_start' values.
        % After this, each worker has a 2 x numlabs cell array of warm starts and types
        all_warm_starts_this_round = gcat({this_type; this_warm_start}, 2);
        for w = 1:numlabs
            warm_start_type = all_warm_starts_this_round{1, w};
            warm_start_value = all_warm_starts_this_round{2, w};
            if ~isnan(warm_start_type)
                warm_starts{warm_start_type} = warm_start_value;
            end
        end
    end
    % Finally, collect all results on lab 1
    outputs = gcat(outputs, 1, 1);
end
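% Dereference the Composite (the gathered result lives on lab 1), then drop
% the empty padding cells left by workers that ran past the end of 'jobs'
outputs = outputs{1};
outputs = outputs(1:n_jobs);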

The main things I've done there are to manually split the work up so that each worker operates on a chunk of the 'jobs', and then use GCAT to broadcast warm start information after each round.
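
To make the manual split concrete, here is a minimal serial sketch of the same block partition. It assumes 4 workers. The name n_labs is a hypothetical stand-in for numlabs, so the snippet runs without a pool and does no communication. It only shows which job indices each worker would visit and how many rounds it sits idle while still taking part in the gcat exchange.

```matlab
% Serial sketch of the block partition used inside the spmd loop.
% n_labs is a hypothetical stand-in for numlabs (assumed to be 4 here).
n_jobs = 97;
n_labs = 4;
jobs_per_lab = ceil(n_jobs / n_labs);   % number of rounds every worker runs

idle_rounds = zeros(1, n_labs);
for lab = 1:n_labs
    % Same mapping as job_idx = idx + ((labindex-1)*jobs_per_lab)
    job_idx = (1:jobs_per_lab) + (lab - 1) * jobs_per_lab;
    active  = job_idx(job_idx <= n_jobs);   % indices that map to real jobs
    idle_rounds(lab) = jobs_per_lab - numel(active);
    if isempty(active)
        fprintf('worker %d: no jobs (%d idle rounds)\n', lab, idle_rounds(lab));
    else
        fprintf('worker %d: jobs %d..%d (%d idle rounds)\n', ...
            lab, active(1), active(end), idle_rounds(lab));
    end
end
```

With these numbers every worker runs 25 rounds, and the last worker covers jobs 76..97 and is idle for 3 rounds. Idle workers still have to call gcat in every round, because the call is collective. That is why the spmd code sends NaN placeholders instead of skipping the call.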
