Fastest way to find rows without NaNs in Matlab

I would like to find the indexes of rows without any NaN in the fastest way possible since I need to do it thousands of times. So far I have tried the following two approaches:

find(~isnan(sum(data, 2)));
find(all(~isnan(data), 2));

Is there a clever way to speed this up or is this the best possible? The dimension of the data matrix is usually thousands by hundreds.

Tags: performance matlab matrix vectorization

4 Answers

Fickle 薄情

#2 · 2019-06-17 19:07

If the NaN density is high enough, a double loop will be the fastest method, because the scan of a row can be abandoned as soon as its first NaN is found. For example, consider the following speed test:

%# Preallocate some parameters
T = 5000; %# Number of rows
N = 500; %# Number of columns
X = randi(5, T, N); %# Sample data matrix
M = 100; %# Number of simulation iterations
X(X == 1) = nan; %# Randomly set some elements of X to nan

%# Your first method
tic
for m = 1:M
    Soln1 = find(~isnan(sum(X, 2)));
end
toc

%# Your second method
tic
for m = 1:M
    Soln2 = find(all(~isnan(X), 2));
end
toc

%# A double loop
tic
for m = 1:M
    Soln3 = ones(T, 1);
    for t = 1:T
        for n = 1:N
            if isnan(X(t, n))
                Soln3(t) = 0;
                break
            end
        end
    end
    Soln3 = find(Soln3);
end
toc

The results are:

Elapsed time is 0.164880 seconds.
Elapsed time is 0.218950 seconds.
Elapsed time is 0.068168 seconds. %# The double loop method

Of course, the NaN density is so high in this simulation (about one element in five) that none of the rows is NaN-free. But you never said anything about the NaN density of your matrix, so I figured I'd post this answer for general consumption and contemplation :-)
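Since the claim above depends on NaN density, here is a rough sketch for probing where the crossover lies on a given machine. The matrix size and the list of NaN fractions are assumptions chosen for illustration, not values from the question, and a single tic/toc measurement is noisy, so treat the numbers as indicative only.

```matlab
T = 2000; N = 200;                      % assumed matrix size
densities = [0.0001 0.001 0.01 0.1];    % assumed NaN fractions to try

for p = densities
    X = rand(T, N);
    X(rand(T, N) < p) = NaN;            % each element becomes NaN with probability p

    % Vectorized approach
    tic
    idxVec = find(~any(isnan(X), 2));
    tVec = toc;

    % Double loop that stops scanning a row at its first NaN
    tic
    keep = true(T, 1);
    for t = 1:T
        for n = 1:N
            if isnan(X(t, n))
                keep(t) = false;
                break
            end
        end
    end
    idxLoop = find(keep);
    tLoop = toc;

    assert(isequal(idxVec, idxLoop));   % both approaches must return the same rows
    fprintf('NaN fraction %.4f: vectorized %.5f s, loop %.5f s\n', p, tVec, tLoop);
end
```

At low NaN fractions the loop has to scan most rows in full, so the early exit buys little; the timings should show whether that matters for your data.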


Rolldiameter

#3 · 2019-06-17 19:13

Can you tell us more about what you want to do with the indices?

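t0 = cputime;
A = rand(1000,100);              % Some matrix data
ind = cell(100,1);               % Cell array: the NaN count can differ between iterations
for i = 1:100
    A(randi(20,1,100)) = NaN;    % Set random elements among the first 20 linear indices to NaN
    B = isnan(A);                % Logical mask, true where A is NaN
    C = A(~B);                   % C has all non-NaN elements
    ind{i} = find(B);            % Linear indices of all NaN elements
end
disp(cputime - t0)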

Running this 100 times in a loop takes 0.1404 s.


三岁会撩人

#4 · 2019-06-17 19:16

Edit: matrix multiplication can be faster than sum; on my machine (Matlab 2012a) it is almost twice as fast for matrices larger than 500 x 500 elements. The trick relies on NaN*0 evaluating to NaN, so any NaN in a row propagates into the product. So my solution is:

find(~isnan(data*zeros(size(data,2),1)))

Of the two methods you suggested in the question (denoted f and g below), the first is faster, measured with timeit:

data=rand(4000);
nani=randi(numel(data),1,500);
data(nani)=NaN;
f= @() find(~isnan(sum(data, 2)));
g= @() find(all(~isnan(data), 2));
h= @() find(~isnan(data*zeros(size(data,2),1)));

timeit(f) 
ans =
     0.0263

timeit(g)
ans =
     0.1489

timeit(h)
ans =
     0.0146
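Before comparing timings, it may be worth confirming that the three handles return the same rows. The sketch below does that on a smaller matrix; the 200-by-200 size and the 50 NaN positions are assumptions chosen only to keep the check quick. The data contain no Inf values, so the sum-based and all-based results should match; the product-based handle is printed rather than assumed, since it depends on NaN*0 propagating through the multiplication.

```matlab
data = rand(200);                          % smaller than the 4000-by-4000 case above
data(randi(numel(data), 1, 50)) = NaN;     % scatter some NaNs at random positions

f = @() find(~isnan(sum(data, 2)));                   % sum-based
g = @() find(all(~isnan(data), 2));                   % all-based
h = @() find(~isnan(data*zeros(size(data,2),1)));     % product-based

fprintf('f and g agree: %d\n', isequal(f(), g()));
fprintf('f and h agree: %d\n', isequal(f(), h()));
```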


傲

#5 · 2019-06-17 19:31

any() is faster than all() or sum(). Try:

idx = find(~any(isnan(data), 2));

Correction: it seems that the sum() approach is faster:

idx = find(~isnan(sum(data, 2)));
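One caveat when choosing between the two: they are only equivalent if the data contain no infinities. A row holding both Inf and -Inf sums to NaN even though it has no NaN element, so the sum() approach would drop it. The small matrix below is made up purely to illustrate this.

```matlab
% Row 1: finite values, no NaN
% Row 2: contains a NaN
% Row 3: no NaN, but Inf + (-Inf) evaluates to NaN
% Row 4: finite values, no NaN
data = [1 2 3; 4 NaN 6; Inf 1 -Inf; 7 8 9];

idxAny = find(~any(isnan(data), 2));   % tests each element directly
idxSum = find(~isnan(sum(data, 2)));   % relies on NaN propagating through the row sum

disp(idxAny.')   % rows 1, 3 and 4
disp(idxSum.')   % rows 1 and 4
```

If the data are known to be finite apart from NaNs, the two give the same answer and the faster one can be used safely.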

