Graph cut using Matlab

Posted 2019-05-28 08:01

Question:

I am trying to apply the graph-cut method to my segmentation task. I found some example code at Graph_Cut_Demo.

Part of the code is shown below:

img = im2double( imread([ImageDir 'cat.jpg']) );
[ny,nx,nc] = size(img);
d = reshape( img, ny*nx, nc );   % one row per pixel, one column per color channel
k = 2;                           % number of clusters
[l0, c] = kmeans( d, k );        % l0: cluster label per pixel, c: k x nc cluster centers
l0 = reshape( l0, ny, nx );

% For each class, the data term Dc measures the distance of 
% each pixel value to the class prototype. For simplicity, standard 
% Euclidean distance is used. Mahalanobis distance (weighted by class 
% covariances) might improve the results in some cases.  Note that the 
% image intensity values are in the [0,1] interval, which provides 
% normalization.   

Dc = zeros( ny, nx, k ); 
for i = 1:k 
  dif = d - repmat( c(i,:), ny*nx,1 ); 
  Dc(:,:,i) = reshape( sum(dif.^2,2), ny, nx ); 
end 
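As far as I can tell, this loop computes, for each cluster, the squared Euclidean distance from every pixel to that cluster's center. Assuming the Statistics Toolbox is available, the same Dc can apparently be written in one line with pdist2 (Dc2 is just my name for the check):

Dc2 = reshape( pdist2(d, c).^2, ny, nx, k );  % (ny*nx) x k distances, squared
% max(abs(Dc(:) - Dc2(:))) should be ~0 up to round-off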

It seems that the method uses k-means clustering to initialise the labelling and to obtain the data term Dc. However, I don't understand how this data term is calculated. Why do they use

dif = d - repmat( c(i,:), ny*nx,1 ); 

In the comments they say the data term Dc measures the distance of each pixel value to the class prototype. What is the class prototype, and why can it be determined by the k-means labels?

In another implementation, Graph_Cut_Demo2, the data term is computed as follows:

% calculate the data cost per cluster center
% (sz = size(img); d, l0, c and k are as in the first example)
Dc = zeros([sz(1:2) k],'single');
for ci=1:k
    % use covariance matrix per cluster
    icv = inv(cov(d(l0==ci,:)));
    dif = d - repmat(c(ci,:), [size(d,1) 1]);
    % data cost is minus log likelihood of the pixel to belong to each
    % cluster according to its RGB value
    Dc(:,:,ci) = reshape(sum((dif*icv).*dif./2,2),sz(1:2));
end

This confused me a lot. Why do they calculate the covariance matrix, and how do they form the data term as a minus log-likelihood? Are there any papers or descriptions available for these implementations?

Thanks a lot.

Answer 1:

Both graph-cut segmentation examples are strongly related. The authors of the book Image Processing, Analysis, and Machine Vision: A MATLAB Companion (the first example) used the graph-cut wrapper code of Shai Bagon (with the author's permission, naturally) - the second example.

So, what is the data term anyway?
The data term represents how likely each pixel, considered on its own, is to belong to each label. Since graph cut minimizes an energy, maximizing a likelihood becomes minimizing its negative logarithm, which is why negative log-likelihood terms are used.
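For intuition, the squared-distance data term of the first demo is itself a negative log-likelihood: if each cluster i is modeled as an isotropic Gaussian N(c(i,:), s2*I) with a shared, fixed variance s2, then -log p of a pixel reduces to its squared distance from c(i,:) divided by 2*s2, plus a constant. A minimal sketch (s2 is an assumption here; any fixed value only rescales Dc):

s2 = 1;                                   % assumed common variance for all clusters
Dc = zeros( ny, nx, k );
for i = 1:k
  dif = d - repmat( c(i,:), ny*nx, 1 );
  % -log N(d; c(i,:), s2*I), dropping the constant log-normalization term
  Dc(:,:,i) = reshape( sum(dif.^2,2) / (2*s2), ny, nx );
end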

More concretely, in these examples you try to segment the image into k segments based on color. You assume there are only k dominant colors in the image (not a very practical assumption, but sufficient for educational purposes), and you use k-means to find them. The output of k-means is k centers in RGB space - that is, k "representative" colors. The likelihood that a pixel belongs to the i-th center is inversely related to the distance (in color space) between the pixel and that center: the larger the distance, the less likely the pixel is to belong to the i-th cluster, and the higher the unary energy penalty one must "pay" to assign the pixel to that cluster.
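For completeness, the demos then pass this unary term, together with a pairwise (smoothness) cost, to the wrapper. From memory, the call sequence looks roughly like the sketch below - the Potts model and the weight of 10 are arbitrary choices here, so check the wrapper's documentation for the exact signature:

Sc = ones(k) - eye(k);               % Potts smoothness: cost 1 for differing labels
gch = GraphCut('open', Dc, 10*Sc);   % 10 is an arbitrary data-vs-smoothness weight
[gch, L] = GraphCut('expand', gch);  % minimize the energy by alpha-expansion
gch = GraphCut('close', gch);
L = reshape(L, ny, nx);              % per-pixel labels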
The second example takes this notion one step further and assumes that the k clusters may have different densities (spreads) in color space, modeling this second-order behavior using a covariance matrix per cluster.
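Spelled out, the term computed in the second demo is the quadratic (Mahalanobis) part of the Gaussian negative log-likelihood; the full expression also has a log-determinant normalization term, which the demo drops even though it differs between clusters. A sketch of the complete per-cluster cost, reusing the demo's variables (dif/cv replaces the explicit inv for numerical stability):

for ci = 1:k
    cv  = cov( d(l0==ci,:) );                  % per-cluster covariance in color space
    dif = d - repmat( c(ci,:), [size(d,1) 1] );
    nll = sum( (dif/cv).*dif, 2 )/2 ...        % 0.5 * squared Mahalanobis distance
        + 0.5*log( det(2*pi*cv) );             % normalization, constant per cluster
    Dc(:,:,ci) = reshape( nll, sz(1:2) );
end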

In practice, one uses a more sophisticated color model for each segment, usually a mixture of Gaussians. You can read about it in the seminal GrabCut paper (section 3).
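As a rough sketch of that idea (fitgmdist and pdf are from the Statistics and Machine Learning Toolbox; the number of components and the regularization value below are arbitrary choices, not what GrabCut prescribes):

ncomp = 5;                                    % assumed number of Gaussians per segment
Dc = zeros( ny, nx, k );
for i = 1:k
    % fit a Gaussian mixture to the pixels k-means assigned to segment i
    gm = fitgmdist( d(l0==i,:), ncomp, 'RegularizationValue', 1e-6 );
    % data cost = minus log-likelihood under that segment's mixture
    Dc(:,:,i) = reshape( -log( pdf(gm, d) + eps ), ny, nx );
end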

PS,
Next time you can email Shai Bagon directly and ask.