Inverting a real-valued index grid

Posted 2019-02-06 16:19

Question:

OpenCV's remap() uses a real-valued index grid to sample a grid of values from an image using bilinear interpolation, and returns the grid of samples as a new image.

To be precise, let:

A = an image 
X = a grid of real-valued X coords into the image. 
Y = a grid of real-valued Y coords into the image.
B = remap(A, X, Y)

Then for all pixel coordinates i, j,

B[i, j] = A(X[i, j], Y[i, j]) 

Where the round-braces notation A(x, y) denotes using bilinear interpolation to solve for the pixel value of image A using float-valued coords x and y.
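
To make the A(x, y) notation concrete, here is a minimal NumPy sketch of bilinear sampling. It assumes the OpenCV convention that X holds column coordinates and Y holds row coordinates, a single-channel image of at least 2 x 2 pixels, and edge clamping for out-of-range coordinates; the function name remap_bilinear is made up for illustration and is not a library API.

```python
import numpy as np

def remap_bilinear(A, X, Y):
    """Sample 2D array A at real-valued coords (X = columns, Y = rows)."""
    A = np.asarray(A, dtype=np.float64)
    h, w = A.shape
    # Assumption: clamp coordinates to the image (edge replication).
    X = np.clip(X, 0, w - 1)
    Y = np.clip(Y, 0, h - 1)
    # Top-left corner of the 2x2 neighbourhood, kept inside the image.
    x0 = np.minimum(np.floor(X).astype(int), w - 2)
    y0 = np.minimum(np.floor(Y).astype(int), h - 2)
    fx = X - x0  # horizontal interpolation fraction in [0, 1]
    fy = Y - y0  # vertical interpolation fraction in [0, 1]
    top = A[y0, x0] * (1 - fx) + A[y0, x0 + 1] * fx
    bottom = A[y0 + 1, x0] * (1 - fx) + A[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bottom * fy
```

With an identity grid (X equal to each pixel's column, Y equal to its row) the function returns A unchanged, which is a quick sanity check for any remap() implementation.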

My question is: given an index grid X, Y, how can I generate an "inverse grid" X^-1, Y^-1 such that:

X(X^-1[i, j], Y^-1[i, j]) = i
Y(X^-1[i, j], Y^-1[i, j]) = j

And

X^-1(X[i, j], Y[i, j]) = i
Y^-1(X[i, j], Y[i, j]) = j

For all integer pixel coordinates i, j?

FWIW, the image and index maps X and Y are the same shape. However, there is no a priori structure to the index maps X and Y. For example, they're not necessarily affine or rigid transforms. They may even be uninvertible, e.g. if X, Y maps multiple pixels in A to the same exact pixel coordinate in B. I'm looking for ideas for a method that will find a reasonable inverse map if one exists.

The solution need not be OpenCV-based, as I'm not using OpenCV, but another library that has a remap() implementation. While any suggestions are welcome, I'm particularly keen on something that's "mathematically correct", i.e. if my map M is perfectly invertible, the method should find the perfect inverse, within some small margin of machine precision.

Answer 1:

Well I just had to solve this remap inversion problem myself and I'll outline my solution.

Given X, Y for the remap() function that does the following:

B[i, j] = A(X[i, j], Y[i, j])   

I computed Xinv, Yinv that can be used by the remap() function to invert the process:

A[x, y] = B(Xinv[x,y],Yinv[x,y])

First I build a KD-Tree for the 2D point set {(X[i,j], Y[i,j])} so I can efficiently find the N nearest neighbors to a given point (x,y). I use Euclidean distance as my distance metric. I found a great C++ header lib for KD-Trees on GitHub.

Then I loop through all the (x,y) values in A's grid and find the N = 5 nearest neighbors {(X[i_k,j_k],Y[i_k,j_k]) | k = 0 .. N-1} in my point set.

  • If distance d_k == 0 for some k then Xinv[x,y] = i_k and Yinv[x,y] = j_k, otherwise...

  • Use Inverse Distance Weighting (IDW) to compute an interpolated value:

    • let weight w_k = 1 / pow(d_k, p) (I use p = 2)
    • Xinv[x,y] = (sum_k w_k * i_k)/(sum_k w_k)
    • Yinv[x,y] = (sum_k w_k * j_k)/(sum_k w_k)

Note that if B is a W x H image then X and Y are W x H arrays of floats. If A is a w x h image then Xinv and Yinv are w x h arrays of floats. It is important that you are consistent with image and map sizing.
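
The steps above can be sketched in Python. This is not the answerer's C++ code: it assumes SciPy's cKDTree in place of the KD-Tree header library mentioned, the OpenCV convention that X holds column coordinates and Y holds row coordinates, and a made-up function name invert_map_idw. The exact-hit case (d_k == 0) is handled by flooring the distance at a tiny epsilon so that the matching neighbor dominates the weighted sum.

```python
import numpy as np
from scipy.spatial import cKDTree

def invert_map_idw(X, Y, out_shape, k=5, p=2.0):
    """X, Y: forward maps with B's shape, holding coordinates into A.
    Returns (Xinv, Yinv) with A's shape (out_shape), holding coordinates into B.
    Use k >= 2 so that the query returns 2D arrays."""
    rows_b, cols_b = X.shape
    # Scattered A-coordinates, one point per B pixel.
    pts = np.column_stack([X.ravel(), Y.ravel()])
    jj, ii = np.meshgrid(np.arange(cols_b), np.arange(rows_b))
    tree = cKDTree(pts)
    # Query every integer pixel position of A's grid.
    h, w = out_shape
    qx, qy = np.meshgrid(np.arange(w), np.arange(h))
    query = np.column_stack([qx.ravel(), qy.ravel()]).astype(np.float64)
    dist, idx = tree.query(query, k=k)
    # Inverse distance weights; the floor avoids division by zero on exact hits.
    wgt = 1.0 / np.maximum(dist, 1e-12) ** p
    wgt /= wgt.sum(axis=1, keepdims=True)
    Xinv = (wgt * jj.ravel()[idx]).sum(axis=1).reshape(h, w)
    Yinv = (wgt * ii.ravel()[idx]).sum(axis=1).reshape(h, w)
    return Xinv.astype(np.float32), Yinv.astype(np.float32)
```

The returned float32 arrays have the layout that remap()-style functions usually expect. Pixels of A that lie far from every sampled point still receive a value, because IDW always averages the k nearest neighbors, so results near or outside the mapped region should be treated with caution.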

Works like a charm! In my first version I tried brute-forcing the search and never even waited for it to finish. Once I switched to a KD-Tree I started to get reasonable run times. If I ever get time I would like to add this to OpenCV.

The second image below uses remap() to remove the lens distortion from the first image. The third image is the result of inverting the process.



Answer 2:

There is no standard way to do it with OpenCV.

If you are looking for a complete ready-to-use solution, I am not sure that I can help, but I can at least describe a method that I used some years ago to do this task.

First, create new remapping maps with the same dimensions as your source image. (I created maps with larger dimensions to simplify interpolation, and cropped them to the proper size at the final step.) Then fill them with the values from the original remapping maps: iterate over the original maps and, wherever the map coordinates x and y lie within the limits of your image, take the entry's row and column as the new y and x and store them in the new map at the row and column given by the old y and x. This is a rather simple solution, but it gives rather good results. For a perfect result, you should interpolate the old x and y to integer values using your interpolation method and the neighbouring pixels.

After this you should either remap the pixel colors manually, or completely fill your remapping map with pixel coordinates and use the remap() version from OpenCV.

You will then face a rather challenging task: interpolating pixels in the empty areas. In other words, you should take the distances to the closest non-zero pixel coordinates and mix colors (if you remap colors) or coordinates (if you proceed with the full map computation) in fractions according to these distances. This is actually not so difficult for linear interpolation, and you can even look at the remap() implementation on OpenCV's GitHub page. For nearest-neighbour (NN) interpolation it will be much simpler: just take the color/coordinate of the nearest neighbour.

The final task is extrapolating the areas outside the borders of the remapped pixel area. The algorithm from OpenCV can also be used as a reference here.
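
The nearest-neighbour variant of the hole filling described above can be sketched with SciPy's Euclidean distance transform, which can return, for every pixel, the indices of the closest valid pixel. This is only one possible approach, not the answerer's code; it assumes holes are marked with a sentinel value (-1 here, a hypothetical choice) and that at least one valid pixel exists. Because every hole takes the value of its nearest valid pixel, the same call also fills the region outside the remapped area.

```python
import numpy as np
from scipy import ndimage

def fill_holes_nearest(m_x, m_y, hole_value=-1):
    """Replace hole entries in both maps with the value of the nearest valid entry."""
    holes = (m_x == hole_value) | (m_y == hole_value)
    # For each hole pixel, get the (row, col) indices of the closest non-hole pixel.
    # Valid pixels map to themselves.
    idx = ndimage.distance_transform_edt(
        holes, return_distances=False, return_indices=True
    )
    return m_x[idx[0], idx[1]], m_y[idx[0], idx[1]]
```

For the linear variant one would blend several nearby valid entries by distance instead of copying a single one.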



Answer 3:

If your map is derived from a homography H, you could invert H and directly create the inverse maps with cv::initUndistortRectifyMap().

e.g. in Python:

import cv2
import numpy as np
# H is the 3x3 homography your map was derived from
map_size = () # fill in your map size as (width, height)
H_inv = np.linalg.inv(H)
map1, map2 = cv2.initUndistortRectifyMap(cameraMatrix=np.eye(3), distCoeffs=np.zeros(5), R=H_inv, newCameraMatrix=np.eye(3), size=map_size, m1type=cv2.CV_32FC1)

The OpenCV documentation states about initUndistortRectifyMap():

The function actually builds the maps for the inverse mapping algorithm that is used by remap(). That is, for each pixel (u, v) in the destination image, the function computes the corresponding coordinates in the source image.

If you are only given the maps, you have to do it yourself. However, interpolating the new maps' coordinates is not trivial, because the support region for one pixel could be very large.

Here is a simple Python solution which inverts the maps by doing point-to-point mapping. This will probably leave some coordinates unassigned, while others will be updated several times. So there may be holes in the map.

Here is a small Python program demonstrating both approaches:

import cv2
import numpy as np


def invert_maps(map_x, map_y):
    assert(map_x.shape == map_y.shape)
    rows = map_x.shape[0]
    cols = map_x.shape[1]
    m_x = np.ones(map_x.shape, dtype=map_x.dtype) * -1
    m_y = np.ones(map_y.shape, dtype=map_y.dtype) * -1
    for i in range(rows):
        for j in range(cols):
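            # convert to a plain int: round() on a NumPy scalar may not return a Python int
            i_ = int(round(float(map_y[i, j])))
            j_ = int(round(float(map_x[i, j])))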
            if 0 <= i_ < rows and 0 <= j_ < cols:
                m_x[i_, j_] = j
                m_y[i_, j_] = i
    return m_x, m_y


def main():
    img = cv2.imread("pigeon.png", cv2.IMREAD_GRAYSCALE)

    # a simple rotation by 45 degrees
    theta = np.pi / 4
    H = np.array([[np.cos(theta), -np.sin(theta), 0],
                  [np.sin(theta), np.cos(theta), 0],
                  [0, 0, 1]])
    H_inv = np.linalg.inv(H)
    map_size = (img.shape[1], img.shape[0])

    map1, map2 = cv2.initUndistortRectifyMap(cameraMatrix=np.eye(3), distCoeffs=np.zeros(5), R=H, newCameraMatrix=np.eye(3), size=map_size, m1type=cv2.CV_32FC1)
    map1_inv, map2_inv = cv2.initUndistortRectifyMap(cameraMatrix=np.eye(3), distCoeffs=np.zeros(5), R=H_inv, newCameraMatrix=np.eye(3), size=map_size, m1type=cv2.CV_32FC1)
    map1_simple_inv, map2_simple_inv = invert_maps(map1, map2)

    img1 = cv2.remap(src=img, map1=map1, map2=map2, interpolation=cv2.INTER_LINEAR)
    img2 = cv2.remap(src=img1, map1=map1_inv, map2=map2_inv, interpolation=cv2.INTER_LINEAR)
    img3 = cv2.remap(src=img1, map1=map1_simple_inv, map2=map2_simple_inv,
                               interpolation=cv2.INTER_LINEAR)

    cv2.imshow("Original image", img)
    cv2.imshow("Mapped image", img1)
    cv2.imshow("Mapping forth and back with H_inv", img2)
    cv2.imshow("Mapping forth and back with invert_maps()", img3)
    cv2.waitKey(0)


if __name__ == '__main__':
    main()
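
The double loop in invert_maps() is slow in pure Python for large maps. A vectorized variant might look like the sketch below; the name invert_maps_vectorized and the extra holes mask it returns are additions for illustration and are not part of the program above. It assumes the maps contain no NaNs and relies on NumPy keeping the last assigned value when several source pixels land on the same target.

```python
import numpy as np

def invert_maps_vectorized(map_x, map_y):
    """Point-to-point inversion of remap() maps; unassigned entries stay at -1."""
    assert map_x.shape == map_y.shape
    rows, cols = map_x.shape
    m_x = np.full(map_x.shape, -1, dtype=np.float32)
    m_y = np.full(map_y.shape, -1, dtype=np.float32)
    i, j = np.indices(map_x.shape)        # row and column index of each map entry
    i_ = np.rint(map_y).astype(int)       # rounded target row
    j_ = np.rint(map_x).astype(int)       # rounded target column
    ok = (i_ >= 0) & (i_ < rows) & (j_ >= 0) & (j_ < cols)
    m_x[i_[ok], j_[ok]] = j[ok]
    m_y[i_[ok], j_[ok]] = i[ok]
    holes = m_x < 0                       # True where no source pixel landed
    return m_x, m_y, holes
```

The holes mask makes it easy to see how much of the inverse map was never assigned and to pass those entries to a separate filling step.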


Answer 4:

From what I understand you have an original image, and a transformed image, and you wish to recover the nature of the transform that has been applied without knowing it, but assuming it is something sensible, like a rotation or a fish-eye distort.

What I would try is thresholding both the index image and the plain image to convert them to binary, and then identifying objects. Most mappings will at least retain connectivity and Euler number, and usually the largest object in the index image will still be the largest object in the plain image.

Then take moments for your matched plain/index image pairs and see if you can remove translation, rotation and scaling. That gives you several reverse maps, which you can then try to stitch together. (This is hard if the transform is not simple, but the general problem of reconstructing an arbitrary transformation cannot be solved.)



Answer 5:

OP here. I think I've found an answer. I haven't implemented it yet, and if someone comes up with a less fiddly solution (or finds something wrong with this one), I'll choose their answer instead.

Problem statement

Let A be the source image, B be the destination image, and M be the map that gives, for each pixel of B, the coordinates in A to sample from, i.e.:

B[k, l, :] == A(M[k, l, 0], M[k, l, 1], :) 
for all k, l in B's coords.

...where square braces indicate array lookup with integer indices, and circular braces indicate bilinear interpolation lookup with floating-point indices. We restate the above using the more economical notation:

B = A(M)

We wish to find an inverse mapping N that maps B back to A as best as is possible:

Find N s.t. A \approx B(N)

The problem can be stated without reference to A or B:

Find N = argmin_N || M(N) - I_n ||

...where ||*|| indicates the Frobenius norm, and I_n is the identity map with the same dimensions as N, i.e. a map where:

I_n[i, j, :] == [i, j]
for all i, j
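
The objective above can be evaluated numerically for any candidate N. The sketch below assumes maps stored as (rows, cols, 2) arrays holding (row, column) coordinates, uses SciPy's map_coordinates with order=1 for the bilinear lookup M(N), and clamps out-of-range coordinates to the border; the function name roundtrip_error is made up for illustration.

```python
import numpy as np
from scipy import ndimage

def roundtrip_error(M, N):
    """Frobenius norm of M(N) - I_n, with M(N) computed by bilinear lookup."""
    coords = [N[..., 0], N[..., 1]]  # row and column coordinates into M
    # Interpolate each of M's two channels at the positions given by N.
    MN = np.stack(
        [ndimage.map_coordinates(M[..., c], coords, order=1, mode="nearest")
         for c in range(2)],
        axis=-1,
    )
    # Identity map with N's dimensions: I_n[i, j] == [i, j].
    I_n = np.stack(np.indices(N.shape[:2]), axis=-1).astype(np.float64)
    return np.linalg.norm(MN - I_n)
```

A value close to zero indicates that N inverts M on N's grid; entries of N that point outside M are clamped and will usually contribute a large error.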

Naive solution

If M's values are all integers, and M is an isomorphism, then you can construct N directly as:

N[M[k, l, 0], M[k, l, 1], :] = [k, l]
for all k, l

Or in our simplified notation:

N[M] = I_m

...where I_m is the identity map with the same dimensions as M.
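
In NumPy the naive construction is a single scatter assignment. This sketch assumes M is an integer array of shape (rows, cols, 2) holding (row, column) coordinates and that A and B have the same shape; entries of N that no value of M points to keep the sentinel -1, a hypothetical marker chosen here.

```python
import numpy as np

def invert_integer_map(M):
    """Scatter the identity map through an integer-valued map M: N[M] = I_m."""
    h, w = M.shape[:2]
    I_m = np.stack(np.indices((h, w)), axis=-1)  # I_m[k, l] == [k, l]
    N = np.full((h, w, 2), -1, dtype=M.dtype)    # -1 marks unassigned entries
    N[M[..., 0], M[..., 1]] = I_m
    return N
```

If M is a bijection on the pixel grid, every entry of N is assigned exactly once and no -1 remains.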

There are two problems:

  1. M is not necessarily an isomorphism (bijection), so the above will leave "holes" in N at N[i, j, :] for any [i, j] not among the values in M.
  2. M's values are floating-point coordinates [i, j], not integer coordinates. We cannot simply assign a value to the bilinearly-interpolated quantity N(i, j, :), for float-valued i, j. To achieve the equivalent effect, we must instead set the values of [i, j]'s four surrounding corners N[floor(i), floor(j), :], N[floor(i), ceil(j), :], N[ceil(i), floor(j), :], N[ceil(i), ceil(j), :] such that the interpolated value N(i, j, :) equals the desired value [k, l], for all pixel mappings [i, j] --> [k, l] in M.

Solution

Construct empty N as a 3D tensor of floats:

N = zeros(size=(A.shape[0], A.shape[1], 2))

For each coordinate [i, j] in A's coordinate space, do:

  1. Find the 2x2 grid of A-coordinates in M that [i, j] lies within. Compute the homography matrix H that maps those A-coordinates to their corresponding B-coordinates (given by the 2x2 grid's pixel indices).
  2. Set N[i, j, :] = matmul(H, [i, j, 1]), dividing the result by its last component, since a homography acts on homogeneous coordinates.

The potentially expensive step here would be the search in step 1 for the 2x2 grid of A-coordinates in M that encircles [i, j]. A brute-force search would make this whole algorithm O(n*m) where n is the number of pixels in A, and m the number of pixels in B.

To reduce this to O(n), one could instead run a scanline algorithm within each A-coordinate quadrilateral to identify all the integer-valued coordinates [i, j] it contains. This could be precomputed as a hashmap that maps integer-valued A coords [i, j] to the upper-left corner of its encircling quadrilateral's B coords [k, l].
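
A rough sketch of this idea follows, with one deliberate simplification: instead of a homography per quadrilateral, each quadrilateral is split into two triangles and the inverse is interpolated affinely (barycentrically) inside each triangle. That is a swapped-in technique, not the method described above. Maps are assumed to be (rows, cols, 2) arrays of (row, column) coordinates, the function name invert_map_piecewise is made up, and the plain Python loops favour clarity over speed.

```python
import numpy as np

def invert_map_piecewise(M, out_shape, eps=1e-9):
    """Invert M by rasterizing each grid cell of M into A's integer grid.
    Entries of N that no cell covers stay NaN."""
    hB, wB = M.shape[:2]
    hA, wA = out_shape
    N = np.full((hA, wA, 2), np.nan)
    for k in range(hB - 1):
        for l in range(wB - 1):
            corners = [(k, l), (k, l + 1), (k + 1, l), (k + 1, l + 1)]
            # Split the cell into two triangles sharing one diagonal.
            for tri in (corners[:3], corners[1:]):
                Q = np.array(tri, dtype=np.float64)                        # B-coords
                P = np.array([M[a, b] for a, b in tri], dtype=np.float64)  # A-coords
                T = np.column_stack([P[1] - P[0], P[2] - P[0]])
                if abs(np.linalg.det(T)) < eps:
                    continue  # degenerate triangle, nothing to rasterize
                # Integer bounding box of the triangle, clipped to A's grid.
                lo = np.maximum(np.ceil(P.min(axis=0) - eps), 0).astype(int)
                hi = np.minimum(np.floor(P.max(axis=0) + eps),
                                [hA - 1, wA - 1]).astype(int)
                for i in range(lo[0], hi[0] + 1):
                    for j in range(lo[1], hi[1] + 1):
                        # Barycentric coordinates of [i, j] in the A-triangle.
                        u, v = np.linalg.solve(T, np.array([i, j]) - P[0])
                        if u >= -eps and v >= -eps and u + v <= 1 + eps:
                            N[i, j] = Q[0] + u * (Q[1] - Q[0]) + v * (Q[2] - Q[0])
    return N
```

Each integer coordinate is only tested against cells whose bounding box contains it, so for a reasonably smooth map the total work grows roughly with the number of pixels rather than with the product of the two image sizes. Where M folds over itself, later cells overwrite earlier ones.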