I am trying to merge two images using VNImageHomographicAlignmentObservation. I am currently getting a 3x3 matrix that looks like this:
simd_float3x3([
    [0.99229,    -0.00451023, -4.32607e-07],
    [0.00431724,  0.993118,    2.38839e-07],
    [-72.2425,   -67.9966,     0.999288]
])
But I don't know how to use these values to merge the two images into one. There doesn't seem to be any documentation on what these values even mean. I found some information on transformation matrices here: Working with Matrices.
But so far nothing else has helped me... Any suggestions?
My Code:
func setup() {
    let floatingImage = UIImage(named: "DJI_0333")!
    let referenceImage = UIImage(named: "DJI_0327")!

    let request = VNHomographicImageRegistrationRequest(targetedCGImage: floatingImage.cgImage!, options: [:])
    let handler = VNSequenceRequestHandler()
    try! handler.perform([request], on: referenceImage.cgImage!)

    if let results = request.results as? [VNImageHomographicAlignmentObservation] {
        print("Perspective warp found: \(results.count)")
        results.forEach { observation in
            // A matrix with 3 rows and 3 columns.
            let matrix = observation.warpTransform
            print(matrix)
        }
    }
}
This homography matrix H describes how to project one of your images onto the image plane of the other image. To transform each pixel to its projected location, you compute x' = H * x using homogeneous coordinates: take your 2D image coordinate, add a 1.0 as the third component, apply the matrix H, and go back to 2D by dividing through the third component of the result.
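As a minimal sketch of that math in Swift with simd (the project helper is just an illustrative name, reused further below):

import simd
import CoreGraphics

// Project a 2D point through the homography using homogeneous coordinates:
// append 1.0, multiply by H, then divide by the resulting third component.
func project(_ point: CGPoint, with homography: simd_float3x3) -> CGPoint {
    let homogeneous = simd_float3(Float(point.x), Float(point.y), 1.0)
    let projected = homography * homogeneous   // x' = H * x
    return CGPoint(x: CGFloat(projected.x / projected.z),
                   y: CGFloat(projected.y / projected.z))
}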
The most efficient way to do this for every pixel is to write this matrix multiplication in homogeneous space using CoreImage. CoreImage offers multiple shader kernel types: CIColorKernel, CIWarpKernel, and CIKernel. For this task, we only want to transform the location of each pixel, so a CIWarpKernel is what you need.
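Using the Core Image Shading Language, such a kernel could look something like this (a sketch using the string-based CIWarpKernel initializer; the kernel name warpHomography is just a placeholder):

import CoreImage

// A warp kernel returns, for every destination pixel, the source coordinate
// to sample from. Here it projects the destination coordinate through the
// homography in homogeneous space and divides by the third component.
let warpKernel = CIWarpKernel(source:
    """
    kernel vec2 warpHomography(mat3 homography)
    {
        vec3 homogen_in = vec3(destCoord().x, destCoord().y, 1.0); // homogeneous coordinate
        vec3 homogen_out = homography * homogen_in;                // apply the homography
        return homogen_out.xy / homogen_out.z;                     // back to 2D
    }
    """
)!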
Note that the shader wants a mat3 called homography, which is the shading-language equivalent of the simd_float3x3 matrix H. When calling the shader, the matrix is expected to be stored in a CIVector.
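To do that conversion, something like the following should work (a sketch that packs the values column by column, matching simd_float3x3's column storage; matrix is the warpTransform from the code above):

// Pack the simd_float3x3 into a 9-element CIVector for the kernel's mat3 parameter.
let (c0, c1, c2) = matrix.columns
let homographyVector = CIVector(values: [
    CGFloat(c0.x), CGFloat(c0.y), CGFloat(c0.z),
    CGFloat(c1.x), CGFloat(c1.y), CGFloat(c1.z),
    CGFloat(c2.x), CGFloat(c2.y), CGFloat(c2.z)
], count: 9)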
When you apply the CIWarpKernel to an image, you have to tell CoreImage how big the output should be. To merge the warped and reference images, the output should be big enough to cover the whole projected image as well as the original one. We can compute the size of the projected image by applying the homography to each corner of the image rect (this time in Swift; CoreImage calls this rect the extent).
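A sketch of that computation, reusing the project helper from above (floatingImage, referenceImage, and matrix come from the earlier code):

let floatingCIImage = CIImage(cgImage: floatingImage.cgImage!)
let referenceCIImage = CIImage(cgImage: referenceImage.cgImage!)

// Project the four corners of the floating image, take their bounding box,
// and union it with the reference extent so the output covers both images.
let extent = floatingCIImage.extent
let corners = [
    CGPoint(x: extent.minX, y: extent.minY),
    CGPoint(x: extent.maxX, y: extent.minY),
    CGPoint(x: extent.minX, y: extent.maxY),
    CGPoint(x: extent.maxX, y: extent.maxY)
].map { project($0, with: matrix) }

let xs = corners.map { $0.x }
let ys = corners.map { $0.y }
let projectedExtent = CGRect(x: xs.min()!, y: ys.min()!,
                             width: xs.max()! - xs.min()!,
                             height: ys.max()! - ys.min()!)
let outputExtent = projectedExtent.union(referenceCIImage.extent)

Now you can create a warped version of your floating image. (A sketch; depending on which direction the warpTransform maps, you may need to pass the inverse matrix instead. The roiCallback here conservatively requests the whole input image; see the explanation below.)

let warpedImage = warpKernel.apply(
    extent: outputExtent,                              // how big the output should be
    roiCallback: { _, _ in floatingCIImage.extent },   // which part of the input is needed
    image: floatingCIImage,
    arguments: [homographyVector]
)!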
The roiCallback is there to tell CoreImage which part of the input image is needed to compute a certain part of the output. CoreImage uses this to apply the shader block by block, so that it can process huge images. (See "Creating Custom Filters" in Apple's docs.) A quick hack would be to always return CGRect.infinite here, but then CoreImage can't do any of that block-wise magic.

And lastly, create a composite image of the reference image and the warped image:
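(A sketch, using the composited(over:) convenience; warpedImage, referenceCIImage, and outputExtent come from the snippets above.)

// Draw the warped floating image on top of the reference image.
let compositeImage = warpedImage.composited(over: referenceCIImage)

// Render the result into a UIImage via a CIContext.
let context = CIContext()
if let cgResult = context.createCGImage(compositeImage, from: outputExtent) {
    let mergedImage = UIImage(cgImage: cgResult)
    // mergedImage now contains both images merged into one.
}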