I am trying to merge two images using VNImageHomographicAlignmentObservation. I am currently getting a 3x3 matrix that looks like this:
simd_float3x3([[0.99229, -0.00451023, -4.32607e-07],
               [0.00431724, 0.993118, 2.38839e-07],
               [-72.2425, -67.9966, 0.999288]])
But I don't know how to use these values to merge the two images into one. There doesn't seem to be any documentation on what these values even mean. I found some information on transformation matrices here: Working with matrices.
But so far nothing else has helped me... Any suggestions?
My Code:
import UIKit
import Vision

func setup() {
    let floatingImage = UIImage(named: "DJI_0333")!
    let referenceImage = UIImage(named: "DJI_0327")!

    // Ask Vision for the homography that aligns the floating image to the reference image.
    let request = VNHomographicImageRegistrationRequest(targetedCGImage: floatingImage.cgImage!, options: [:])
    let handler = VNSequenceRequestHandler()
    try! handler.perform([request], on: referenceImage.cgImage!)

    if let results = request.results as? [VNImageHomographicAlignmentObservation] {
        print("Perspective warp found: \(results.count)")
        results.forEach { observation in
            // A matrix with 3 rows and 3 columns.
            let matrix = observation.warpTransform
            print(matrix)
        }
    }
}
This homography matrix H describes how to project one of your images onto the image plane of the other image. To transform each pixel to its projected location, you compute x' = H * x using homogeneous coordinates (basically take your 2D image coordinate, add a 1.0 as the third component, apply the matrix H, and go back to 2D by dividing by the third component of the result).
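To see that math on a single point, here is a minimal sketch using simd (the point (120, 45) is arbitrary, and observation is assumed to be the VNImageHomographicAlignmentObservation from the code above; the transform helper further down does the same thing):

import simd
import CoreGraphics

let H: simd_float3x3 = observation.warpTransform   // assumed: the homography from the Vision observation
let x = SIMD3<Float>(120.0, 45.0, 1.0)             // an arbitrary 2D point (120, 45) in homogeneous form
var xPrime = H * x                                 // x' = H * x
xPrime /= xPrime.z                                 // back to 2D: divide by the third component
let projected = CGPoint(x: CGFloat(xPrime.x), y: CGFloat(xPrime.y))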
The most efficient way to do this for every pixel is to write this matrix multiplication in homogeneous space using CoreImage. CoreImage offers multiple shader kernel types: CIColorKernel, CIWarpKernel and CIKernel. For this task, we only want to transform the location of each pixel, so a CIWarpKernel is what you need. Using the Core Image Shading Language, that would look as follows:
import CoreImage

let warpKernel = CIWarpKernel(source:
    """
    kernel vec2 warp(mat3 homography)
    {
        vec3 homogen_in = vec3(destCoord().x, destCoord().y, 1.0); // create homogeneous coord
        vec3 homogen_out = homography * homogen_in;                // transform by homography
        return homogen_out.xy / homogen_out.z;                     // back to normal 2D coordinate
    }
    """
)!  // CIWarpKernel(source:) is failable, so unwrap (or handle the optional) before using it below
Note that the shader wants a mat3 called homography, which is the shading language equivalent of the simd_float3x3 matrix H. When calling the shader, the matrix is expected to be stored in a CIVector; to convert it, use:
let (col0, col1, col2) = yourHomography.columns
let homographyCIVector = CIVector(values: [CGFloat(col0.x), CGFloat(col0.y), CGFloat(col0.z),
                                           CGFloat(col1.x), CGFloat(col1.y), CGFloat(col1.z),
                                           CGFloat(col2.x), CGFloat(col2.y), CGFloat(col2.z)], count: 9)
When you apply the CIWarpKernel to an image, you have to tell CoreImage how big the output should be. To merge the warped and reference image, the output should be big enough to cover the whole projected and original image. We can compute the size of the projected image by applying the homography to each corner of the image rect (this time in Swift; CoreImage calls this rect the extent):
/**
 * Convert a 2D point to a homogeneous coordinate, transform by the provided homography,
 * and convert back to a non-homogeneous 2D point.
 */
func transform(_ point: CGPoint, by homography: matrix_float3x3) -> CGPoint
{
    let inputPoint = float3(Float(point.x), Float(point.y), 1.0)
    var outputPoint = homography * inputPoint
    outputPoint /= outputPoint.z
    return CGPoint(x: CGFloat(outputPoint.x), y: CGFloat(outputPoint.y))
}

func computeExtentAfterTransforming(_ extent: CGRect, with homography: matrix_float3x3) -> CGRect
{
    let points = [transform(extent.origin, by: homography),
                  transform(CGPoint(x: extent.origin.x + extent.width, y: extent.origin.y), by: homography),
                  transform(CGPoint(x: extent.origin.x + extent.width, y: extent.origin.y + extent.height), by: homography),
                  transform(CGPoint(x: extent.origin.x, y: extent.origin.y + extent.height), by: homography)]

    var (xmin, xmax, ymin, ymax) = (points[0].x, points[0].x, points[0].y, points[0].y)
    points.forEach { p in
        xmin = min(xmin, p.x)
        xmax = max(xmax, p.x)
        ymin = min(ymin, p.y)
        ymax = max(ymax, p.y)
    }
    let result = CGRect(x: xmin, y: ymin, width: xmax - xmin, height: ymax - ymin)
    return result
}
// `homography` is the simd_float3x3 warpTransform from the observation (the same matrix as yourHomography above).
let ciFloatingImage = CIImage(image: floatingImage)!   // create the CIImage first so we can read its extent
let warpedExtent = computeExtentAfterTransforming(ciFloatingImage.extent, with: homography.inverse)
let outputExtent = warpedExtent.union(ciFloatingImage.extent)
Now you can create a warped version of your floating image:
let ciWarpedImage = warpKernel.apply(extent: outputExtent,
                                     roiCallback: { (index, rect) in
                                         return computeExtentAfterTransforming(rect, with: homography.inverse)
                                     },
                                     image: ciFloatingImage,
                                     arguments: [homographyCIVector])!
The roiCallback is there to tell CoreImage which part of the input image is needed to compute a certain part of the output. CoreImage uses this to apply the shader on parts of the image block by block, so that it can process huge images. (See Creating Custom Filters in Apple's docs.) A quick hack would be to always return CGRect.infinite here, but then CoreImage can't do any block-wise magic.
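For reference, that quick hack would look like the sketch below (ciQuickWarpedImage is just a placeholder name); it runs, but it forces CoreImage to consider the whole input for every output tile:

let ciQuickWarpedImage = warpKernel.apply(extent: outputExtent,
                                          roiCallback: { _, _ in CGRect.infinite },  // hack: always request the whole input
                                          image: ciFloatingImage,
                                          arguments: [homographyCIVector])!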
And lastly, create a composite image of the reference image and the warped image:
let ciReferenceImage = CIImage(image: referenceImage)!   // CIImage(image:) is failable, so unwrap it here
let ciResultImage = ciWarpedImage.composited(over: ciReferenceImage)
let resultImage = UIImage(ciImage: ciResultImage)
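Keep in mind that UIImage(ciImage:) produces an image that is rendered lazily and has no CGImage backing, which can trip up things like saving to JPEG/PNG. If you run into that, a minimal sketch of rendering through a CIContext first:

// Render the composited CIImage into a bitmap-backed UIImage via a CIContext.
let context = CIContext()   // in real code, create this once and reuse it; contexts are expensive
if let cgResult = context.createCGImage(ciResultImage, from: ciResultImage.extent) {
    let bitmapResultImage = UIImage(cgImage: cgResult)
    // bitmapResultImage can now be displayed or saved like any ordinary UIImage
}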