Merge images using “VNImageHomographicAlignmentObs

I am trying to merge two images using VNImageHomographicAlignmentObservation. I am currently getting a 3x3 matrix that looks like this:

simd_float3x3([[0.99229, -0.00451023, -4.32607e-07],
               [0.00431724, 0.993118, 2.38839e-07],
               [-72.2425, -67.9966, 0.999288]])

But I don't know how to use these values to merge the two images into one. There doesn't seem to be any documentation on what these values mean. I found some information on transformation matrices here: Working with matrices.

But so far nothing else has helped me... Any suggestions?

My Code:

import UIKit
import Vision

func setup() {
    let floatingImage = UIImage(named: "DJI_0333")!
    let referenceImage = UIImage(named: "DJI_0327")!

    // Register the floating image against the reference image.
    let request = VNHomographicImageRegistrationRequest(targetedCGImage: floatingImage.cgImage!, options: [:])

    let handler = VNSequenceRequestHandler()
    try! handler.perform([request], on: referenceImage.cgImage!)

    if let results = request.results as? [VNImageHomographicAlignmentObservation] {
        print("Perspective warp found: \(results.count)")
        results.forEach { observation in
            // A matrix with 3 rows and 3 columns.
            let matrix = observation.warpTransform
            print(matrix)
        }
    }
}

Tags: swift matrix apple-vision

1 Answer

Deceive

#2 · 2019-04-13 03:44

This homography matrix H describes how to project one of your images onto the image plane of the other image. To transform each pixel to its projected location, you compute x' = H * x using homogeneous coordinates (basically, take your 2D image coordinate, add 1.0 as a third component, apply the matrix H, and go back to 2D by dividing by the third component of the result).
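As a rough illustration of that projection, here is a minimal sketch that pushes the image origin through the matrix printed in the question. The helper name `project` is made up for this example and is not part of any Apple API, and the sketch assumes the three bracketed triples in the printout are the matrix columns, which is how simd stores and prints matrices.

```swift
import simd

// The matrix from the question, entered column by column
// (assumption: the printed triples are columns, as simd prints them).
let H = simd_float3x3(columns: (
    SIMD3<Float>(0.99229, -0.00451023, -4.32607e-07),
    SIMD3<Float>(0.00431724, 0.993118, 2.38839e-07),
    SIMD3<Float>(-72.2425, -67.9966, 0.999288)
))

// Hypothetical helper: lift (x, y) to (x, y, 1), apply the homography,
// then divide by the third component to get back to 2D.
func project(_ x: Float, _ y: Float, with h: simd_float3x3) -> SIMD2<Float> {
    let homogeneous = h * SIMD3<Float>(x, y, 1.0)
    return SIMD2<Float>(homogeneous.x, homogeneous.y) / homogeneous.z
}

let origin = project(0, 0, with: H)
print(origin) // roughly (-72.3, -68.0)
```

Read this way, the last column is mostly a translation of about 72 and 68 pixels, the upper-left 2x2 part is close to the identity (a slight rotation and scale), and the two tiny values in the third row carry the perspective part of the warp.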

The most efficient way to do this for every pixel is to write this matrix multiplication in homogeneous space using CoreImage. CoreImage offers multiple shader kernel types: CIColorKernel, CIWarpKernel and CIKernel. For this task, we only want to transform the location of each pixel, so a CIWarpKernel is what you need. Using the Core Image Shading Language, that would look as follows:

import CoreImage
// CIWarpKernel(source:) returns an optional, so unwrap it before use.
let warpKernel = CIWarpKernel(source:
    """
    kernel vec2 warp(mat3 homography)
    {
        vec3 homogen_in = vec3(destCoord().x, destCoord().y, 1.0); // create homogeneous coord
        vec3 homogen_out = homography * homogen_in; // transform by homography
        return homogen_out.xy / homogen_out.z; // back to normal 2D coordinate
    }
    """
)!

Note that the shader wants a mat3 called homography, which is the shading language equivalent of the simd_float3x3 matrix H. When calling the shader, the matrix is expected to be stored in a CIVector; to convert it, use:

// homography is the simd_float3x3 from observation.warpTransform
let (col0, col1, col2) = homography.columns
let homographyCIVector = CIVector(values: [CGFloat(col0.x), CGFloat(col0.y), CGFloat(col0.z),
                                           CGFloat(col1.x), CGFloat(col1.y), CGFloat(col1.z),
                                           CGFloat(col2.x), CGFloat(col2.y), CGFloat(col2.z)], count: 9)

When you apply the CIWarpKernel to an image, you have to tell CoreImage how big the output should be. To merge the warped and reference image, the output should be big enough to cover the whole projected and original image. We can compute the size of the projected image by applying the homography to each corner of the image rect (this time in Swift; CoreImage calls this rect the extent):

/**
 * Convert a 2D point to a homogeneous coordinate, transform by the provided homography,
 * and convert back to a non-homogeneous 2D point.
 */
func transform(_ point:CGPoint, by homography:matrix_float3x3) -> CGPoint
{
  let inputPoint = SIMD3<Float>(Float(point.x), Float(point.y), 1.0)
  var outputPoint = homography * inputPoint
  outputPoint /= outputPoint.z
  return CGPoint(x:CGFloat(outputPoint.x), y:CGFloat(outputPoint.y))
}

func computeExtentAfterTransforming(_ extent:CGRect, with homography:matrix_float3x3) -> CGRect
{
  let points = [transform(extent.origin, by: homography),
                transform(CGPoint(x: extent.origin.x + extent.width, y:extent.origin.y), by: homography),
                transform(CGPoint(x: extent.origin.x + extent.width, y:extent.origin.y + extent.height), by: homography),
                transform(CGPoint(x: extent.origin.x, y:extent.origin.y + extent.height), by: homography)]

  var (xmin, xmax, ymin, ymax) = (points[0].x, points[0].x, points[0].y, points[0].y)
  points.forEach { p in
    xmin = min(xmin, p.x)
    xmax = max(xmax, p.x)
    ymin = min(ymin, p.y)
    ymax = max(ymax, p.y)
  }
  let result = CGRect(x: xmin, y:ymin, width: xmax-xmin, height: ymax-ymin)
  return result
}

let warpedExtent = computeExtentAfterTransforming(ciFloatingImage.extent, with: homography.inverse)
let outputExtent = warpedExtent.union(ciFloatingImage.extent)

Now you can create a warped version of your floating image:

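// Note: ciFloatingImage has to exist before the extent computation above, which already uses it.
let ciFloatingImage = CIImage(image: floatingImage)! // CIImage(image:) returns an optional
let ciWarpedImage = warpKernel.apply(extent: outputExtent, roiCallback:
    {
        (index, rect) in
        return computeExtentAfterTransforming(rect, with: homography.inverse)
    },
    image: ciFloatingImage,
    arguments: [homographyCIVector])!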

The roiCallback is there to tell CoreImage which part of the input image is needed to compute a certain part of the output. CoreImage uses this to apply the shader on parts of the image block by block, such that it can process huge images. (See Creating Custom Filters in Apple's docs). A quick hack would be to always return CGRect.infinite here, but then CoreImage can't do any block-wise magic.
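For reference, the CGRect.infinite shortcut mentioned above could look roughly like this. It is only a sketch: the function name `applyWarpWithInfiniteROI` and its parameter names are made up for illustration, and the kernel, image, extent and CIVector are assumed to be the ones built in the earlier snippets.

```swift
import CoreImage

// Hypothetical helper: applies a warp kernel while declaring the whole input
// image as the region of interest for every output tile.
// This gives up CoreImage's block-wise processing of large images.
func applyWarpWithInfiniteROI(kernel: CIWarpKernel,
                              to image: CIImage,
                              outputExtent: CGRect,
                              homography: CIVector) -> CIImage? {
    return kernel.apply(extent: outputExtent,
                        roiCallback: { _, _ in CGRect.infinite },
                        image: image,
                        arguments: [homography])
}
```

This is handy for a first test on small images; for large images the per-rect callback is the better choice.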

And lastly, create a composite image of the reference image and the warped image:

let ciReferenceImage = CIImage(image: referenceImage)! // CIImage(image:) returns an optional
let ciResultImage = ciWarpedImage.composited(over: ciReferenceImage)
let resultImage = UIImage(ciImage: ciResultImage)
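If you need actual pixel data for the result (for example to save it to disk), one option is to render the composite through a CIContext first. This is a minimal sketch; the helper name `renderToUIImage` is an assumption made for this example, not something from the answer above.

```swift
import CoreImage
import UIKit

// Hypothetical helper: renders a CIImage into a bitmap-backed UIImage.
// Returns nil for images with an infinite extent or if rendering fails.
func renderToUIImage(_ image: CIImage, context: CIContext = CIContext()) -> UIImage? {
    guard !image.extent.isInfinite,
          let cgImage = context.createCGImage(image, from: image.extent) else {
        return nil
    }
    return UIImage(cgImage: cgImage)
}
```

With the names used above, this would be called as renderToUIImage(ciResultImage). Creating a CIContext is relatively expensive, so it is worth reusing a single context if you render more than one image.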
