Hey there, I am trying to access the raw data from the iPhone camera using AVCaptureSession. I followed the guide provided by Apple (link here).
The raw data from the sample buffer is in YUV format (am I correct here about the raw video frame format?). How do I directly obtain the data for the Y component out of the raw data stored in the sample buffer?
In addition to Brad's answer and your own code, you should consider the following:
Since your image has two separate planes, the function CVPixelBufferGetBaseAddress will not return the base address of either plane but rather the base address of an additional data structure. It's probably only due to the current implementation that you get an address close enough to the first plane that you can see the image, but that's also why it's shifted and has garbage at the top left. The correct way to receive the first plane is:
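A minimal sketch of that call (assuming an Objective-C handler and one of the biplanar YUV formats; variable names are illustrative):

```objc
// Sketch: assumes the buffer uses a biplanar (4:2:0) YUV pixel format.
CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
CVPixelBufferLockBaseAddress(imageBuffer, 0);

// Plane 0 is the luminance (Y) plane; don't use CVPixelBufferGetBaseAddress here.
uint8_t *yPlane = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 0);

// ... use yPlane ...

CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
```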
A row in the image might be longer than the width of the image (because row lengths are rounded up for alignment). That's why there are separate functions for getting the width and the number of bytes per row. You don't have this problem at the moment, but that might change with the next version of iOS, so your code should be:
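A sketch of the stride-aware loop (assuming the biplanar format, where plane 0 holds the luminance bytes):

```objc
// Sketch: iterate the Y plane using bytes-per-row, not the image width.
size_t width       = CVPixelBufferGetWidthOfPlane(imageBuffer, 0);
size_t height      = CVPixelBufferGetHeightOfPlane(imageBuffer, 0);
size_t bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 0);

for (size_t row = 0; row < height; row++) {
    // The row stride (bytesPerRow) may be larger than width.
    uint8_t *rowStart = yPlane + row * bytesPerRow;
    // ... process 'width' luminance bytes starting at rowStart ...
}
```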
Please also note that your code will miserably fail on an iPhone 3G.
When setting up the AVCaptureVideoDataOutput that returns the raw camera frames, you can set the format of the frames using code like the following:
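A sketch of that setup (the output and key names are standard AVFoundation / Core Video API):

```objc
// Sketch: request BGRA frames from the video data output.
AVCaptureVideoDataOutput *videoOutput = [[AVCaptureVideoDataOutput alloc] init];
videoOutput.videoSettings = @{
    (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA)
};
```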
In this case a BGRA pixel format is specified (I used this for matching a color format for an OpenGL ES texture). Each pixel in that format has one byte each for blue, green, red, and alpha, in that order. Going with this makes it easy to pull out color components, but you do sacrifice a little performance by needing to make the conversion from the camera-native YUV colorspace.
Other supported colorspaces are kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange and kCVPixelFormatType_420YpCbCr8BiPlanarFullRange on newer devices and kCVPixelFormatType_422YpCbCr8 on the iPhone 3G. The VideoRange or FullRange suffix simply indicates whether the bytes are returned between 16 - 235 for Y and 16 - 240 for UV, or the full 0 - 255 range for each component.

I believe the default colorspace used by an AVCaptureVideoDataOutput instance is the YUV 4:2:0 planar colorspace (except on the iPhone 3G, where it's YUV 4:2:2 interleaved). This means that there are two planes of image data contained within the video frame, with the Y plane coming first. For every pixel in your resulting image, there is one byte for the Y value at that pixel.
You would get at this raw Y data by implementing something like this in your delegate callback:
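A sketch of such a callback (assuming one of the biplanar YUV formats; the processing step is left as a placeholder):

```objc
// Sketch: AVCaptureVideoDataOutputSampleBufferDelegate callback that
// locates the raw luminance plane of a biplanar YUV frame.
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CVPixelBufferLockBaseAddress(pixelBuffer, 0);

    // Plane 0 holds one luminance byte per pixel.
    uint8_t *yBase       = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
    size_t   width       = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0);
    size_t   height      = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0);
    size_t   bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);

    // ... process yBase here (width x height pixels, stride bytesPerRow) ...

    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
}
```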
You could then figure out the location in the frame data for each X, Y coordinate on the image and pull out the byte that corresponds to the Y component at that coordinate.
Apple's FindMyiCone sample from WWDC 2010 (accessible along with the videos) shows how to process raw BGRA data from each frame. I also created a sample application, which you can download the code for here, that performs color-based object tracking using the live video from the iPhone's camera. Both show how to process raw pixel data, but neither of these works in the YUV colorspace.
If you only need the luminance channel, I recommend against using the BGRA format, as it comes with a conversion overhead. Apple suggests using BGRA if you're doing rendering stuff, but you don't need it for extracting the luminance information. As Brad already mentioned, the most efficient format is the camera-native YUV format.
However, extracting the right bytes from the sample buffer is a bit tricky, especially on the iPhone 3G with its interleaved YUV 4:2:2 format. So here is my code, which works fine with the iPhone 3G, 3GS, iPod touch 4 and iPhone 4S.
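The answer's original listing isn't reproduced here, but the shape of it can be sketched as follows (a sketch, not the author's exact code; it branches on the pixel format to cover both the biplanar and the 3G's interleaved layout):

```objc
// Sketch: extract luminance for both the biplanar formats (3GS and later)
// and the interleaved 4:2:2 format (iPhone 3G).
CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
CVPixelBufferLockBaseAddress(pixelBuffer, 0);

OSType format = CVPixelBufferGetPixelFormatType(pixelBuffer);
if (format == kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange ||
    format == kCVPixelFormatType_420YpCbCr8BiPlanarFullRange) {
    // Biplanar: plane 0 is already one packed luminance byte per pixel.
    uint8_t *yPlane = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
    // ... walk rows using CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0) ...
} else if (format == kCVPixelFormatType_422YpCbCr8) {
    // Interleaved 4:2:2 ('2vuy', byte order Cb Y0 Cr Y1): every second byte,
    // starting at offset 1, is a luminance sample.
    uint8_t *base = (uint8_t *)CVPixelBufferGetBaseAddress(pixelBuffer);
    // ... read base[1], base[3], base[5], ... within each row ...
}

CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
```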
This is simply the culmination of everyone else's hard work, above and on other threads, converted to Swift 3 for anyone who finds it useful.
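A Swift 3 sketch of the same luminance extraction (biplanar case only; a sketch rather than the poster's exact listing, and the processing step is left as a placeholder):

```swift
import AVFoundation

func captureOutput(_ captureOutput: AVCaptureOutput!,
                   didOutputSampleBuffer sampleBuffer: CMSampleBuffer!,
                   from connection: AVCaptureConnection!) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }

    // Plane 0 is the luminance plane in the biplanar YUV formats.
    guard let baseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0) else { return }
    let yPlane      = baseAddress.assumingMemoryBound(to: UInt8.self)
    let width       = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0)
    let height      = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0)
    let bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)

    // Luminance of pixel (x, y): yPlane[y * bytesPerRow + x]
    // ... process width x height luminance bytes here ...
    _ = (yPlane, width, height, bytesPerRow)
}
```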