I would like to perform face detection / tracking on a video file (e.g. an MP4 from the user's gallery) using the Android Vision FaceDetector API. I can see many examples of using the CameraSource class to perform face tracking on the stream coming directly from the camera (e.g. on the android-vision github), but nothing on video files.
I tried looking at the source code for CameraSource through Android Studio, but it is obfuscated, and I couldn't find the original online. I imagine there are many commonalities between using the camera and using a file. Presumably I just play the video file on a Surface and then pass that to a pipeline.
Alternatively, I can see that Frame.Builder has the functions setImageData and setTimestampMillis. If I were able to read in the video as a ByteBuffer, how would I pass that to the FaceDetector API? I guess this question is similar, but it has no answers. Similarly, I could decode the video into Bitmap frames and pass those to setBitmap.
Ideally I don't want to render the video to the screen, and the processing should happen as fast as the FaceDetector API is capable of.
Alternatively, I can see that Frame.Builder has the functions setImageData and setTimestampMillis. If I were able to read in the video as a ByteBuffer, how would I pass that to the FaceDetector API?
Simply call SparseArray<Face> faces = detector.detect(frame); where detector has to be created like this:
FaceDetector detector = new FaceDetector.Builder(context)
        .setProminentFaceOnly(true)
        .build();
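To answer the ByteBuffer part directly: Frame.Builder.setImageData takes the raw pixel buffer plus its dimensions and pixel format (the format must be NV16, NV21, or YV12), so you can skip the Bitmap step entirely. A minimal sketch, assuming nv21Buffer, width, height, and timestampMs come from your own decoding code (those names are illustrative, not part of any API):

import android.graphics.ImageFormat;
import android.util.SparseArray;
import com.google.android.gms.vision.Frame;
import com.google.android.gms.vision.face.Face;

Frame frame = new Frame.Builder()
        .setImageData(nv21Buffer, width, height, ImageFormat.NV21)
        .setTimestampMillis(timestampMs) // keeps frames ordered if tracking is enabled
        .build();
if (detector.isOperational()) { // false while the native face library is still downloading
    SparseArray<Face> faces = detector.detect(frame);
}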
If processing time is not an issue, using MediaMetadataRetriever.getFrameAtTime solves the question. As Anton suggested, you can also use FaceDetector.detect:
Bitmap bitmap;
Frame frame;
SparseArray<Face> faces;
MediaMetadataRetriever mMMR = new MediaMetadataRetriever();
mMMR.setDataSource(videoPath);
String durationMs = mMMR.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION); // video duration, in ms
long totalVideoTime = 1000 * Long.parseLong(durationMs); // total video time, in us (long avoids int overflow for videos longer than ~35 min)
int fps = 4; // desired number of frames to sample per second (see note below)
long deltaT = 1000000 / fps; // microseconds between sampled frames
for (long time_us = 0; time_us < totalVideoTime; time_us += deltaT) {
    bitmap = mMMR.getFrameAtTime(time_us, MediaMetadataRetriever.OPTION_CLOSEST_SYNC); // extract a bitmap from the key frame closest to time_us
    if (bitmap == null) break;
    frame = new Frame.Builder().setBitmap(bitmap).build(); // wrap the bitmap in a "Frame" object, which can be fed to a face detector
    faces = detector.detect(frame); // detect the faces (detector is a FaceDetector)
    // TODO ... do something with "faces"
}
where deltaT = 1000000/fps, and fps is the desired number of frames to sample per second. For example, if you want to extract 4 frames every second, deltaT = 250000. (Note that faces will be overwritten on every iteration, so you should do something with the results (store/report them) inside the loop.)
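As an illustration of the TODO above (a sketch only; the Log tag and message format are arbitrary), the loop body could report each face's bounding box, and both native objects should be released when you are done:

// inside the loop, after detector.detect(frame):
for (int i = 0; i < faces.size(); i++) {
    Face face = faces.valueAt(i);
    // getPosition() is the top-left corner of the face in frame coordinates
    Log.d("FaceDetect", "t=" + time_us + "us: face at ("
            + face.getPosition().x + ", " + face.getPosition().y + "), size "
            + face.getWidth() + "x" + face.getHeight());
}

// after the loop: free the native resources
mMMR.release();
detector.release();

This continues the snippet above, so it uses its variables (faces, time_us, mMMR, detector) and additionally needs android.util.Log and com.google.android.gms.vision.face.Face imported.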