Fastest way to extract a specific frame from a video

I have a web page which (among other things) needs to extract a specific frame from a user-uploaded video. The user seeks to a particular part of an .mp4 in the player, then clicks a button, and an AJAX call gets fired off to a PHP script which takes the .mp4 and the exact time from the video, and uses them to extract a "thumbnail" frame.

My current solution uses PHP's exec function:

exec("ffmpeg -i $videoPath -ss $timeOffset -vframes 1 $jpgOutputPath");

...which works just great, except it's as slow as molasses. My guess is that ffmpeg is a little too much for the job, and I might be able to do better by utilizing the underlying libraries or something... however I have zero idea how to do that.

Ideally I don't want to have to install anything that requires a real "installation process"... i.e., dropping an executable into the folder with my web app is fine, but I'd rather not have to actually run an installer. Also, the solution should be able to run on mac, linux and windows (though linux is the top priority).

What can I do to speed this process up?

Thanks.

Of course, you could write some C/C++ and link against -lav* (libavformat, libavcodec and friends), basically creating a simplified version of ffmpeg just for extracting frames, and maybe even package it as a PHP extension (though I wouldn't run it as the same user, let alone in the same process). But the result is very unlikely to be faster: you would only avoid some forking and setup overhead, while your likely bottleneck is the decoding, which would stay the same.

Instead, you should first look into using ffmpeg in fast seeking mode (or fast/accurate hybrid mode). The FFmpeg wiki says this about fast seeking:

The -ss parameter needs to be specified before -i:

ffmpeg -ss 00:03:00 -i Underworld.Awakening.avi -frames:v 1 out1.jpg

This example will produce one image frame (out1.jpg) somewhere around the third minute from the beginning of the movie. The input will be parsed using keyframes, which is very fast. The drawback is that it will also finish the seeking at some keyframe, not necessarily located at specified time (00:03:00), so the seeking will not be as accurate as expected.

Fast seeking is less accurate, but a damn lot faster, as ffmpeg will not actually need to decode (most of) the movie during the seek. The fast/accurate hybrid mode is a good compromise between the two. Read the wiki page for all available options.
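To make the hybrid mode concrete, here is a rough sketch of how I'd wrap it in a shell function. The function name, the whole-second offsets and the 30-second default margin are my own assumptions, not something from the wiki. The idea is to fast-seek (-ss before -i) to a point a bit before the target, then let ffmpeg decode only the remaining stretch accurately (-ss after -i). I added -y so ffmpeg overwrites an existing output file instead of prompting.

```shell
#!/bin/sh
# Hypothetical helper illustrating hybrid seeking.
# Usage: extract_frame_hybrid VIDEO TARGET_SECONDS OUTPUT [MARGIN_SECONDS]
extract_frame_hybrid() {
    video=$1
    target=$2          # target position in whole seconds (assumption)
    output=$3
    margin=${4:-30}    # how far before the target the fast seek should land

    # Coarse (fast) seek position, clamped so it never goes below zero.
    coarse=$((target - margin))
    if [ "$coarse" -lt 0 ]; then
        coarse=0
    fi

    # Remaining distance, covered by the slow but accurate output-side seek.
    fine=$((target - coarse))

    ffmpeg -ss "$coarse" -i "$video" -ss "$fine" -frames:v 1 -y "$output"
}
```

Calling extract_frame_hybrid Underworld.Awakening.avi 180 out3.jpg would fast-seek to 150 seconds and then decode 30 more seconds to reach the third minute.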

Edit 14/06/10:

As of FFmpeg 2.1, when transcoding with ffmpeg (i.e. not stream copying), -ss is now accurate even when used as an input option. Previous behavior can be restored with the -noaccurate_seek option. (source)

So with 2.1+, "hybrid" seeking shouldn't be required anymore for accurate results when it comes to re-encodes (and saving to .jpeg is a re-encode). It is enough to do the usual fast seeking (-ss ... -i ...) instead of slow seeking (-i ... -ss ...).
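Applied to the command from the question, that boils down to moving -ss in front of -i. Below is a minimal sketch as a shell function. The function name and the offset check are my own additions (assuming the offset arrives as plain seconds such as 12 or 12.5), and -y is there only so ffmpeg overwrites an existing thumbnail instead of prompting.

```shell
#!/bin/sh
# Hypothetical helper: fast seek with -ss placed before -i.
# Usage: extract_frame VIDEO OFFSET_SECONDS OUTPUT
extract_frame() {
    video=$1
    offset=$2
    output=$3

    # The offset comes from the client, so accept only plain seconds
    # (digits with at most one decimal point) before handing it to ffmpeg.
    case $offset in
        ''|.|*[!0-9.]*|*.*.*)
            echo "invalid offset: $offset" >&2
            return 1
            ;;
    esac

    # Quoting keeps paths with spaces intact.
    ffmpeg -ss "$offset" -i "$video" -frames:v 1 -y "$output"
}
```

For example, extract_frame "my clip.mp4" 12.5 thumb.jpg grabs the frame at 12.5 seconds. On versions older than 2.1 the same command would still be fast, but would land on a nearby keyframe instead.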