I'm trying to use Xuggler (which I believe uses ffmpeg
under the hood) to do the following:
- Accept a raw MJPEG video bitstream (from a small TTL serial camera) and encode/transcode it to h.264; and
- Accept a raw audio bitstream (from a microphone) and encode it to AAC; then
- Mux the two (audio and video) bitstreams together into an MPEG-TS container
I've watched/read some of their excellent tutorials, and so far here's what I've got:
// I'll worry about implementing this functionality later, but
// involves querying native device drivers.
byte[] nextMjpeg = getNextMjpegFromSerialPort();
// I'll also worry about implementing this functionality as well;
// I'm simply providing these for thoroughness.
BufferedImage mjpeg = MjpegFactory.newMjpeg(nextMjpeg);
// Specify a h.264 video stream (how?)
String h264Stream = "???";
IMediaWriter writer = ToolFactory.makeWriter(h264Stream);
writer.addVideoStream(0, 0, ICodec.ID.CODEC_ID_H264);
writer.encodeVideo(0, mjpeg);
For one, I think I'm close here, but it's still not correct; I've only gotten this far by reading the video code examples (not the audio - I can't find any good audio examples).
Literally, I'll be getting byte-level access to the raw video and audio feeds coming into my Xuggler implementation. But for the life of me I can't figure out how to get them into an h.264/AAC/MPEG-TS format. Thanks in advance for any help here.
Looking at the Xuggler sample code, the following should work to encode video as H.264 and mux it into an MPEG-TS container:
IMediaWriter writer = ToolFactory.makeWriter("output.ts");
writer.addVideoStream(0, 0, ICodec.ID.CODEC_ID_H264, width, height);
for (...)
{
BufferedImage mjpeg = ...;
writer.encodeVideo(0, mjpeg);
}
The container type is guessed from the file extension; the codec is specified explicitly.
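Fleshed out into a self-contained (but untested) sketch - the dimensions, 25 fps rate, frame count, and blank frames are just placeholders for your real MJPEG decode:
import java.awt.image.BufferedImage;
import java.util.concurrent.TimeUnit;

import com.xuggle.mediatool.IMediaWriter;
import com.xuggle.mediatool.ToolFactory;
import com.xuggle.xuggler.ICodec;

public class VideoOnlyTsSketch {
    public static void main(String[] args) {
        final int width = 640, height = 480;   // placeholder dimensions
        final long msPerFrame = 40;            // placeholder 25 fps -> 40 ms per frame

        // The ".ts" extension selects the MPEG-TS container; the codec is set explicitly.
        IMediaWriter writer = ToolFactory.makeWriter("output.ts");
        writer.addVideoStream(0, 0, ICodec.ID.CODEC_ID_H264, width, height);

        for (int i = 0; i < 250; i++) {
            // Stand-in for a decoded MJPEG frame; the default converter expects TYPE_3BYTE_BGR.
            BufferedImage frame = new BufferedImage(width, height, BufferedImage.TYPE_3BYTE_BGR);
            writer.encodeVideo(0, frame, i * msPerFrame, TimeUnit.MILLISECONDS);
        }
        writer.close();  // flush remaining packets and write the container trailer
    }
}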
To mux audio and video, you would do something like this:
writer.addVideoStream(videoStreamIndex, 0, videoCodec, width, height);
writer.addAudioStream(audioStreamIndex, 0, audioCodec, channelCount, sampleRate);
while (... have more data ...)
{
BufferedImage videoFrame = ...;
long videoFrameTime = ...; // this is the time to display this frame
writer.encodeVideo(videoStreamIndex, videoFrame, videoFrameTime, DEFAULT_TIME_UNIT);
short[] audioSamples = ...; // the size of this array should be number of samples * channelCount
long audioSamplesTime = ...; // this is the time to play back this bit of audio
writer.encodeAudio(audioStreamIndex, audioSamples, audioSamplesTime, DEFAULT_TIME_UNIT);
}
In this case I believe your code is responsible for interleaving the audio and video: on each pass through the loop you want to call either encodeAudio() or encodeVideo(), depending on which available data (a chunk of audio samples or a video frame) has the earlier timestamp.
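To make that concrete, here is a rough sketch of the interleaving loop; the peekNext*/next*/haveMoreData() helpers are hypothetical stand-ins for however you pull timestamped data from your devices:
// Hypothetical helpers: each peek...() returns the timestamp (in milliseconds) of the
// next pending item, and each next...() consumes and returns that item.
long nextVideoTime = peekNextVideoTimeMs();
long nextAudioTime = peekNextAudioTimeMs();

while (haveMoreData()) {
    if (nextVideoTime <= nextAudioTime) {
        // The pending video frame is earlier (or tied), so encode it first.
        BufferedImage frame = nextVideoFrame();
        writer.encodeVideo(videoStreamIndex, frame, nextVideoTime, TimeUnit.MILLISECONDS);
        nextVideoTime = peekNextVideoTimeMs();
    } else {
        // Otherwise the pending audio chunk comes first.
        short[] samples = nextAudioChunk(); // length = samples per chunk * channelCount
        writer.encodeAudio(audioStreamIndex, samples, nextAudioTime, TimeUnit.MILLISECONDS);
        nextAudioTime = peekNextAudioTimeMs();
    }
}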
There is another, lower-level API you may end up using, based on IStreamCoder, which gives more control over various parameters. I don't think you will need to use that.
To answer the specific questions you asked:
(1) "Encode a BufferedImage (M/JPEG) into a h.264 stream" - you already figured that out, writer.addVideoStream(..., ICodec.ID.CODEC_ID_H264)
makes sure you get the H.264 codec. To get a transport stream (MPEG2 TS) container, simply call makeWriter()
with a filename with a .ts extension.
(2) "Figure out what the "BufferedImage-equivalent" for a raw audio feed is" - that is either a short[] or an IAudioSamples object (both seem to work, but IAudioSamples has to be constructed from an IBuffer which is much less straightforward).
(3) "Encode this audio class into an AAC audio stream" - call writer.addAudioStream(..., ICodec.ID.CODEC_ID_AAC, channelCount, sampleRate)
(4) "multiplex both stream into the same MPEG-TS container" - call makeWriter()
with a .ts filename, which sets the container type. For correct audio/video sync you probably need to call encodeVideo()/encodeAudio() in the correct order.
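Regarding (2), here's a rough sketch of that conversion, assuming the microphone hands you 16-bit signed little-endian PCM in a byte[] (check your driver's actual sample format; the helper name is just illustrative):
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

// Convert raw 16-bit little-endian PCM bytes into the short[] form that encodeAudio() accepts.
static short[] pcmBytesToShorts(byte[] pcm) {
    ShortBuffer buf = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer();
    short[] samples = new short[buf.remaining()];
    buf.get(samples);
    return samples;
}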
P.S. Always pass the earliest audio/video available first. For example, if you have audio chunks which are 440 samples long (at 44000 Hz sample rate, 440 / 44000 = 0.01 seconds), and video at exactly 25fps (1 / 25 = 0.04 seconds), you would give them to the writer in this order:
video0 @ 0.00 sec
audio0 @ 0.00 sec
audio1 @ 0.01 sec
audio2 @ 0.02 sec
audio3 @ 0.03 sec
video1 @ 0.04 sec
audio4 @ 0.04 sec
audio5 @ 0.05 sec
... and so forth
Most playback devices are probably ok with the stream as long as the consecutive audio/video timestamps are relatively close, but this is what you'd do for a perfect mux.
P.S. There are a few docs you may want to refer to: Xuggler class diagram, ToolFactory, IMediaWriter, ICodec.
I think you should look at gstreamer: http://gstreamer.freedesktop.org/ You would have to look for a plugin that can capture the camera input, pipe it to the libx264 and AAC plugins, and then pass both through an mpegts muxer.
A pipeline in gstreamer would look like:
v4l2src queue-size=15 ! video/x-raw,framerate=25/1,width=384,height=576 ! \
avenc_mpeg4 name=venc \
alsasrc ! audio/x-raw,rate=48000,channels=1 ! audioconvert ! lamemp3enc name=aenc \
avimux name=mux ! filesink location=rec.avi venc. ! mux. aenc. ! mux.
In this pipeline, MPEG-4 and MP3 encoders are used and the streams are muxed into an AVI file. You should be able to find plugins for libx264 and AAC. Let me know if you need further pointers.
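An untested guess at an equivalent h.264/AAC/MPEG-TS version of that pipeline (x264enc ships in gst-plugins-ugly, voaacenc and mpegtsmux in gst-plugins-bad, so adjust the elements and caps to whatever your install actually provides):
v4l2src ! video/x-raw,framerate=25/1,width=384,height=576 ! videoconvert ! \
x264enc tune=zerolatency ! h264parse name=venc \
alsasrc ! audio/x-raw,rate=48000,channels=1 ! audioconvert ! voaacenc ! aacparse name=aenc \
mpegtsmux name=mux ! filesink location=rec.ts venc. ! mux. aenc. ! mux.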