I am working on an online TV service. One of the goals is for the video to be played without any additional browser plug-ins (except for Flash).
I decided to use MP4, because it is supported by the majority of HTML5 browsers and by Flash (for fallback). The videos are transcoded from ASF on a server by FFmpeg.
However, I found that MP4 cannot be live-streamed, because it has a moov atom for metadata that has to specify the length. FFmpeg cannot directly stream MP4 to stdout, because it puts the moov atom at the end of the file. (Live transcoding and streaming of MP4 works in Android, but fails in the Flash player with a NetStream.Play.FileStructureInvalid error.)
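For reference, this is the kind of pipeline that fails (a minimal repro; the input name and codec flags are placeholders):

    # the plain MP4 muxer needs a seekable output so it can go back and
    # write the moov atom when encoding finishes; piping therefore aborts
    # with an error along the lines of
    # "muxer does not support non seekable output"
    ffmpeg -i input.asf -c:v libx264 -c:a aac -f mp4 pipe:1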
Of course, MPEG-TS exists, but it is not supported by HTML5 <video>.
What I thought of is a method to transcode the stream to MP4 in real time and, on each new HTTP request for it, first send a moov atom that specifies a very long duration for the video, and then start sending the rest of the MP4 file.
Is it possible to use MP4 for streaming that way?
After some research and av501's answer, I understand that the size of every frame must be known for this to work.

Can the MP4 file be segmented into smaller parts so that it can be streamed?

Of course, switching to another container/format is an option, but the only format compatible with both Flash and HTML5 is MP4/H.264, so if I have to support both, I'd have to transcode twice.
You may use fragmented MP4. A fragmented MP4 file is built as follows:

    [moov] [moof][mdat] [moof][mdat] [moof][mdat] ...
The moov box then only contains basic information about the tracks (how many there are, their type, codec initialization data and so on), but no information about the samples in the tracks. The information about sample locations and sample sizes is in the moof boxes; each moof box is followed by an mdat box that contains the samples described in the preceding moof. Typically, one would choose the length of a (moof, mdat) pair to be around 2, 4 or 8 seconds (there is no specification on that, but these values seem reasonable for most use cases).
This is a way to construct a never-ending MP4 stream.
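Recent FFmpeg builds can produce such a fragmented stream directly (a minimal sketch; the input file and codec settings are placeholders):

    # empty_moov writes a sample-less moov box up front, and frag_keyframe
    # starts a new (moof, mdat) pair at each keyframe, so the muxer never
    # needs to seek and the output can be piped to stdout
    ffmpeg -i input.asf -c:v libx264 -c:a aac \
           -movflags empty_moov+frag_keyframe \
           -f mp4 pipe:1

    # alternatively, -frag_duration 4000000 (in microseconds) cuts
    # fragments of roughly 4 seconds instead of one per keyframe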
No, it is not just the very long length: you need to know the exact size of every frame to create the header of an MP4 (which is why the header gets created at the end by the various encoders).
Just looking at the second paragraph of your question ("The videos are transcoded from ASF on a server by FFmpeg."): you mention that you are using FFmpeg to transcode the videos on the server.
Use qt-faststart or MP4Box to place the moov atom at the beginning of the file. (Also, make sure you use the H.264 video and AAC audio codecs for universal support.)
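For example (file names are placeholders; recent FFmpeg builds can also do this at encode time):

    # remux, moving the moov atom to the front of the file
    qt-faststart input.mp4 output.mp4

    # MP4Box rewrites the file in place; -inter 500 interleaves the
    # media data in 500 ms chunks and puts the header at the start
    MP4Box -inter 500 input.mp4

    # or directly from FFmpeg while transcoding
    ffmpeg -i input.asf -c:v libx264 -c:a aac -movflags +faststart output.mp4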
Hope this helped you.
Here are my thoughts; some of this might be right, other parts way off. I plead ignorance, because no one has really documented this process fully; it's all an educated guess.
AVAssetWriter only encodes to a file; there seems to be no way to get the encoded video into memory. Reading the file from a background thread while it is being written, and sending it to, say, a socket, results in an elementary stream. This is essentially an M4V, i.e. a container with H.264/AAC media data but no moov atom (in other words, no header). No Apple-supplied player can play this stream, but a modified player based on ffplay should be able to decode and play it. This should work because ffplay uses libavformat, which can decode elementary streams. One caveat: since there is no file-length information, some things have to be determined by the player, such as the DTS and PTS, and the player can't seek within the file.
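As a quick sanity check of that claim (a sketch; the capture file name is hypothetical), even a stock ffplay can handle a raw H.264 elementary stream if you force the demuxer, since there is no header to identify the format:

    # no container means no timestamps; ffplay assumes a default frame
    # rate (25 fps) unless you override it with -framerate
    cat capture.h264 | ffplay -f h264 -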
Alternatively, the raw NALs from the M4V stream can be used to construct an RTMP stream.
If you want to discuss further you can contact me directly.
How you get at the data:
Since you're going to have to rebuild the file on the receiving side anyway, I guess you could just segment it. Steve Mcfarin wrote a little appleSegmentedEcorder that you can find on his GitHub page; this solves some of the issues with moov atoms, since you have all the file info for each segment.
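For what it's worth, the same segmenting idea is easy to try on the server side with FFmpeg's segment muxer (a sketch; segment length and file names are assumptions). Each part comes out as a complete MP4 with its own moov atom:

    # cut the live transcode into self-contained 4-second MP4 files
    ffmpeg -i input.asf -c:v libx264 -c:a aac \
           -f segment -segment_time 4 -reset_timestamps 1 \
           seg%03d.mp4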