Environment:
I have an IP camera that can stream its data over RTP in H.264-encoded format. This raw stream is captured from the Ethernet, and that captured data is what I have to work with.
Goal:
In the end I want an .mp4 file that I can play with common media players (like VLC or Windows Media Player).
What I have done so far:
I take the raw stream data and parse it. Since the data has been transmitted via RTP, I need to take care of the NAL bytes, SPS and PPS.
1. Write a raw file
First I determine the type of each frame received over Ethernet. To do so, I parse the first two bytes of every RTP payload, which give me the 8 NAL unit header bits, the fragment type bits and the Start, End and Reserved bits. In the payload, they are arranged like this:
Byte 1: [3 NAL Unit Bits | 5 Fragment Type Bits]
Byte 2: [Start Bit | End Bit | Reserved Bit | 5 NAL Unit Bits]
From this I can determine:
- Start and End of a Video Frame -> Start Bit and End Bit
- Type of the Payload -> 5 Fragment Type Bits
- NAL Unit Byte
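A minimal Python sketch of this bit extraction, assuming `payload` is the RTP payload with the RTP header already stripped, and assuming the FU-A layout from RFC 6184, where the second byte is ordered S, E, R followed by the 5-bit type. The class and function names are hypothetical. For single-NAL packets such as SPS (7) and PPS (8), byte 2 is not a fragment header, so only `payload_type` is meaningful there.

```python
from dataclasses import dataclass


@dataclass
class PayloadHeader:
    nal_hi_bits: int   # byte 1, top 3 bits (F + NRI), left in place
    payload_type: int  # byte 1, low 5 bits: 7 = SPS, 8 = PPS, 28 = FU-A fragment
    start: bool        # byte 2, bit 7 (S): first fragment of a NAL unit
    end: bool          # byte 2, bit 6 (E): last fragment of a NAL unit
    reserved: bool     # byte 2, bit 5 (R): must be 0
    nal_type: int      # byte 2, low 5 bits: type of the fragmented NAL unit


def parse_first_two_bytes(payload: bytes) -> PayloadHeader:
    """Split the first two RTP payload bytes into their bit fields.

    Byte 2 is only a fragment (FU) header when payload_type == 28;
    for SPS/PPS packets its fields are meaningless and should be ignored.
    """
    if len(payload) < 2:
        raise ValueError("payload too short")
    b0, b1 = payload[0], payload[1]
    return PayloadHeader(
        nal_hi_bits=b0 & 0xE0,
        payload_type=b0 & 0x1F,
        start=bool(b1 & 0x80),
        end=bool(b1 & 0x40),
        reserved=bool(b1 & 0x20),
        nal_type=b1 & 0x1F,
    )
```

For example, a payload starting with `0x7C 0x85` decodes as fragment type 28 with the Start bit set and an inner NAL unit type of 5.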
The Fragment types which are necessary in my case are:
Fragment Type 7 = SPS
Fragment Type 8 = PPS
Fragment Type 28 = Video Fragment
The NAL Byte is created by putting the NAL Unit Bits from Byte 1 and 2 together.
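Putting the NAL byte back together amounts to a mask and an OR: keep the top 3 bits of byte 1 and the low 5 bits of byte 2. A small sketch (the function name is hypothetical):

```python
def reconstruct_nal_header(fu_indicator: int, fu_header: int) -> int:
    """Rebuild the original NAL unit header byte of a fragmented NAL unit."""
    # Top 3 bits (F + NRI) come from byte 1, the 5-bit NAL unit type from byte 2.
    return (fu_indicator & 0xE0) | (fu_header & 0x1F)
```

With byte 1 = 0x7C and byte 2 = 0x85 this yields 0x65, the header byte of an IDR slice.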
Depending on the fragment type, I then do the following:
SPS/PPS:
- Write the NAL prefix (0x00 0x00 0x01), then the SPS or PPS data
Fragmentation with Start Bit:
- Write NAL prefix
- Write NAL unit byte
- Write remaining raw data
Fragmentation without Start Bit:
- Write raw data
This means my raw file looks something like this:
[NAL Prefix][SPS][NAL Prefix][PPS][NAL Prefix][NAL Unit Byte][Raw Video Data][Raw Video Data]....[NAL Prefix][NAL Unit Byte][Raw Video Data]...
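The steps above could be sketched as a small depacketizer. It assumes the RTP payloads are already stripped of their RTP headers, sorted by sequence number and free of packet loss. It also treats every single-NAL-unit packet (types 1 to 23, which include SPS = 7 and PPS = 8) the same way, a slight generalization of the rules above, and it skips aggregation packets such as STAP-A (24). The function name is hypothetical.

```python
START_CODE = b"\x00\x00\x01"  # 3-byte Annex B NAL prefix


def payloads_to_annexb(payloads):
    """Rebuild an Annex B byte stream from ordered, loss-free RTP payloads.

    Assumes RTP headers are already stripped. Aggregation packets are skipped.
    """
    out = bytearray()
    for p in payloads:
        if not p:
            continue
        ptype = p[0] & 0x1F
        if 1 <= ptype <= 23:
            # Single NAL unit packet (e.g. SPS = 7, PPS = 8): the payload already
            # starts with its own NAL header byte, so prefix it and copy it whole.
            out += START_CODE + p
        elif ptype == 28 and len(p) >= 2:
            fu_header = p[1]
            if fu_header & 0x80:
                # Start bit set: begin a new NAL unit with a prefix and the
                # reconstructed NAL header byte (F + NRI from byte 1, type from byte 2).
                out += START_CODE
                out.append((p[0] & 0xE0) | (fu_header & 0x1F))
            # Every fragment contributes its data after the two header bytes.
            out += p[2:]
    return bytes(out)
```

Writing the returned bytes to a file produces exactly the [NAL Prefix][SPS][NAL Prefix][PPS][NAL Prefix][NAL Unit Byte][Raw Video Data]... layout described above.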
For every SPS and PPS I find in my stream data, I just write a NAL prefix (0x00 0x00 0x01) followed by the SPS/PPS itself.
However, I can't play this file with any media player, which leads me to:
2. Convert the file
Since I wanted to avoid dealing with codecs myself, I used an existing application, FFmpeg, which I call with these parameters:
ffmpeg.exe -f h264 -i <RawInputFile> -vcodec copy -r 25 <OutPutFilename>.mp4
-f h264: This should tell ffmpeg that I have an H.264-coded stream.
-vcodec copy: Quote from the manpage:
Force video codec to codec. Use the "copy" special value to tell that the raw codec data must be copied as is.
-r 25: Sets the frame rate to 25 FPS.
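If the conversion is scripted, the same ffmpeg call could be wrapped as below. This is a sketch that mirrors the command above argument for argument; it assumes an `ffmpeg` executable on the `PATH` (pass `ffmpeg="ffmpeg.exe"` or a full path on Windows), and the helper names are hypothetical.

```python
import subprocess


def build_ffmpeg_args(raw_input, output_mp4, fps=25, ffmpeg="ffmpeg"):
    """Argument list equivalent to:
    ffmpeg -f h264 -i <RawInputFile> -vcodec copy -r 25 <OutPutFilename>.mp4
    """
    return [ffmpeg, "-f", "h264", "-i", raw_input,
            "-vcodec", "copy", "-r", str(fps), output_mp4]


def convert(raw_input, output_mp4, fps=25, ffmpeg="ffmpeg"):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_args(raw_input, output_mp4, fps, ffmpeg), check=True)
```

Usage with hypothetical file names: `convert("capture.h264", "capture.mp4")`. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.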
When I call ffmpeg with these parameters I get an .mp4 file that I can play with VLC and Windows Media Player, so it actually works. But the file now looks a bit different from my raw file.
This leads me to my question:
What did I actually do?
My problem is not that it doesn't work. I just want (or need) to know what I have actually done by calling ffmpeg. I had a raw H.264 file that I could not play; after running it through FFmpeg, I can play it.
These are the differences between the original raw file (which I wrote) and the one written by FFmpeg:
- Header: The FFmpeg file has a header of roughly 0x30 bytes
- Footer: The FFmpeg file also has a footer
- Changed Prefix and 2 new Bytes:
While a new video frame in the raw file started with
[NAL Prefix][NAL Unit Byte][Raw Video Data]
in the new file it looks like this:
[0x00 0x00][2 "Random" Bytes][NAL Unit Byte][Raw Video Data].....[0x00 0x00][2 other "Random" Bytes][NAL Unit Byte][Raw Video Data]...
I understand that the video stream needs a container format (correct me if I am wrong, but I assume the new header and footer are responsible for that). But why does it actually change some bytes in the raw data? It can't be decoding, since the stream itself should be decoded by the player and not by ffmpeg.
As you can see, I don't need a new solution to my problem so much as an explanation (so that I can explain it myself). What does ffmpeg actually do? And why does it change some bytes within the video data?
Besides adding the MP4 container, ffmpeg converted your H.264 Annex B byte stream (with NAL prefixes) to a length-prefixed format.
Your [0x00 0x00][2 "Random" Bytes] is a 32-bit big-endian integer giving the length of the following NAL unit in bytes.
You can read more about the byte-stream format you started with in Annex B of the H.264 specification; the length-prefixed format used inside MP4 is defined in ISO/IEC 14496-15.
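A minimal Python sketch of that conversion: split the Annex B stream on its start codes and prefix each NAL unit with its length as a big-endian integer. The length field size is recorded in the MP4 header; 4 bytes is what ffmpeg typically writes, but treat it as an assumption here. Since most NAL units are shorter than 64 KiB, the top two bytes of a 4-byte length are usually 0x00 0x00, which matches the [0x00 0x00][2 "Random" Bytes] pattern. Stripping trailing zero bytes is safe because H.264 does not allow the last byte of a NAL unit to be 0x00; it also removes the extra zero of a 4-byte start code (0x00 0x00 0x00 0x01). The function names are hypothetical.

```python
START_CODE = b"\x00\x00\x01"


def split_annexb(data: bytes) -> list:
    """Return the NAL units of an Annex B stream without their start codes."""
    nals = []
    i = data.find(START_CODE)
    while i != -1:
        begin = i + len(START_CODE)
        nxt = data.find(START_CODE, begin)
        nal = data[begin:len(data) if nxt == -1 else nxt]
        # A NAL unit never ends in 0x00, so trailing zeros belong to the next
        # start code (the extra zero of 00 00 00 01) or to stream padding.
        nal = nal.rstrip(b"\x00")
        if nal:
            nals.append(nal)
        i = nxt
    return nals


def annexb_to_length_prefixed(data: bytes, length_size: int = 4) -> bytes:
    """Replace each start code with the NAL unit's length (big-endian)."""
    out = bytearray()
    for nal in split_annexb(data):
        out += len(nal).to_bytes(length_size, "big")
        out += nal
    return bytes(out)
```

This only covers the byte rewrite inside the video data; the container boxes around it are a separate step.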
Looks like the stream got packetized. Many container formats split the bitstream into packets and add some information such as timestamps, packet lengths, etc. This gives the decoder hooks to skip through the file without decoding everything, resync when a packet is lost, sync audio and video, combine multiple streams, etc.
Look at the MP4 file format info for more information:
http://en.wikipedia.org/wiki/MPEG-4_Part_14
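To see the header and footer in an MP4 file, one can walk its top-level boxes. Each box starts with a 4-byte big-endian size and a 4-character type; a size of 1 means a 64-bit size follows, and a size of 0 means the box runs to the end of the file. For a file written by ffmpeg with default settings one would typically expect `ftyp`, `free`, `mdat` (the length-prefixed NAL units) and a trailing `moov` (timing, sample index and an `avcC` record with the SPS/PPS). That would be consistent with a small header plus a footer, but it is an expectation, not a listing of the questioner's file. The function name is hypothetical.

```python
import struct


def list_top_level_boxes(data: bytes):
    """Return (offset, type, size) for each top-level MP4 box."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # 64-bit "largesize" follows the type field.
            if offset + 16 > len(data):
                break
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # Box extends to the end of the file.
            size = len(data) - offset
        if size < header:
            break  # malformed box; stop rather than loop forever
        boxes.append((offset, box_type.decode("latin-1"), size))
        offset += size
    return boxes
```

Usage: `list_top_level_boxes(open("out.mp4", "rb").read())`, with a hypothetical file name; reading the whole file is fine for a quick inspection.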