TL;DR: I want to read raw h264 streams from AVI/MP4 files, even broken/incomplete.
Almost every document about h264 tells me that it consists of NAL packets. Okay. Almost everywhere told to me that the packet should start with a signature like 00 00 01
or 00 00 00 01
. For example, https://stackoverflow.com/a/18638298/8167678, https://stackoverflow.com/a/17625537/8167678
The format of H.264 is that it’s made up of NAL Units, each starting with a start prefix of three bytes with the values 0x00, 0x00, 0x01 and each unit has a different type depending on the value of the 4th byte right after these 3 starting bytes. One NAL Unit IS NOT one frame in the video, each frame is made up of a number of NAL Units.
Okay.
I downloaded random_youtube_video.mp4 and strip out one frame from it:
ffmpeg -ss 10 -i random_youtube_video.mp4 -frames 1 -c copy pic.avi
And got:
Red part - this is part of AVI container, other - actual data.
As you can see, here I have 00 00 24 A9
instead of 00 00 00 01
This AVI file plays perfectly
As you can see, here exact same bytes. This MP4 file plays perfectly
I try to strip out raw data:
ffmpeg -i pic.avi -c copy pic.h264
This file can't play in VLC or even ffmpeg, which produced this file, can't parse it:
I downloaded mp4 stream analyzer and got:
MP4Box
tells me:
Cannot find H264 start code
Error importing pic.h264: BitStream Not Compliant
It very hard to learn internals of h264, when nothing works.
So, I have questions:
- What actual data inside mp4?
- What I must read to decode that data (I mean different annex-es)
- How to read stream and get decoded image (even with ffmpeg) from this "broken" raw stream?
UPDATE:
It seems bug in ffmpeg:
When I do double conversion:
ffmpeg -ss 10 -i random_youtube_video.mp4 -frames 1 -c copy pic.mp4
ffmpeg pic.mp4 -c copy pic.h264
But when I convert file directly:
ffmpeg -ss 10 -i random_youtube_video.mp4 -frames 1 -c copy pic.h264
I have NALs signatures and one extra NAL unit. Other bytes are same (selected).
This is bug?
UPDATE
Not, this is not bug, U must use option -bsf h264_mp4toannexb to save stream as "Annex B" format (with prefixes)
Your H264 is in AVCC format which means it uses data sizes (instead of data start codes). It is only Annex-B that will have your mentioned signature as start code.
You seek frames, not by looking for start codes, but instead you just do skipping by frame sizes to reach the final correct offset of a (requested) frame...
AVI processing :
Read size (four) bytes (32-bit integer, Little Endian).
Extract the next following bytes up to size amount.
This is your H.264 frame (in AVCC format), decode the bytes to view image.
To convert into Annex-B, try replacing first 4 bytes of H.264 frame bytes with
00 00 00 01
.Consider your shown AVI bytes (see first picture) :
Some explanation :
Ignore leading multiple
00
bytes.4C 49 53 54 D6 3C 00 00 6D 6F 76 69
including30 30 64 63
= AVI "List" header.AD 24 00 00
== decimal9389
is AVI's own size of H264 item (must read in Little Endian).Notice that the AVI bytes include...
- a note of item's total size (
AD 24 00 00
... or reverse for Little Endian :00 00 24 AD
)- followed by item data (
00 00 24 A9 65 88 84 27 ... etc ... C5 E2 7D BF
).This size includes both the 4 bytes of the AVI's"size" entry + expected bytes length of the item's own bytes. Can be written simply as:
H.264 video frame bytes in AVI :
Next follows the item data, which is the H.264 video frame. By sheer coincidence of formats/bytes layout, it too holds a 4-byte entry for data's size (since your H264 is in AVCC format, if it was Annex-B then you would be seeing start code bytes here instead of size bytes).
Unlike AVI bytes, these H264 size bytes are written in Big Endian format.
00 00 24 A9
= size of bytes for this video frame (instead of start code :00 00 00 01
).65 88 84 27 C7 11 FE B3 C7
= H.264 keyframe (always begins X5, where theX
value is based on other settings).Remember after four size bytes (or even start codes) if followed by...
X5
= keyframe (IDR), example byte65
.X1
= P or B frame, example byte41
.X6
= SEI (Supplemental Enhancement Information).X7
= SPS (Sequence Parameter Set).X8
= PPS (Picture Parameter Set).00 00 00 X9
= Access unit delimiter.You can find the H.264 if you search for exact same bytes within AVI file. See third picture, these are your H.264 bytes (they are cut & pasted into the AVI container).
Sometimes a frame is sliced into different NAL units. So if you extract a key frame and it only shows 1/2 or 1/3 instead of full image, just grab next one or two NAL and re-try the decode.