h264 inside AVI, MP4 and “Raw” h264 streams. Diffe

Posted 2020-03-02 06:53

TL;DR: I want to read raw h264 streams from AVI/MP4 files, even broken or incomplete ones.

Almost every document about h264 tells me that it consists of NAL packets. Okay. Almost everywhere I am told that each packet should start with a signature like 00 00 01 or 00 00 00 01. For example, https://stackoverflow.com/a/18638298/8167678, https://stackoverflow.com/a/17625537/8167678

The format of H.264 is that it’s made up of NAL Units, each starting with a start prefix of three bytes with the values 0x00, 0x00, 0x01 and each unit has a different type depending on the value of the 4th byte right after these 3 starting bytes. One NAL Unit IS NOT one frame in the video, each frame is made up of a number of NAL Units.

Okay.

I downloaded random_youtube_video.mp4 and extracted one frame from it:

ffmpeg -ss 10 -i random_youtube_video.mp4 -frames 1 -c copy pic.avi

And got: (image: hexdump of AVI). The red part belongs to the AVI container; the rest is the actual data. As you can see, I have 00 00 24 A9 here instead of 00 00 00 01.

This AVI file plays perfectly.

I did the same for the MP4 container: (image: hexdump of mp4)

As you can see, the bytes are exactly the same. This MP4 file plays perfectly too.

I tried to strip out the raw data: ffmpeg -i pic.avi -c copy pic.h264 (image: raw data)

This file can't be played in VLC, and even ffmpeg, which produced it, can't parse it: (image: ffmpeg error)

I downloaded an MP4 stream analyzer and got: (image: analysis)

MP4Box tells me:

 Cannot find H264 start code
 Error importing pic.h264: BitStream Not Compliant

It is very hard to learn the internals of h264 when nothing works.

So, I have questions:

  1. What is the actual data inside the MP4?
  2. What must I read to decode that data (I mean the different annexes)?
  3. How do I read the stream and get a decoded image (even with ffmpeg) from this "broken" raw stream?

UPDATE:

It seems to be a bug in ffmpeg:

When I do a double conversion:

         ffmpeg -ss 10 -i random_youtube_video.mp4 -frames 1 -c copy pic.mp4
         ffmpeg -i pic.mp4 -c copy pic.h264

(image)

But when I convert the file directly:

ffmpeg -ss 10 -i random_youtube_video.mp4 -frames 1 -c copy pic.h264 (image: with NALs)

I get the NAL signatures and one extra NAL unit. The other bytes are the same (selected).

Is this a bug?

UPDATE

No, this is not a bug. You must use the option -bsf h264_mp4toannexb to save the stream in "Annex B" format (with start-code prefixes).
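As a hedged illustration, the command with that option could be assembled from Python as below. The helper name `annexb_copy_command` and the file names are hypothetical. `-bsf:v` is the per-stream spelling of the `-bsf` option mentioned above, used here on the assumption that only the video stream needs the filter.

```python
import shlex


def annexb_copy_command(src, dst):
    """Build an ffmpeg argument list that stream-copies H.264 into Annex B.

    Hypothetical helper: it only builds the arguments and runs nothing.
    """
    return [
        "ffmpeg",
        "-i", src,                      # input container (MP4 or AVI)
        "-c", "copy",                   # no re-encoding
        "-bsf:v", "h264_mp4toannexb",   # length prefixes -> start codes
        dst,
    ]


cmd = annexb_copy_command("pic.mp4", "pic.h264")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```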

1 answer
啃猪蹄的小仙女
#2 · 2020-03-02 07:41

"I want to read raw h264 streams from AVI files, even broken/incomplete."

"Almost everywhere told to me that the packet should start with a signature like :
00 00 01 or 00 00 00 01"

"...As you can see, here I have 00 00 24 A9 instead of 00 00 00 01"

Your H264 is in AVCC format, which means it uses data sizes (length prefixes) instead of start codes. Only Annex-B has the signature you mention as its start code.

You seek to a frame not by searching for start codes but by skipping over frame sizes until you reach the offset of the requested frame...

AVI processing :

  • Read the four size bytes (32-bit integer, Little Endian).

  • Extract the next size bytes.

  • This is your H.264 frame (in AVCC format); decode these bytes to view the image.

  • To convert into Annex-B, try replacing the first 4 bytes of the H.264 frame (its AVCC length) with 00 00 00 01. If the frame holds several NAL units, each one has its own 4-byte length to replace.
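The conversion step above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: it assumes 4-byte big-endian length prefixes (as in the bytes shown in this post), the function name `avcc_to_annexb` is made up for illustration, and a truncated final NAL unit is passed through with whatever bytes are present.

```python
START_CODE = b"\x00\x00\x00\x01"


def avcc_to_annexb(sample, length_size=4):
    """Replace each big-endian length prefix in an AVCC sample with a start code."""
    out = bytearray()
    pos = 0
    while pos + length_size <= len(sample):
        nal_len = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        nal = sample[pos:pos + nal_len]  # shorter than nal_len if the file is cut off
        if not nal:
            break
        out += START_CODE + nal
        pos += nal_len
    return bytes(out)


# Two tiny made-up NAL units: a 2-byte one and a 3-byte one.
sample = bytes.fromhex("00000002 6588 00000003 419a01")
print(avcc_to_annexb(sample).hex(" "))
```

Note that a decoder also needs the SPS and PPS. With AVCC these usually live in the container's codec configuration rather than in each frame, so the converted frame alone may still not decode.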

Consider your shown AVI bytes (see first picture) :

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00     ................
00 00 00 00 4C 49 53 54 BA 24 00 00 6D 6F 76 69     ....LISTº$..movi
30 30 64 63 AD 24 00 00 00 00 24 A9 65 88 84 27     00dc.$....$©eˆ„'
C7 11 FE B3 C7 83 08 00 08 2A 7B 6E 59 B5 71 E1     Ç.þ³Çƒ...*{nYµqá
E3 9C 0E 73 E7 10 50 00 18 E9 25 F7 AA 7D 9C 30     ãœ.sç.P..é%÷ª}œ0
E6 2F 0F 20 00 3A 64 AA CA 5E 4F CA FF AE 20 04     æ/. .:dªÊ^OÊÿ® .
07 81 40 00 48 00 0A 28 71 21 84 48 06 18 90 0C     ..@.H..(q!„H....
31 14 57 9E 7A CD 63 A0 E0 9B 96 69 C5 18 AE F2     1.WžzÍc à›–iÅ.®ò
E6 07 02 29 01 20 10 70 A1 0F 8C BC 73 F0 78 FA     æ..). .p¡.Œ¼sðxú
9E 1D E1 C2 BF 8C 62 CE CE AC 14 5A A4 E1 45 44     ž.á¿ŒbÎά.Z¤áED
38 38 85 DB 12 57 3E F6 E0 FB AE 03 04 21 62 8D     88…Û.W>öàû®..!b.
F6 F1 1E 37 1C A2 FF 75 1C F1 02 66 0C 92 07 06     öñ.7.¢ÿu.ñ.f.’..
15 7C 90 15 6F 7D FC BD 13 1E 2B 0C 14 3C 0C 00     .|..o}ü½..+..<..
B0 EA 6F 53 B4 98 D7 80 7A 68 3E 34 69 20 D2 FA     °êoS´˜×€zh>4i Òú
F0 91 FC 75 C6 00 01 18 C0 00 3B 9A C5 E2 7D BF     ð‘üuÆ...À.;šÅâ}¿

Some explanation :

  • Ignore the leading run of 00 bytes.

  • 4C 49 53 54 BA 24 00 00 6D 6F 76 69 = AVI "LIST" header (LIST, list size, movi), followed by 30 30 64 63 = chunk ID 00dc (stream 00, compressed video).

  • AD 24 00 00 == decimal 9389 is AVI's own size of the H264 item (must be read as Little Endian).

Notice that the AVI bytes include...
- a note of the item's total size (stored as AD 24 00 00; read as Little Endian this is 0x000024AD)
- followed by the item data (00 00 24 A9 65 88 84 27 ... etc ... C5 E2 7D BF).

This size covers the item data only: the 4 bytes of the H.264 (AVCC) "size" entry plus the frame bytes that follow it (here 9389 = 4 + 9385). Can be written simply as:

AVI_Item_Size = ( 4 + item_H264_Frame.length );
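To check that arithmetic against the bytes shown above, here is a small Python sketch that reads one RIFF chunk header. The function name `read_chunk` is hypothetical. It relies on the RIFF rule that chunk data is padded to an even length, and the buffer holds only the first bytes of the chunk from the dump, so the payload is deliberately truncated.

```python
import struct


def read_chunk(buf, offset=0):
    """Return (fourcc, size, payload, next_offset) for the RIFF chunk at offset."""
    fourcc, size = struct.unpack_from("<4sI", buf, offset)  # size is little-endian
    start = offset + 8
    payload = buf[start:start + size]            # may be short if buf is truncated
    next_offset = start + size + (size & 1)      # chunks are padded to even length
    return fourcc, size, payload, next_offset


# First 16 bytes of the "00dc" chunk from the hexdump above.
buf = bytes.fromhex("30306463 AD240000 000024A9 65888427")
fourcc, size, payload, next_offset = read_chunk(buf)
avcc_len = int.from_bytes(payload[:4], "big")    # H.264 length prefix is big-endian
print(fourcc, size, avcc_len, size == 4 + avcc_len)
```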

H.264 video frame bytes in AVI :

Next follows the item data, which is the H.264 video frame. By sheer coincidence of formats/byte layout, it too begins with a 4-byte entry for the data's size (since your H264 is in AVCC format; if it were Annex-B, you would see start code bytes here instead of size bytes).

Unlike AVI bytes, these H264 size bytes are written in Big Endian format.

  • 00 00 24 A9 = size in bytes of this video frame (instead of the start code 00 00 00 01).

  • 65 88 84 27 C7 11 FE B3 C7 = H.264 keyframe (begins with X5, where the X value is based on other settings; strictly, the NAL type is the low five bits of this byte, so 0x65 & 0x1F = 5).

  • Remember: after the four size bytes (or a start code), if the next byte is...

    • byte X5 = keyframe (IDR), example byte 65.
    • byte X1 = P or B frame, example byte 41.
    • byte X6 = SEI (Supplemental Enhancement Information).
    • byte X7 = SPS (Sequence Parameter Set).
    • byte X8 = PPS (Picture Parameter Set).
    • byte X9 = Access unit delimiter, example byte 09.
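The byte patterns in the list above come from the one-byte NAL header: one forbidden bit, two bits of nal_ref_idc, and five bits of nal_unit_type. A minimal Python sketch of decoding it follows. The function name `describe_nal_header` and the label strings are illustrative only, and only the types named in this answer are mapped.

```python
NAL_TYPE_NAMES = {
    1: "non-IDR slice (P or B frame)",
    5: "IDR slice (keyframe)",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "access unit delimiter",
}


def describe_nal_header(byte):
    """Split the first byte of a NAL unit into (nal_ref_idc, nal_unit_type, name)."""
    ref_idc = (byte >> 5) & 0x03   # the "X" part of X5, X1, ...
    nal_type = byte & 0x1F         # low five bits
    return ref_idc, nal_type, NAL_TYPE_NAMES.get(nal_type, "other")


for b in (0x65, 0x41, 0x67, 0x68, 0x06, 0x09):
    print(f"{b:02X}", describe_nal_header(b))
```

Because the type occupies five bits, a low nibble of 5 is not proof of a keyframe on its own: 0x75, for instance, has type 21 rather than 5.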

You can find the H.264 data by searching for the exact same bytes within the AVI file. See the third picture: these are your H.264 bytes (they are cut & pasted into the AVI container).

Sometimes a frame is sliced into several NAL units. So if you extract a keyframe and it shows only 1/2 or 1/3 of the full image, just grab the next one or two NAL units and retry the decode.
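One hedged way to decide how many NAL units belong to the same picture is sketched below in Python. It is a heuristic, not a full parser. It assumes that the first slice of each picture has first_mb_in_slice equal to 0, which is Exp-Golomb coded as a single 1 bit, so the top bit of the byte after the NAL header is set. The function names are made up, and non-slice NAL units (SPS, PPS, SEI, ...) are simply skipped.

```python
def is_first_slice(nal):
    """True if this slice NAL appears to start a new picture (first_mb_in_slice == 0)."""
    nal_type = nal[0] & 0x1F
    if nal_type not in (1, 5):      # only coded slices carry a slice header here
        return False
    return len(nal) > 1 and bool(nal[1] & 0x80)


def group_pictures(nals):
    """Group consecutive slice NAL units (without length prefixes) into pictures."""
    pictures = []
    for nal in nals:
        if (nal[0] & 0x1F) not in (1, 5):
            continue                # skip parameter sets, SEI, delimiters
        if is_first_slice(nal) or not pictures:
            pictures.append([nal])
        else:
            pictures[-1].append(nal)
    return pictures


# Made-up data: an IDR picture in two slices, then a one-slice P picture.
nals = [bytes.fromhex("6588aa"), bytes.fromhex("6520bb"), bytes.fromhex("419acc")]
print([len(p) for p in group_pictures(nals)])
```

The keyframe in the dump starts with 65 88, and 0x88 has its top bit set, so under this assumption it is the first slice of its picture.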
