How can I count/detect frames (pictures) in a raw H.264 bitstream? I know there are 5 VCL NALU types, but I don't know how to recognize a sequence of them as an access unit. I suppose detecting a frame means detecting an access unit, since an access unit is defined as:
A set of NAL units that are consecutive in decoding order and contain exactly one primary coded picture. In addition to the primary coded picture, an access unit may also contain one or more redundant coded pictures, one auxiliary coded picture, or other NAL units not containing slices or slice data partitions of a coded picture. The decoding of an access unit always results in a decoded picture.
I want this so I can find out the FPS of a live stream being sent out to a server.
You are right in your interpretation, and if you want to parse the stream yourself, take a look here
But to quickly extract stream info in a format that is easy to read and parse (with any text parser), you can use ffprobe:
ffprobe -show_streams -count_frames -pretty filename
You will find in the output:
As for the fps: I have heard that ffprobe may sometimes report it incorrectly, so try a simple ffmpeg -i command:
ffmpeg -i filename 2>&1 | sed -n "s/.*, \(.*\) fps.*/\1/p"
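If you would rather do the same extraction from a script than with sed, here is a minimal Python sketch of the equivalent match. It assumes ffmpeg's stream line looks like the usual "..., 1280x720, 25 fps, 25 tbr, ..." layout; the sample line and the function name fps_from_ffmpeg_output are illustrative, not part of any tool.

```python
import re
from typing import Optional

# Matches a ", <number> fps" token as printed in ffmpeg's "Stream #..." line.
# The line layout is an assumption based on typical ffmpeg output and may vary
# between ffmpeg versions.
FPS_RE = re.compile(r",\s*([0-9]+(?:\.[0-9]+)?)\s+fps\b")

def fps_from_ffmpeg_output(text: str) -> Optional[float]:
    """Return the first 'NN fps' value found in ffmpeg's stderr text, or None."""
    match = FPS_RE.search(text)
    return float(match.group(1)) if match else None

# Hypothetical sample of an ffmpeg "Stream" line.
sample = ("    Stream #0:0: Video: h264 (High), yuv420p(progressive), "
          "1280x720, 25 fps, 25 tbr, 1200k tbn, 50 tbc")
print(fps_from_ffmpeg_output(sample))  # 25.0
```

To feed it real output you could capture stderr with subprocess.run(["ffmpeg", "-i", path], capture_output=True, text=True).stderr; like the sed version, it only reports what ffmpeg prints, not a measured rate.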
From ITU-T H.264 (03/2009):
7.4.1.2.3 Order of NAL units and coded pictures and association to access units
This subclause specifies the order of NAL units and coded pictures and association to access unit for coded video sequences that conform to one or more of the profiles specified in Annex A that are decoded using the decoding process specified in clauses 2-9.
An access unit consists of one primary coded picture, zero or more corresponding redundant coded pictures, and zero or more non-VCL NAL units. The association of VCL NAL units to primary or redundant coded pictures is described in subclause 7.4.1.2.5.
The first access unit in the bitstream starts with the first NAL unit of the bitstream.
The first of any of the following NAL units after the last VCL NAL unit of a primary coded picture specifies the start of a new access unit:
- access unit delimiter NAL unit (when present),
- sequence parameter set NAL unit (when present),
- picture parameter set NAL unit (when present),
- SEI NAL unit (when present),
- NAL units with nal_unit_type in the range of 14 to 18, inclusive (when present),
- first VCL NAL unit of a primary coded picture (always present).
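The rule above can be turned into a small state machine. This is a minimal Python sketch, assuming the NAL unit types have already been extracted and that, for each VCL NAL unit, you already know whether it is the first VCL NAL unit of a new primary coded picture (that is the 7.4.1.2.4 check). It also assumes there are no redundant coded pictures. The names split_access_units and AU_START_NON_VCL are hypothetical. Keep in mind that a primary coded picture can be a single field, so for field-coded interlaced content two access units make up one frame.

```python
from typing import List, Tuple

# nal_unit_type values that, when they follow the VCL NAL units of a primary
# coded picture, start a new access unit (7.4.1.2.3): SEI (6), SPS (7),
# PPS (8), access unit delimiter (9) and types 14..18.
AU_START_NON_VCL = {6, 7, 8, 9} | set(range(14, 19))

def is_vcl(nal_type: int) -> bool:
    # Coded slice / slice data partition NAL unit types 1..5.
    return 1 <= nal_type <= 5

def split_access_units(nals: List[Tuple[int, bool]]) -> List[List[int]]:
    """Group NAL unit types into access units.

    Each item is (nal_unit_type, first_vcl_of_new_picture). The flag is
    assumed to be precomputed (7.4.1.2.4) and is ignored for non-VCL units.
    """
    access_units: List[List[int]] = []
    current: List[int] = []
    seen_vcl = False  # have we seen VCL NAL units of the current picture yet?
    for nal_type, first_vcl in nals:
        starts_new_au = seen_vcl and (
            nal_type in AU_START_NON_VCL or (is_vcl(nal_type) and first_vcl)
        )
        if starts_new_au:
            access_units.append(current)
            current, seen_vcl = [], False
        current.append(nal_type)
        if is_vcl(nal_type):
            seen_vcl = True
    if current:
        access_units.append(current)
    return access_units

# AUD, SPS, PPS, SEI, two IDR slices, AUD, two non-IDR slices, one more picture.
stream = [(9, False), (7, False), (8, False), (6, False), (5, True), (5, False),
          (9, False), (1, True), (1, False), (1, True)]
aus = split_access_units(stream)
print(aus)       # [[9, 7, 8, 6, 5, 5], [9, 1, 1], [1]]
print(len(aus))  # 3
```

Counting frames then amounts to counting the returned groups (adjusted for field coding as noted above).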
The constraints for the detection of the first VCL NAL unit of a primary coded picture are specified in subclause 7.4.1.2.4.
7.4.1.2.4 Detection of the first VCL NAL unit of a primary coded picture
This subclause specifies constraints on VCL NAL unit syntax that are sufficient to enable the detection of the first VCL NAL unit of each primary coded picture for coded video sequences that conform to one or more of the profiles specified in Annex A that are decoded using the decoding process specified in clauses 2-9.
Any coded slice NAL unit or coded slice data partition A NAL unit of the primary coded picture of the current access unit shall be different from any coded slice NAL unit or coded slice data partition A NAL unit of the primary coded picture of the previous access unit in one or more of the following ways:
- frame_num differs in value. The value of frame_num used to test this condition is the value of frame_num that appears in the syntax of the slice header, regardless of whether that value is inferred to have been equal to 0 for subsequent use in the decoding process due to the presence of memory_management_control_operation equal to 5. (NOTE 1 – A consequence of the above statement is that a primary coded picture having frame_num equal to 1 cannot contain a memory_management_control_operation equal to 5 unless some other condition listed below is fulfilled for the next primary coded picture that follows after it (if any).)
- pic_parameter_set_id differs in value.
- field_pic_flag differs in value.
- bottom_field_flag is present in both and differs in value.
- nal_ref_idc differs in value with one of the nal_ref_idc values being equal to 0.
- pic_order_cnt_type is equal to 0 for both and either pic_order_cnt_lsb differs in value, or delta_pic_order_cnt_bottom differs in value.
- pic_order_cnt_type is equal to 1 for both and either delta_pic_order_cnt[ 0 ] differs in value, or delta_pic_order_cnt[ 1 ] differs in value.
- IdrPicFlag differs in value.
- IdrPicFlag is equal to 1 for both and idr_pic_id differs in value.
(NOTE 2 – Some of the VCL NAL units in redundant coded pictures or some non-VCL NAL units (e.g., an access unit delimiter NAL unit) may also be used for the detection of the boundary between access units, and may therefore aid in the detection of the start of a new primary coded picture.)
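One way to apply these conditions in practice is to compare each primary-picture slice with the previous one: if any condition holds, the current slice is the first VCL NAL unit of a new picture. Below is a minimal Python sketch of that comparison. It assumes the slice header fields (and pic_order_cnt_type from the active SPS) have already been parsed, which requires Exp-Golomb decoding not shown here. SliceInfo and is_first_vcl_of_new_picture are hypothetical names, and treating an absent delta_pic_order_cnt_bottom as 0 is a simplification.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class SliceInfo:
    """Fields used by 7.4.1.2.4 (assumed already parsed from the stream)."""
    nal_ref_idc: int
    idr_pic_flag: bool            # IdrPicFlag: nal_unit_type == 5
    frame_num: int                # value as written in the slice header
    pic_parameter_set_id: int
    field_pic_flag: bool = False
    bottom_field_flag: Optional[bool] = None   # None when not present
    pic_order_cnt_type: int = 0                # from the active SPS
    pic_order_cnt_lsb: int = 0
    delta_pic_order_cnt_bottom: int = 0        # 0 assumed when absent
    delta_pic_order_cnt: Tuple[int, int] = (0, 0)
    idr_pic_id: int = 0

def is_first_vcl_of_new_picture(prev: SliceInfo, cur: SliceInfo) -> bool:
    """True if `cur` cannot belong to the same primary picture as `prev`."""
    if cur.frame_num != prev.frame_num:
        return True
    if cur.pic_parameter_set_id != prev.pic_parameter_set_id:
        return True
    if cur.field_pic_flag != prev.field_pic_flag:
        return True
    if (cur.bottom_field_flag is not None and prev.bottom_field_flag is not None
            and cur.bottom_field_flag != prev.bottom_field_flag):
        return True
    # nal_ref_idc differs, with one of the two values equal to 0
    if (cur.nal_ref_idc == 0) != (prev.nal_ref_idc == 0):
        return True
    if cur.pic_order_cnt_type == 0 and prev.pic_order_cnt_type == 0:
        if (cur.pic_order_cnt_lsb != prev.pic_order_cnt_lsb
                or cur.delta_pic_order_cnt_bottom != prev.delta_pic_order_cnt_bottom):
            return True
    if cur.pic_order_cnt_type == 1 and prev.pic_order_cnt_type == 1:
        if cur.delta_pic_order_cnt != prev.delta_pic_order_cnt:
            return True
    if cur.idr_pic_flag != prev.idr_pic_flag:
        return True
    if cur.idr_pic_flag and prev.idr_pic_flag and cur.idr_pic_id != prev.idr_pic_id:
        return True
    return False

idr_slice = SliceInfo(nal_ref_idc=3, idr_pic_flag=True, frame_num=0, pic_parameter_set_id=0)
print(is_first_vcl_of_new_picture(idr_slice, idr_slice))                         # False
print(is_first_vcl_of_new_picture(idr_slice, replace(idr_slice, idr_pic_id=1)))  # True
```

The boolean this returns is exactly the per-slice flag that an access-unit splitter needs for its VCL NAL units.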
NAL units do not necessarily have a 1:1 relationship to frames: a frame can be split into multiple NAL units (e.g. one per slice). If you want to parse the stream manually, you'll need to handle each NAL unit type, which is pretty well described in the blog article below. If the stream has an SPS NAL unit, it may carry the frame rate (in the optional VUI timing information), but that is not necessarily the actual frame rate, just what the stream claims it has.
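To compare the signaled rate with the real one, here is a small Python sketch. The nominal rate assumes the common convention that one frame spans two clock ticks, i.e. time_scale / (2 * num_units_in_tick) from the VUI timing_info; the exact relation depends on field/frame coding and pic_struct, so treat the result as nominal. The measured rate simply divides the number of access units you counted by the wall-clock time you counted them over. Both function names are hypothetical.

```python
from fractions import Fraction

def fps_from_vui(num_units_in_tick: int, time_scale: int) -> Fraction:
    """Nominal frame rate from SPS VUI timing_info (two ticks per frame assumed)."""
    if num_units_in_tick <= 0 or time_scale <= 0:
        raise ValueError("timing_info values must be positive")
    return Fraction(time_scale, 2 * num_units_in_tick)

def measured_fps(access_unit_count: int, elapsed_seconds: float) -> float:
    """Observed rate: access units counted over a wall-clock interval.

    For field-coded content each access unit is a field, so halve the count
    to get frames.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return access_unit_count / elapsed_seconds

print(fps_from_vui(1001, 60000))  # 30000/1001  (about 29.97)
print(fps_from_vui(1, 50))        # 25
print(measured_fps(750, 30.0))    # 25.0
```

For a live stream, the measured value is the one that tells you what is actually being sent to the server.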
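Since you are also asking how to find the actual start of an AU: if it's an "Annex B" bitstream, each NALU is preceded by a start code, 0x000001 or 0x00000001. AVCC instead prefixes each NALU with a small length field (1, 2 or 4 bytes, big-endian; 4 is typical) giving the NALU's size. Either way, this only gives you NALU boundaries; grouping the NALUs into AUs still follows the rules of 7.4.1.2.3 quoted above.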
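A minimal Python sketch of both framings plus the one-byte NAL header, assuming the whole stream (or a chunk ending on a NAL boundary) is in memory. Splitting on 0x000001 is safe because emulation prevention keeps that pattern out of NAL unit payloads, and trailing zero bytes are stripped because a 4-byte start code leaves its leading 0x00 at the end of the previous chunk (a NAL unit itself never ends in 0x00). The function names and the length_size default are assumptions; in MP4 the real length size comes from the avcC configuration record.

```python
from typing import List, Tuple

def split_annexb(data: bytes) -> List[bytes]:
    """Split an Annex B byte stream into NAL units (start codes removed)."""
    nals = []
    i = data.find(b"\x00\x00\x01")
    while i != -1:
        start = i + 3
        nxt = data.find(b"\x00\x00\x01", start)
        end = len(data) if nxt == -1 else nxt
        # Drop the extra 0x00 of a 4-byte start code / trailing zero bytes.
        nals.append(data[start:end].rstrip(b"\x00"))
        i = nxt
    return [n for n in nals if n]

def split_avcc(data: bytes, length_size: int = 4) -> List[bytes]:
    """Split length-prefixed (AVCC) data; each length is big-endian."""
    nals, pos = [], 0
    while pos + length_size <= len(data):
        size = int.from_bytes(data[pos:pos + length_size], "big")
        pos += length_size
        if pos + size > len(data):
            raise ValueError("truncated NAL unit")
        nals.append(data[pos:pos + size])
        pos += size
    return nals

def nal_header(nal: bytes) -> Tuple[int, int]:
    """Return (nal_ref_idc, nal_unit_type) from the first NAL unit byte."""
    b = nal[0]
    return (b >> 5) & 0x3, b & 0x1F

# Hypothetical SPS, PPS and IDR-slice fragments with mixed start code lengths.
annexb = (b"\x00\x00\x00\x01\x67\x42\x00\x1f"
          b"\x00\x00\x01\x68\xce\x3c\x80"
          b"\x00\x00\x00\x01\x65\x88\x84")
print([nal_header(n) for n in split_annexb(annexb)])  # [(3, 7), (3, 8), (3, 5)]
```

From here, nal_unit_type values 1 to 5 are the VCL units you feed into the access-unit grouping.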
Check out the following great blog post for more details: szatmary.org
Hope that helps!