How to handle the PTS correctly using Android AudioRecord and MediaCodec as audio encoder?

Posted 2019-05-14 21:24

Question:

I'm using AudioRecord to record the audio stream while capturing video from the camera on an Android device. Since I want to process the frame data and handle the audio/video samples myself, I do not use MediaRecorder.

I run AudioRecord in another thread, calling read() to gather the raw audio data. Once I get a chunk of data, I feed it into a MediaCodec configured as an AAC audio encoder.

Here is some of my code for the audio recorder / encoder:

m_encode_audio_mime = "audio/mp4a-latm";
m_audio_sample_rate = 44100;
m_audio_channels = AudioFormat.CHANNEL_IN_MONO;
m_audio_channel_count = (m_audio_channels == AudioFormat.CHANNEL_IN_MONO ? 1 : 2);

int audio_bit_rate = 64000;
int audio_data_format = AudioFormat.ENCODING_PCM_16BIT;

m_audio_buffer_size = AudioRecord.getMinBufferSize(m_audio_sample_rate, m_audio_channels, audio_data_format) * 2;
m_audio_recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, m_audio_sample_rate,
                                   m_audio_channels, audio_data_format, m_audio_buffer_size);

m_audio_encoder = MediaCodec.createEncoderByType(m_encode_audio_mime);
MediaFormat audio_format = new MediaFormat();
audio_format.setString(MediaFormat.KEY_MIME, m_encode_audio_mime);
audio_format.setInteger(MediaFormat.KEY_BIT_RATE, audio_bit_rate);
audio_format.setInteger(MediaFormat.KEY_CHANNEL_COUNT, m_audio_channel_count);
audio_format.setInteger(MediaFormat.KEY_SAMPLE_RATE, m_audio_sample_rate);
audio_format.setInteger(MediaFormat.KEY_AAC_PROFILE, MediaCodecInfo.CodecProfileLevel.AACObjectLC);
audio_format.setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, m_audio_buffer_size);
m_audio_encoder.configure(audio_format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
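
For completeness, my capture loop looks roughly like the sketch below. The names m_is_recording and feedAudioEncoder() are illustrative, not verbatim from my project:

// Capture thread: pull PCM from AudioRecord and hand it to the AAC encoder,
// using the wall clock (relative to the first read) as the PTS.
byte[] pcm_buffer = new byte[m_audio_buffer_size];
m_audio_recorder.startRecording();
m_audio_encoder.start();
long base_time_us = -1;

while (m_is_recording) {
    long now_us = System.nanoTime() / 1000;
    if (base_time_us < 0) {
        base_time_us = now_us;            // the first read() defines time zero
    }
    long pts_us = now_us - base_time_us;  // the "time before each read()" below
    int read_bytes = m_audio_recorder.read(pcm_buffer, 0, pcm_buffer.length);
    if (read_bytes > 0) {
        feedAudioEncoder(pcm_buffer, read_bytes, pts_us);
    }
}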

I found that the first call of AudioRecord.read() takes longer to return, while the successive read() calls return at intervals closer to the real duration of the audio data. For example, my audio format is 44100 Hz / 16-bit / 1 channel, and the AudioRecord buffer size is 16384 bytes, so a full buffer corresponds to 185.76 ms of audio. When I record the system time before each call of read() and subtract a base time, I get the following sequence:

time before each read(): 0ms, 345ms, 543ms, 692ms, 891ms, 1093ms, 1244ms, ...

I feed this raw data to the audio encoder with the above time values as the PTS, and the encoder outputs encoded audio samples with the following PTS:

encoder output PTS: 0ms, 185ms, 371ms, 557ms, 743ms, 928ms, ...
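
For reference, feedAudioEncoder() above is roughly the following; the synchronous MediaCodec buffer handling is a simplified sketch of my own plumbing, not exact code:

// Input side: queue one PCM chunk, stamping it with the wall-clock PTS.
private void feedAudioEncoder(byte[] pcm, int len, long pts_us) {
    int in_index = m_audio_encoder.dequeueInputBuffer(10000);
    if (in_index >= 0) {
        ByteBuffer in_buf = m_audio_encoder.getInputBuffer(in_index);
        in_buf.clear();
        in_buf.put(pcm, 0, len);
        m_audio_encoder.queueInputBuffer(in_index, 0, len, pts_us, 0);
    }

    // Output side: info.presentationTimeUs is the "encoder output PTS"
    // sequence quoted above.
    MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
    int out_index = m_audio_encoder.dequeueOutputBuffer(info, 0);
    while (out_index >= 0) {
        // ... hand the encoded sample and its BufferInfo to the MediaMuxer ...
        m_audio_encoder.releaseOutputBuffer(out_index, false);
        out_index = m_audio_encoder.dequeueOutputBuffer(info, 0);
    }
}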

It looks like the encoder treats each chunk of data as covering the same time period, which is consistent, since I give it raw data of the same size (16384 bytes) every time. However, if I use the encoder's output PTS as the input to the muxer, I get a video whose audio content plays faster than its video content.

I want to ask:

  1. Is it expected that the first call of AudioRecord.read() blocks longer? I'm sure the call takes more than 300 ms, even though it only returns 16384 bytes, which represent about 186 ms of audio. Does this behavior also depend on the device / Android version?
  2. What should I do to achieve audio/video synchronization? I have a workaround: measure the delay of the first call of read(), then shift the PTS of the audio samples by that delay (see the sketch after this list). Is there a better way to handle this?
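
The workaround from point 2, in sketch form. The exact amount to shift (the whole measured delay, or just its excess over one buffer's duration) is something I'm still unsure about:

// Measure how long the first read() blocks, compare it with the real
// duration of the data it returned, and shift all later PTS by the excess.
long t0_us = System.nanoTime() / 1000;
int read_bytes = m_audio_recorder.read(pcm_buffer, 0, pcm_buffer.length);
long first_read_us = System.nanoTime() / 1000 - t0_us;    // ~345 ms in my test
long buffer_duration_us = 1000000L * read_bytes / (m_audio_sample_rate * 2);
long pts_offset_us = first_read_us - buffer_duration_us;  // extra startup latency
// later, for every chunk:
//   m_audio_encoder.queueInputBuffer(index, 0, len, pts_us - pts_offset_us, 0);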

Answer 1:

Convert the mono input to stereo. I was pulling my hair out for some time before I realised the AAC encoder exposed by MediaCodec only works with stereo input.
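
A minimal sketch of that conversion for 16-bit little-endian PCM (the helper name is mine). Note that KEY_CHANNEL_COUNT then has to be 2, and any PTS math based on byte counts must account for the doubled data rate:

// Duplicate each 16-bit mono sample into the left and right channels.
static byte[] monoToStereo(byte[] mono, int len) {
    byte[] stereo = new byte[len * 2];
    for (int i = 0; i + 1 < len; i += 2) {
        stereo[2 * i]     = mono[i];      // left, low byte
        stereo[2 * i + 1] = mono[i + 1];  // left, high byte
        stereo[2 * i + 2] = mono[i];      // right, low byte
        stereo[2 * i + 3] = mono[i + 1];  // right, high byte
    }
    return stereo;
}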