The documentation of scipy.io.wavfile.read says that it returns the sample rate and data. But what does data actually mean here in the case of .wav files?
Can anyone explain in layman's terms how that data is prepared?
PS. I read somewhere that it means amplitude. Is that correct? If so, how is that amplitude calculated and returned by scipy.io.wavfile.read?
scipy.io.wavfile.read is a convenience wrapper that decomposes the .wav file into a header and the data contained in the file.
From the docstring in the source code:

    Returns
    -------
    rate : int
        Sample rate of wav file.
    data : numpy array
        Data read from wav file. Data-type is determined from the file;
        see Notes.

Simplified code from the source:

    fid = open(filename, 'rb')
    try:
        file_size, is_big_endian = _read_riff_chunk(fid)  # find out how to read the file
        channels = 1  # assume 1 channel and 8 bit depth if there is no format chunk
        bit_depth = 8
        while fid.tell() < file_size:  # walk through the file chunk by chunk
            # read the ID of the next chunk
            chunk_id = fid.read(4)
            if chunk_id == b'fmt ':  # retrieve formatting information
                fmt_chunk = _read_fmt_chunk(fid, is_big_endian)
                format_tag, channels, fs = fmt_chunk[1:4]
                bit_depth = fmt_chunk[6]
                if bit_depth not in (8, 16, 32, 64, 96, 128):
                    raise ValueError("Unsupported bit depth: the wav file "
                                     "has {}-bit data.".format(bit_depth))
            elif chunk_id == b'data':  # the actual samples
                data = _read_data_chunk(fid, format_tag, channels, bit_depth,
                                        is_big_endian, mmap)
    finally:
        if not hasattr(filename, 'read'):
            fid.close()
        else:
            fid.seek(0)
    return fs, data

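To make the chunk walk above more concrete, here is a minimal sketch that does the same thing by hand using only the standard library. It is not scipy's implementation: the helper names build_wav_bytes and read_wav_by_hand are made up for this illustration, and the sketch assumes a little-endian, mono, 16-bit PCM file.

```python
import io
import struct
import wave


def build_wav_bytes(samples, rate=8000):
    """Create a tiny mono 16-bit PCM .wav file in memory (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16 bits per sample
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


def read_wav_by_hand(raw):
    """Walk the RIFF chunks; assumes little-endian mono 16-bit PCM (hypothetical helper)."""
    riff, _size, wave_id = struct.unpack("<4sI4s", raw[:12])
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a little-endian RIFF/WAVE file")
    pos, rate, samples = 12, None, None
    while pos + 8 <= len(raw):
        # every chunk starts with a 4-byte ID and a 4-byte size
        chunk_id, chunk_size = struct.unpack("<4sI", raw[pos:pos + 8])
        body = raw[pos + 8:pos + 8 + chunk_size]
        if chunk_id == b"fmt ":
            # format tag, channels, sample rate, byte rate, block align, bit depth
            _tag, _channels, rate, _byte_rate, _align, _bits = struct.unpack(
                "<HHIIHH", body[:16])
        elif chunk_id == b"data":
            # the samples are stored as plain signed integers, nothing is computed
            samples = list(struct.unpack("<%dh" % (chunk_size // 2), body))
        pos += 8 + chunk_size + (chunk_size % 2)  # chunks are padded to even length
    return rate, samples


original = [0, 1000, -1000, 32767, -32768]
rate, samples = read_wav_by_hand(build_wav_bytes(original))
print(rate, samples)
```

The integers that come out of the data chunk are exactly the integers that were written, which is the sense in which the returned data is "the amplitude": each number is one stored sample.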
The data itself is usually PCM (pulse-code modulation): the sound pressure level sampled at regular intervals and stored as successive frames, with one value per channel in each frame. The sampling rate returned by scipy.io.wavfile.read is needed to determine how many frames make up one second.
A good explanation of the .wav format is offered by this question.
scipy doesn't calculate much on its own: for PCM files the sample values are handed back essentially as they are stored in the file.
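As a rough illustration of the points above, the following sketch writes a short stereo signal to an in-memory buffer and reads it back with scipy.io.wavfile.read. The 8000 Hz rate, the 440 Hz tone and the variable names are arbitrary choices for this example, not anything taken from the question.

```python
import io

import numpy as np
from scipy.io import wavfile

rate_out = 8000                          # frames per second (arbitrary choice)
t = np.arange(rate_out // 2) / rate_out  # time stamps for half a second
left = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
right = np.zeros_like(left)              # silent second channel
stereo = np.column_stack([left, right])  # one row per frame, one column per channel

buf = io.BytesIO()
wavfile.write(buf, rate_out, stereo)
buf.seek(0)                              # rewind before reading the buffer back
rate, data = wavfile.read(buf)

duration = data.shape[0] / rate          # number of frames / frames per second
print(rate, data.dtype, data.shape, duration)
print(np.array_equal(data, stereo))      # True if the samples are unchanged
```

The array comes back with the same integer dtype and the same values that were written, so any scaling (for example to floats between -1 and 1) is left to the caller.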