My problem
I'm trying to fit a (machine-learning) model that takes in an audio file (.wav) and predicts the emotion from it (multi-label classification).
I'm trying to read the sample rate and signal from the file, but when calling read(filename) from scipy.io.wavfile, I'm getting ValueError: Incomplete wav chunk.
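The error message suggests that the reader hit a chunk it could not read in full, so one way to understand the problem is to walk the file's RIFF chunk structure directly. The sketch below uses hypothetical helpers (parse_riff_chunks and list_riff_chunks are names invented for this example, not part of SciPy). It lists each top-level chunk and reports three kinds of inconsistency that could plausibly trip up a strict reader: a RIFF size field that disagrees with the real file length, a chunk that claims more bytes than remain, and a few stray trailing bytes that are too short to form a chunk header. It assumes a little-endian RIFF/WAVE file and only inspects top-level chunks.

```python
import struct
import sys


def parse_riff_chunks(data):
    """Walk the top-level chunks of a RIFF/WAVE byte string.

    Returns (chunks, problems): chunks is a list of
    (chunk_id, declared_size, offset) tuples, problems is a list of
    human-readable descriptions of structural inconsistencies.
    """
    chunks, problems = [], []
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        problems.append("missing RIFF/WAVE header")
        return chunks, problems

    # The RIFF size field counts every byte after the first 8.
    riff_size = struct.unpack("<I", data[4:8])[0]
    if riff_size + 8 != len(data):
        problems.append(
            f"RIFF header declares {riff_size + 8} bytes, file has {len(data)}"
        )

    pos = 12
    while pos < len(data):
        header = data[pos:pos + 8]
        if len(header) < 8:
            # Too few bytes left for a chunk ID plus size field.
            problems.append(
                f"{len(header)} trailing byte(s) at offset {pos}: {header!r}"
            )
            break
        chunk_id, size = struct.unpack("<4sI", header)
        chunks.append((chunk_id.decode("latin-1"), size, pos))
        end = pos + 8 + size
        if end > len(data):
            problems.append(
                f"chunk {chunk_id!r} at offset {pos} declares {size} bytes, "
                f"but only {len(data) - pos - 8} remain"
            )
            break
        # Chunks are word-aligned: an odd-sized chunk is followed by a pad byte.
        pos = end + (size % 2)
    return chunks, problems


def list_riff_chunks(path):
    """Read a file from disk and walk its chunks."""
    with open(path, "rb") as f:
        return parse_riff_chunks(f.read())


if __name__ == "__main__":
    # Usage: python riff_chunks.py fn_good.wav fn_bad.wav
    for path in sys.argv[1:]:
        chunks, problems = list_riff_chunks(path)
        print(path)
        for chunk_id, size, offset in chunks:
            print(f"  {chunk_id!r:8} size={size:<10} offset={offset}")
        for problem in problems:
            print("  PROBLEM:", problem)
```

Running it on one good and one bad file side by side shows whether the bad files end differently (for example, with leftover bytes) or carry extra chunks the good ones lack.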
What I've tried
I've tried switching from scipy.io.wavfile.read() to librosa.load().
They both output the signal and sample rate, but for some reason librosa takes far longer than scipy and is impractical for my task.
I've tried
sr, y = scipy.io.wavfile.read(open(filename, 'r'))
as suggested here, to no avail.
I've tried looking into my files and checking what might cause it:
Out of all 2084 WAV files, 1057 were good (i.e., scipy managed to read them) and 1027 were bad (i.e., they raised the error).
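A small harness can reproduce that good/bad split and tally the distinct error messages, which makes it easy to confirm that all the failures share one cause. partition_wav_files is a hypothetical helper written for this example; it assumes failures surface as ValueError (as scipy.io.wavfile.read raised here), and the reader is a parameter so another reader can be compared on the same file list.

```python
from collections import Counter


def partition_wav_files(filenames, reader=None):
    """Split files into those the reader accepts and those it rejects.

    reader defaults to scipy.io.wavfile.read; any callable that takes a
    path and raises ValueError on failure works. Returns
    (good, bad, error_counts), where bad maps filename -> error message
    and error_counts counts how often each message occurred.
    """
    if reader is None:
        # Imported lazily so a different reader can be used without SciPy.
        from scipy.io import wavfile
        reader = wavfile.read

    good, bad = [], {}
    for fn in filenames:
        try:
            reader(fn)
        except ValueError as exc:
            bad[fn] = str(exc)
        else:
            good.append(fn)
    error_counts = Counter(bad.values())
    return good, bad, error_counts
```

If error_counts contains a single message, the bad files most likely share one structural quirk rather than several unrelated defects.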
I couldn't find anything that indicates what makes a file pass or fail, which is strange, since all files come from the same dataset and the same origin.
I've heard people say I could just re-export the files as WAV using some audio software and it should work.
I haven't tried this because a) I don't have any audio-processing software and it seems like overkill, and b) I want to understand the actual problem rather than put a band-aid on it.
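Re-exporting does not necessarily require separate audio software. The sketch below (rewrite_wav is a hypothetical helper) uses Python's standard-library wave module to copy the audio into a freshly written file that contains only the standard fmt and data chunks, so extra chunks in the source (such as metadata) are not carried over. It assumes uncompressed PCM and that wave can parse the source file; it is a band-aid in exactly the sense described above, not an explanation of the error.

```python
import wave


def rewrite_wav(src, dst):
    """Copy the audio of src into a freshly written WAV file at dst.

    The output contains only a 'fmt ' and a 'data' chunk. Assumes
    uncompressed PCM, which is what the wave module handles.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(reader.getnframes())
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)
        writer.writeframes(frames)
```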
Minimal, reproducible example
Assume filenames is a subset of all my audio files containing fn_good and fn_bad, where fn_good is an actual file that gets processed and fn_bad is an actual file that raises the error.
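import scipy.io.wavfile

def extract_features(filenames):
    for fn in filenames:
        # read() returns (sample_rate, data); this line raises
        # ValueError: Incomplete wav chunk. for the "bad" files
        sr, y = scipy.io.wavfile.read(fn)
        print('Signal is: ', y)
        print('Sample rate is: ', sr)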
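For comparison, the standard-library wave module can load a file into the same (sr, y) pair that extract_features expects. read_wav_stdlib is a hypothetical helper written for this example, not part of SciPy or of any package mentioned here; it assumes uncompressed PCM with 8-, 16- or 32-bit samples and is a sketch rather than a drop-in replacement for scipy.io.wavfile.read.

```python
import wave

import numpy as np


def read_wav_stdlib(path):
    """Read a PCM WAV file with the standard-library wave module.

    Returns (sample_rate, samples) in the same order as
    scipy.io.wavfile.read. Samples have shape (n_frames,) for mono and
    (n_frames, n_channels) otherwise. 24-bit audio is not handled.
    """
    with wave.open(path, "rb") as w:
        sr = w.getframerate()
        n_channels = w.getnchannels()
        sampwidth = w.getsampwidth()
        raw = w.readframes(w.getnframes())

    # 8-bit WAV samples are unsigned; wider PCM is signed little-endian.
    dtypes = {1: np.dtype("u1"), 2: np.dtype("<i2"), 4: np.dtype("<i4")}
    if sampwidth not in dtypes:
        raise ValueError(f"unsupported sample width: {sampwidth} bytes")
    y = np.frombuffer(raw, dtype=dtypes[sampwidth])
    if n_channels > 1:
        # Samples are interleaved per frame (L, R, L, R, ...): one row per frame.
        y = y.reshape(-1, n_channels)
    return sr, y
```

Swapping scipy.io.wavfile.read(fn) for read_wav_stdlib(fn) inside extract_features is one way to check whether a bad file is rejected by every reader or only by SciPy.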
Additional info
Using VLC, it seems that the codecs are supported by scipy.io.wavfile, and in any case both files have the same codec, so it's odd that they don't behave the same way...
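Instead of reading the codec from VLC, the format fields can also be compared programmatically. describe_wav and compare_wavs are hypothetical helpers; they rely on the standard-library wave module, so they only work on files wave can open (uncompressed PCM), and they ignore the frame count by default since clip lengths naturally differ. Matching format fields only show that the audio encoding is the same; they say nothing about extra or malformed chunks elsewhere in the file.

```python
import wave


def describe_wav(path):
    """Return the basic format fields of a PCM WAV file as a dict."""
    with wave.open(path, "rb") as w:
        p = w.getparams()
    return {
        "channels": p.nchannels,
        "sample_width_bytes": p.sampwidth,
        "sample_rate": p.framerate,
        "n_frames": p.nframes,
        "compression": p.comptype,
    }


def compare_wavs(path_a, path_b, ignore=("n_frames",)):
    """Return {field: (value_a, value_b)} for every format field that differs."""
    a, b = describe_wav(path_a), describe_wav(path_b)
    return {k: (a[k], b[k]) for k in a if k not in ignore and a[k] != b[k]}
```

For example, compare_wavs(fn_good, fn_bad) returning an empty dict would confirm that the two files agree on channels, sample width, sample rate and compression.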
Codec of the GOOD file:
I don't know why scipy.io.wavfile can't read the file; there might be an invalid chunk in there that other readers simply ignore. Note that even when I read a "good" file with scipy.io.wavfile, a warning (WavFileWarning: Chunk (non-data) not understood, skipping it.) is generated.
I can read 'fearful_song_strong_dogs_act06_f_0.wav' using wavio (source code on GitHub: wavio), a package I created that wraps Python's standard wave library with functions that understand NumPy arrays: