I'm trying to read a wav file from the TIMIT database in python but I get an error:
When I'm using wave:
wave.Error: file does not start with RIFF id
When I'm using scipy:
ValueError: File format b'NIST'... not understood.
and when I'm using librosa, the program got stuck.
I tried to convert it to wav using sox:
cmd = "sox " + wav_file + " -t wav " + new_wav
subprocess.call(cmd, shell=True)
and it didn't help. I saw an old answer referencing to the package scikits.audiolab but it looks like it is no longer supported.
How can I read these file to get a ndarray of the data?
Thanks
Your file is not a WAV file. Apparently it is a NIST SPHERE file. From the LDC web page: "Many LDC corpora contain speech files in NIST SPHERE format." According to the description of the NIST File Format, the first four characters of the file are NIST
. That's what the scipy error is telling you: it doesn't know how to read a file that begins with NIST
.
I suspect you'll have to convert the file to WAV if you want to read the file with any of the libraries that you tried. To force the conversion to WAV using the program sph2pipe
, use the command option -f wav
(or equivalently, -f rif
), e.g.
sph2pipe -f wav input.sph output.wav
issue this from command line to verify its a wav file ... or not
xxd -b myaudiofile.wav | head
if its wav format it will appear something like
00000000: 01010010 01001001 01000110 01000110 10111100 10101111 RIFF..
00000006: 00000001 00000000 01010111 01000001 01010110 01000101 ..WAVE
0000000c: 01100110 01101101 01110100 00100000 00010000 00000000 fmt ..
00000012: 00000000 00000000 00000001 00000000 00000001 00000000 ......
00000018: 01000000 00011111 00000000 00000000 01000000 00011111 @...@.
0000001e: 00000000 00000000 00000001 00000000 00001000 00000000 ......
00000024: 01100100 01100001 01110100 01100001 10011000 10101111 data..
0000002a: 00000001 00000000 10000001 10000000 10000001 10000000 ......
00000030: 10000001 10000000 10000001 10000000 10000001 10000000 ......
00000036: 10000001 10000000 10000001 10000000 10000001 10000000 ......
notice the wav file begins with the characters RIFF
which is the mandatory indicator the file is using wav codec ... if your system (I'm on linux) does not have above command line utility : xxd then use any hex editor like wxHexEditor to similarily examine your wav file to confirm you see the RIFF ... if no RIFF then its simply not a wav file
Here are details of wav format specs
http://soundfile.sapp.org/doc/WaveFormat/
http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html
http://unusedino.de/ec64/technical/formats/wav.html
http://www.drdobbs.com/database/inside-the-riff-specification/184409308
https://www.gamedev.net/articles/programming/general-and-gameplay-programming/loading-a-wave-file-r709
http://www.topherlee.com/software/pcm-tut-wavformat.html
http://www.labbookpages.co.uk/audio/javaWavFiles.html
http://www.johnloomis.org/cpe102/asgn/asgn1/riff.html
http://nagasm.org/ASL/sound05/
If you want a generic code that works for every wav file inside the folder run:
forfiles /s /m *.wav /c "cmd /c sph2pipe -f wav @file @fnameRIFF.wav"
It search for every wav file that can find and create a wav file that both scipy and wave can read with the name < base_name >RIFF.wav
I have written a python script which will convert all the .WAV files in NIST format spoken by all speakers from all dialects to .wav files which ca
n be played on your system.
Note: All the dialects folders are present in ./TIMIT/TRAIN/ . You may have to change the dialects_path according to your project structure(or if you are on Windows)
from sphfile import SPHFile
dialects_path = "./TIMIT/TRAIN/"
for dialect in dialects:
dialect_path = dialects_path + dialect
speakers = os.listdir(path = dialect_path)
for speaker in speakers:
speaker_path = os.path.join(dialect_path,speaker)
speaker_recordings = os.listdir(path = speaker_path)
wav_files = glob.glob(speaker_path + '/*.WAV')
for wav_file in wav_files:
sph = SPHFile(wav_file)
txt_file = ""
txt_file = wav_file[:-3] + "TXT"
f = open(txt_file,'r')
for line in f:
words = line.split(" ")
start_time = (int(words[0])/16000)
end_time = (int(words[1])/16000)
print("writing file ", wav_file)
sph.write_wav(wav_file.replace(".WAV",".wav"),start_time,end_time)