Question:
I have a bunch of MP3 files and their features from the dataset here. All of the spectrograms are precomputed, so I want to know how to load a given spectrogram from file and, at the very least, display it. Ideally, I would also like to be able to skip to a point in the spectrogram given a time code.
Answer 1:
The generation script for the features is posted here. It states that the features are saved using np.savetxt, which means that you can load them using np.loadtxt.
Once loaded, the features/spectrograms behave like regular NumPy arrays. Knowing the hop length and the sampling rate lets you work out the time code of each spectrogram frame. Note that not all spectrograms necessarily share the same hop length, so it is worth paying close attention to how each feature is extracted in the script.
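As a minimal sketch of the save/load round trip: the array contents and the file name below are made up for illustration, and the sketch assumes the default whitespace delimiter. If the generation script passes a different `delimiter` to np.savetxt, the same value has to be passed to np.loadtxt.

```python
import os
import tempfile

import numpy as np

# Stand-in for a precomputed spectrogram: 4 frequency bins x 6 frames (made-up values).
S_original = np.arange(24, dtype=float).reshape(4, 6)

# Hypothetical file name; the real dataset's naming scheme may differ.
path = os.path.join(tempfile.mkdtemp(), "example_spectrogram.txt")
np.savetxt(path, S_original)

# Loading gives back a regular 2-D NumPy array.
S_loaded = np.loadtxt(path)
print(S_loaded.shape)  # (4, 6)
```

Checking `S_loaded.shape` right after loading is a quick way to see whether a file stores frames along rows or along columns.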
As an example, consider melspec, which is based on librosa.feature.melspectrogram (see here). By default, librosa resamples audio to 22,050 Hz (mono). The audio is then passed to melspectrogram, which by default uses a hop length of 512 samples (see docs). If 22,050 samples correspond to one second of audio (which is what a sample rate of 22.05 kHz means), then 512 samples correspond to 512 / 22,050 s ≈ 0.023 s. That is, each frame in the spectrogram advances by roughly 23 ms.
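The arithmetic above can be sketched as a pair of conversion helpers. The function names are hypothetical, and the defaults assume the 22,050 Hz sample rate and 512-sample hop length discussed here; features extracted with other settings need their own values. librosa also provides librosa.time_to_frames and librosa.frames_to_time for the same purpose.

```python
SR = 22050        # assumed: librosa's default sample rate
HOP_LENGTH = 512  # assumed: melspectrogram's default hop length


def frame_duration(sr=SR, hop_length=HOP_LENGTH):
    """Seconds between the starts of two consecutive frames."""
    return hop_length / sr


def time_to_frame(seconds, sr=SR, hop_length=HOP_LENGTH):
    """Index of the frame that starts at or just before the given time."""
    return int(seconds * sr) // hop_length


def frame_to_time(frame, sr=SR, hop_length=HOP_LENGTH):
    """Start time, in seconds, of the given frame index."""
    return frame * hop_length / sr


print(round(frame_duration(), 4))  # 0.0232
print(time_to_frame(10.0))         # 430
```

So a time code of 10 s lands on frame 430 under these assumptions.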
To display spectrograms, use librosa.display.specshow.
Code sample adapted from the docs:
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load a precomputed STFT spectrogram that was saved with np.savetxt
S = np.loadtxt('your_stft_spectrogram_file')

# Convert magnitudes to decibels relative to the maximum value
D = librosa.amplitude_to_db(np.abs(S), ref=np.max)

# Display the spectrogram on a linear frequency axis
plt.figure(figsize=(12, 8))
librosa.display.specshow(D, y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')
plt.show()
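To skip to a given time code, one option is to slice the loaded array by frame index before displaying it. The sketch below uses a hypothetical helper, `slice_by_time`, and assumes the array is laid out as (frequency bins, frames), which is librosa's convention, with a 22,050 Hz sample rate and a 512-sample hop length. If a file stores frames along rows instead, transpose the array first.

```python
import numpy as np


def slice_by_time(S, start_s, end_s, sr=22050, hop_length=512):
    """Return the columns (frames) of S between start_s and end_s seconds.

    Assumes S has shape (n_bins, n_frames); sr and hop_length are assumed
    defaults and must match how the feature was actually extracted.
    """
    start = int(start_s * sr) // hop_length
    end = int(end_s * sr) // hop_length
    return S[:, start:end]


# Dummy spectrogram: 128 bins x 1000 frames (placeholder data).
S = np.zeros((128, 1000))
segment = slice_by_time(S, 10.0, 12.0)
print(segment.shape)  # (128, 86)
```

The resulting `segment` can be passed to librosa.display.specshow in the same way as the full array.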