I'm working on a music classification methodology with scikit-learn, and the first step in that process is converting a music file to a NumPy array.
After unsuccessfully trying to call ffmpeg from a Python script, I decided to simply pipe the file in directly:
import glob
import os
import subprocess as sp
import sys

import numpy

FFMPEG_BIN = "ffmpeg"
cwd = os.getcwd()
dcwd = cwd + "/temp"
if not os.path.exists(dcwd):
    os.makedirs(dcwd)
folder_path = sys.argv[1]
f = open("test.txt", "a")
for f in glob.glob(os.path.join(folder_path, "*.mp3")):
    ff = f.replace("./", "/")
    print("Name: " + ff)
    aa = cwd + ff
    command = [FFMPEG_BIN,
               '-i', aa,
               '-f', 's16le',
               '-acodec', 'pcm_s16le',
               '-ar', '22000',  # output sample rate: 22000 Hz
               '-ac', '1',  # mono (set to '2' for stereo)
               '-']
    pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)
    raw_audio = pipe.proc.stdout.read(88200*4)
    audio_array = numpy.fromstring(raw_audio, dtype="int16")
    print(str(audio_array))
    f.write(audio_array + "\n")
The problem is, when I run the file, it starts ffmpeg and then does nothing:
[mp3 @ 0x1446540] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/home/don/Code/Projects/MC/Music/Spaz.mp3':
Metadata:
title : Spaz
album : Seeing souns
artist : N*E*R*D
genre : Hip-Hop
encoder : Audiograbber 1.83.01, LAME dll 3.96, 320 Kbit/s, Joint Stereo, Normal quality
track : 5/12
date : 2008
Duration: 00:03:50.58, start: 0.000000, bitrate: 320 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
Output #0, s16le, to 'pipe:':
Metadata:
title : Spaz
album : Seeing souns
artist : N*E*R*D
genre : Hip-Hop
date : 2008
track : 5/12
encoder : Lavf56.4.101
Stream #0:0: Audio: pcm_s16le, 22000 Hz, mono, s16, 352 kb/s
Metadata:
encoder : Lavc56.1.100 pcm_s16le
Stream mapping:
Stream #0:0 -> #0:0 (mp3 (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
It just sits there, hanging, for far longer than the song lasts. What am I doing wrong here?
Here's what I'm using. It relies on pydub (which uses ffmpeg) and SciPy.
Full setup (on Mac, may differ on other systems):
Then to read the mp3:
Credit to James Thompson's blog
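For reference, a minimal sketch of the pydub-plus-SciPy route described above, assuming pydub and SciPy are installed and an ffmpeg binary is on the PATH. The names `read_mp3` and `to_float` are illustrative, and the detour through a temporary WAV file is just one way to hand pydub's output to `scipy.io.wavfile.read`:

```python
import os
import tempfile

import numpy as np


def to_float(data):
    """Scale int16 PCM samples to floats in [-1.0, 1.0).

    Assumes the samples are 16-bit, which is what a decoded MP3
    normally yields.
    """
    return np.asarray(data, dtype=np.float64) / 2**15


def read_mp3(file_path, as_float=False):
    """Decode an MP3 file and return (sample_rate, samples)."""
    # Imported here so that to_float() stays usable without these
    # optional dependencies installed.
    from pydub import AudioSegment
    from scipy.io import wavfile

    segment = AudioSegment.from_mp3(file_path)
    with tempfile.TemporaryDirectory() as tmp_dir:
        wav_path = os.path.join(tmp_dir, "decoded.wav")
        # export() returns the open output file; close it before reading.
        segment.export(wav_path, format="wav").close()
        rate, data = wavfile.read(wav_path)
    return rate, (to_float(data) if as_float else data)
```

Calling `rate, data = read_mp3("song.mp3")` (a hypothetical file name) should give an integer array; pass `as_float=True` if you would rather work with normalized floats.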
I recommend pymedia, audioread, or decoder.py. There are also pyffmpeg and similar modules that do just what you want. Take a look at pypi.python.org.
Of course, these will not help you turn the data into a NumPy array.
Anyway, here is a crude way to do it by piping the output of ffmpeg:
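A minimal sketch of that pipe-based approach, assuming an `ffmpeg` binary on the PATH. The helper names (`build_ffmpeg_command`, `pcm16_to_array`, `decode_with_ffmpeg`) and the 22050 Hz default are illustrative choices, not anything ffmpeg prescribes. The sketch asks for 16-bit little-endian PCM explicitly, reads all of stdout in one go, and keeps ffmpeg away from the terminal's stdin:

```python
import subprocess

import numpy as np


def build_ffmpeg_command(path, sample_rate=22050, channels=1, ffmpeg_bin="ffmpeg"):
    """Build an ffmpeg command that writes raw 16-bit PCM to stdout."""
    return [
        ffmpeg_bin,
        "-nostdin",              # do not wait for interactive key presses
        "-loglevel", "error",    # keep stderr quiet unless something fails
        "-i", path,
        "-f", "s16le",           # raw samples, no container
        "-acodec", "pcm_s16le",  # signed 16-bit little-endian PCM
        "-ar", str(sample_rate),
        "-ac", str(channels),
        "-",                     # write to stdout
    ]


def pcm16_to_array(raw, channels=1):
    """Interpret raw s16le bytes as an int16 array.

    Multi-channel audio is interleaved, so it is reshaped to
    (n_frames, channels).
    """
    audio = np.frombuffer(raw, dtype="<i2")
    return audio.reshape(-1, channels) if channels > 1 else audio


def decode_with_ffmpeg(path, sample_rate=22050, channels=1):
    """Run ffmpeg on `path` and return the decoded samples."""
    proc = subprocess.run(
        build_ffmpeg_command(path, sample_rate, channels),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if ffmpeg fails
    )
    return pcm16_to_array(proc.stdout, channels)
```

Reading the whole stream at once is fine for single songs; for very long files you would probably want to read stdout in chunks instead.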
This is how it should work for basic use.
It should work because ffmpeg's output is 16-bit audio by default. If you change that, keep in mind that NumPy has no int24 type, so you would have to do some bit manipulation and represent 24-bit audio as 32-bit audio. Just don't use 24-bit, and the world is happy. :D
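If you do end up with 24-bit data, the bit manipulation could look roughly like this. It is a sketch that assumes packed little-endian signed 24-bit samples (what ffmpeg's `s24le` raw format produces); the function name `pcm24_to_int32` is made up for illustration:

```python
import numpy as np


def pcm24_to_int32(raw):
    """Unpack little-endian signed 24-bit PCM bytes into an int32 array."""
    # One row per sample, three bytes each (least significant byte first).
    triples = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3)
    # Put each sample into the upper three bytes of a 4-byte slot...
    padded = np.zeros((triples.shape[0], 4), dtype=np.uint8)
    padded[:, 1:] = triples
    # ...then view the slots as little-endian int32 and shift right by 8.
    # The shift is arithmetic for signed integers, so the sign is preserved.
    return padded.view("<i4").reshape(-1) >> 8
```

The values stay in the 24-bit range (-8388608 to 8388607); they are simply stored in a 32-bit container.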
We can discuss refining the code in the comments if you need something more sophisticated.