Trying to convert an mp3 file to a Numpy Array, an

2019-07-11 04:40发布

I'm working on a music classification methodology with Scikit-learn, and the first step in that process is converting a music file to a numpy array.

After unsuccessfully trying to call ffmpeg from a python script, I decided to simply pipe the file in directly:

FFMPEG_BIN = "ffmpeg"
cwd = (os.getcwd())
dcwd = (cwd + "/temp")
if not os.path.exists(dcwd): os.makedirs(dcwd)

folder_path = sys.argv[1]
f = open("test.txt","a")

for f in glob.glob(os.path.join(folder_path, "*.mp3")):
    ff = f.replace("./", "/")
    print("Name: " + ff)
    aa = (cwd + ff)

    command = [ FFMPEG_BIN,
        '-i',  aa,
        '-f', 's16le',
        '-acodec', 'pcm_s16le',
        '-ar', '22000', # ouput will have 44100 Hz
        '-ac', '1', # stereo (set to '1' for mono)
        '-']

    pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)
    raw_audio = pipe.proc.stdout.read(88200*4)
    audio_array = numpy.fromstring(raw_audio, dtype="int16")
    print (str(audio_array))
    f.write(audio_array + "\n")

The problem is, when I run the file, it starts ffmpeg and then does nothing:

[mp3 @ 0x1446540] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/home/don/Code/Projects/MC/Music/Spaz.mp3':
  Metadata:
    title           : Spaz
    album           : Seeing souns
    artist          : N*E*R*D
    genre           : Hip-Hop
    encoder         : Audiograbber 1.83.01, LAME dll 3.96, 320 Kbit/s, Joint Stereo, Normal quality
    track           : 5/12
    date            : 2008
  Duration: 00:03:50.58, start: 0.000000, bitrate: 320 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
Output #0, s16le, to 'pipe:':
  Metadata:
    title           : Spaz
    album           : Seeing souns
    artist          : N*E*R*D
    genre           : Hip-Hop
    date            : 2008
    track           : 5/12
    encoder         : Lavf56.4.101
    Stream #0:0: Audio: pcm_s16le, 22000 Hz, mono, s16, 352 kb/s
    Metadata:
      encoder         : Lavc56.1.100 pcm_s16le
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help

It just sits there, hanging, for far longer than the song is. What am I doing wrong here?,

2条回答
男人必须洒脱
2楼-- · 2019-07-11 05:18

Here's what I'm using: It uses pydub (which uses ffmpeg) and scipy.

Full setup (on Mac, may differ on other systems):

pip install scipy
pip install pydub
brew install ffmpeg  # Or probably "sudo apt-get install ffmpeg on linux"

Then to read the mp3:

import tempfile
import os
import pydub
import scipy
import scipy.io.wavfile


def read_mp3(file_path, as_float = False):
    """
    Read an MP3 File into numpy data.
    :param file_path: String path to a file
    :param as_float: Cast data to float and normalize to [-1, 1]
    :return: Tuple(rate, data), where
        rate is an integer indicating samples/s
        data is an ndarray(n_samples, 2)[int16] if as_float = False
            otherwise ndarray(n_samples, 2)[float] in range [-1, 1]
    """

    path, ext = os.path.splitext(file_path)
    assert ext=='.mp3'
    mp3 = pydub.AudioSegment.from_mp3(file_path)
    _, path = tempfile.mkstemp()
    mp3.export(path, format="wav")
    rate, data = scipy.io.wavfile.read(path)
    os.remove(path)
    if as_float:
        data = data/(2**15)
    return rate, data

Credit to James Thompson's blog

查看更多
Anthone
3楼-- · 2019-07-11 05:37

I recommend you pymedia or audioread or decoder.py. There are also pyffmpeg and similar modules for doing just that what you want. Take a look at pypi.python.org.

Of course, these will not help you turn the data into numpy array.

Anyway, this is how it is done crudely using piping to ffmpeg:

from subprocess import Popen, PIPE
import numpy as np

def decode (fname):
    # If you are on Windows use full path to ffmpeg.exe
    cmd = ["./ffmpeg.exe", "-i", fname, "-f", "wav", "-"]
    # If you are on W add argument creationflags=0x8000000 to prevent another console window jumping out
    p = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    data = p.communicate()[0]
    return np.fromstring(data[data.find("data")+4:], np.int16)

This is how it should work for basic use.

It should work because output of ffmpeg is by default 16 bit audio. But if you mess around, you should know that numpy doesn't have int24, so you will be forced to do some bit manipulations and represent 24 bit audio as 32 bit audio. Just, don't use 24 bit, and the world is happy. :D

We may discuss refining the code in comments, if you need something more sophisticated.

查看更多
登录 后发表回答