I'm trying to make a simple function to wrap around FFProbe, and most of the data can be retrieved correctly.
The problem is that when I print the strings to the command line, in both Windows Command Prompt and Git Bash for Windows, the output appears mangled and out of order.
Some songs (specifically the file Imagine Dragons - Hit Parade_ Best of the Dance Music Charts\80 - Beazz - Lime (Extended Mix).flac) are missing metadata. I don't know why, but the dictionary the function below returns is empty.
FFProbe outputs its results to stderr, which can be piped to subprocess.PIPE, decoded, and parsed. I chose regex for the parsing bit.
Below is a slimmed-down version of my code; for the output, take a look at the GitHub gist.
#! /usr/bin/env python3
# -*- coding: utf-8 -*-
import os
from glob import glob
from re import findall, MULTILINE
from subprocess import Popen, PIPE

def glob_from(path, ext):
    """Return glob from a directory."""
    working_dir = os.getcwd()
    os.chdir(path)
    file_paths = glob("**/*." + ext)
    os.chdir(working_dir)
    return file_paths

def media_metadata(file_path):
    """Use FFPROBE to get information about a media file."""
    stderr = Popen(("ffprobe", file_path), shell=True, stderr=PIPE).communicate()[1].decode()
    metadata = {}
    for match in findall(r"(\w+)\s+:\s(.+)$", stderr, MULTILINE):
        metadata[match[0].lower()] = match[1]
    return metadata

if __name__ == "__main__":
    base = "C:/Users/spike/Music/Deezloader"
    for file in glob_from(base, "flac"):
        meta = media_metadata(os.path.join(base, file))
        title_length = meta.get("title", file) + " - " + meta.get("length", "000")
        print(title_length)
The regex pattern retrieves the strings correctly, but I don't understand why the output appears disordered when printed to the console with Python's print function. It doesn't matter how I build the string to print: concatenation, comma-delimited arguments, whatever.
I end up with the length of the song first and the song name second, with no space between the two. The dash is hanging off the end for some reason. Based on the print statement in the code above, the format should be Title - 000 ({title} - {length}), but the output looks more like 000Title -. Why?
I solved this thanks to the accepted answer to my related question.
I had forgotten about the carriage return at the end of each line. The solutions given are as follows:
1. Pass universal_newlines=True in the subprocess call:
   stderr = Popen(("ffprobe", file_path), shell=True, stderr=PIPE, universal_newlines=True).communicate()[1]
2. Strip the whitespace around the line from stderr:
   - *.communicate()[1].decode().rstrip() to strip all whitespace at the end.
   - *.communicate()[1].decode().strip() to strip all whitespace around.
   - *.communicate()[1].decode()[:-2] to remove the last two characters.
3. Swallow \r in the regex pattern:
   findall(r"(\w+)\s+:\s(.+)\r$", stderr, MULTILINE)
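To make the cause concrete: with MULTILINE, `$` matches just before `\n`, and `.` matches `\r`, so on CRLF output every captured value ends in a carriage return, which moves the cursor back to the start of the line when printed. The sketch below reproduces this with a hand-written sample string (an assumption standing in for real ffprobe output) and applies the three kinds of fix listed above. Note that calling strip() on the whole decoded string only cleans the final line, so here the stripping is applied to each captured value; the `parse` helper is purely illustrative.

```python
import re

# Hand-written stand-in for decoded ffprobe stderr on Windows (CRLF line endings).
SAMPLE = (
    "    TITLE           : Lime (Extended Mix)\r\n"
    "    ARTIST          : Beazz\r\n"
)
PATTERN = r"(\w+)\s+:\s(.+)$"


def parse(text, pattern=PATTERN):
    """Return {lowercased key: value} for every 'KEY : value' line."""
    return {key.lower(): value for key, value in re.findall(pattern, text, re.MULTILINE)}


# The problem: "." matches "\r", so the carriage return ends up in the value.
broken = parse(SAMPLE)

# Fix 1: normalise CRLF to LF first, roughly what text mode
# (universal_newlines=True) does when reading the pipe.
fix_newlines = parse(SAMPLE.replace("\r\n", "\n"))

# Fix 2: strip each captured value rather than the whole string.
fix_strip = {key: value.strip() for key, value in parse(SAMPLE).items()}

# Fix 3: consume an optional "\r" in the pattern. The lazy ".+?" is needed
# so the group does not swallow the "\r" itself.
fix_regex = parse(SAMPLE, r"(\w+)\s+:\s(.+?)\r?$")

print(repr(broken["title"]))     # 'Lime (Extended Mix)\r'
print(repr(fix_regex["title"]))  # 'Lime (Extended Mix)'
```

Printing with repr() is a quick way to spot stray control characters like this, since the terminal otherwise acts on them instead of showing them.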
This is all very helpful; however, I used none of these suggestions.
I didn't know that FFPROBE offers JSON output to STDOUT, but it does. The code to do that is below.
You might also get some use out of arg_builder(). It isn't perfect, but it's good enough for simple shell commands. It isn't made to be idiot-proof; it was written with a few holes, assuming that the programmer won't break anything.
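A minimal sketch of the JSON approach, assuming ffprobe is on the PATH and that the tags of interest sit in the container-level "format" section (some containers store them per stream instead). The function names here are illustrative, and the arguments are passed as a plain list rather than through arg_builder(), which this sketch does not try to reproduce.

```python
import json
import subprocess


def parse_ffprobe_json(text):
    """Extract the container-level tags from ffprobe's JSON, keys lowercased."""
    tags = json.loads(text).get("format", {}).get("tags", {})
    return {key.lower(): value for key, value in tags.items()}


def media_metadata(file_path):
    """Run ffprobe on one file and return its tags as a dict."""
    command = [
        "ffprobe",
        "-v", "quiet",            # silence the banner normally written to stderr
        "-print_format", "json",  # emit JSON on stdout
        "-show_format",           # include the "format" section, which holds the tags
        file_path,
    ]
    # A list without shell=True avoids quoting problems with spaces and
    # parentheses in file names. Decoding as UTF-8 is an assumption.
    result = subprocess.run(command, stdout=subprocess.PIPE, check=True)
    return parse_ffprobe_json(result.stdout.decode("utf-8"))
```

Because json.loads handles the line endings, no carriage-return cleanup is needed. The song length is not a tag in this output; it should be available as the "duration" field of the "format" section, given as a string of seconds.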