Detect duplicate MP3 files with different bitrates

Posted 2019-01-08 16:13

How can I detect (preferably with Python) duplicate MP3 files that are the same song but may be encoded at different bitrates and may have incorrect ID3 tags?

I know I can compute an MD5 checksum of each file's content, but that won't work across different bitrates. I also don't know whether ID3 tags influence the MD5 checksum. Should I re-encode the MP3 files that have a different bitrate and then compute the checksum? What do you recommend?
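On the ID3 part of the question: a checksum over the whole file does include the tag bytes, so two otherwise identical files with different tags hash differently. A minimal sketch of hashing only the audio payload follows, assuming the common layout of an ID3v2 tag at the start and an ID3v1 tag in the last 128 bytes. The names `strip_id3` and `audio_md5` are made up for illustration, and this only catches files with identical audio data; it does nothing for different bitrates.

```python
import hashlib


def strip_id3(data: bytes) -> bytes:
    """Return the file bytes without a leading ID3v2 tag and a trailing ID3v1 tag."""
    start, end = 0, len(data)

    # ID3v2: 10-byte header "ID3", version (2), flags (1), size (4, syncsafe).
    if len(data) >= 10 and data[:3] == b"ID3":
        s = data[6:10]
        # Syncsafe integer: only the low 7 bits of each byte carry data.
        size = (s[0] << 21) | (s[1] << 14) | (s[2] << 7) | s[3]
        start = 10 + size
        if data[5] & 0x10:  # footer flag: another 10 bytes after the tag
            start += 10
        start = min(start, end)

    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return data[start:end]


def audio_md5(data: bytes) -> str:
    """MD5 of the audio payload only (tags excluded)."""
    return hashlib.md5(strip_id3(data)).hexdigest()
```

Usage would be something like `audio_md5(open(path, "rb").read())`; files that differ only in their tags then get the same digest.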

9 answers
Evening l夕情丶
#2 · 2019-01-08 16:34

Re-encoding to a common bitrate won't work; in fact, it may make things worse. Transcoding (re-encoding already compressed audio, for example at a different bitrate) changes the nature of the compression: recompressing an already compressed file produces a significantly different file.

This is a little out of my league, but I would approach the problem by looking at the waveform of the MP3, either by converting the MP3 to an uncompressed .wav or maybe by running the analysis on the MP3 file itself. There should be a library out there for this. A word of warning: this is an expensive operation.
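A rough sketch of the waveform idea, assuming both files have already been decoded to mono PCM samples at the same sample rate (the decoding step is not shown). It reduces each track to a coarse loudness envelope and correlates the two envelopes. The function names, the window size, and any threshold you pick are illustrative assumptions, not a tested recipe.

```python
import math


def envelope(samples, window=1024):
    """RMS energy of each non-overlapping window of samples."""
    env = []
    for i in range(0, len(samples) - window + 1, window):
        chunk = samples[i:i + window]
        env.append(math.sqrt(sum(s * s for s in chunk) / window))
    return env


def similarity(a, b, window=1024):
    """Pearson correlation of the two envelopes over their common length."""
    ea, eb = envelope(a, window), envelope(b, window)
    n = min(len(ea), len(eb))
    if n < 2:
        return 0.0
    ea, eb = ea[:n], eb[:n]
    mean_a, mean_b = sum(ea) / n, sum(eb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ea, eb))
    var_a = sum((x - mean_a) ** 2 for x in ea)
    var_b = sum((y - mean_b) ** 2 for y in eb)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat envelope carries no information to correlate
    return cov / math.sqrt(var_a * var_b)
```

A score close to 1 suggests the same recording. Correlation ignores overall volume, so a louder copy still scores high, but this sketch makes no attempt to align tracks that start at different offsets, so treat a low score as inconclusive rather than as proof of difference.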

Another idea: use ReplayGain to scan the files. If they are the same song, they should be tagged with the same gain. This will only work on the exact same song from the exact same album. I know of several cases where reissues were remastered at a higher volume, which changes the ReplayGain value.
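A small sketch of how the ReplayGain values could be used as a cheap pre-filter, assuming you have already read each file's track gain into a string such as "-6.48 dB" (reading the tag is not shown). The helper names and the 0.1 dB tolerance are assumptions for illustration. Equal gain only makes two files candidates, so a pair found this way still needs a real audio comparison.

```python
from itertools import combinations


def parse_gain(text):
    """Parse a ReplayGain string such as '-6.48 dB' into a float in dB."""
    value = text.strip()
    if value.lower().endswith("db"):
        value = value[:-2]
    return float(value)


def candidate_pairs(gains, tolerance_db=0.1):
    """Return pairs of file names whose track gains differ by at most tolerance_db.

    `gains` maps a file name to its ReplayGain string.
    """
    parsed = {name: parse_gain(text) for name, text in gains.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(parsed), 2)
        if abs(parsed[a] - parsed[b]) <= tolerance_db
    ]
```

The tolerance is there because different encodes of the same track will not necessarily produce exactly the same gain value.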

EDIT:
You might want to check out http://www.speech.kth.se/snack/, which apparently can do spectrogram visualization. I imagine any library that can visualize a spectrogram can help you compare them.

This link from the official Python site may also be helpful.

仙女界的扛把子
#3 · 2019-01-08 16:35

The Dejavu project is written in Python and does exactly what you are looking for.

https://github.com/worldveil/dejavu

It also supports many common formats (.wav, .mp3, etc.) and can find the time offset of a clip within the original audio track.

祖国的老花朵
#4 · 2019-01-08 16:40

I'm looking for something similar and I found this:
http://www.lastfm.es/user/nova77LF/journal/2007/10/12/4kaf_fingerprint_(command_line)_client

Hope it helps.
