Identifying duplicate .mp3 files by content, not by name, with a shell script

I would like to write a script that finds duplicate MP3s by content and not by file name. How does one inspect a file's inner data for the sake of comparison? Thank you.

Tags: bash shell

4 answers

别忘想泡老子

Reply #2 · 2019-07-13 11:40

I use this script for my photos, but it works for other files too.

First I transfer the pictures from my phone/camera to the directory newfiles.
Then I run this script from the root directory of my pictures.
- When it detects duplicated files, the script keeps one file and moves the others to the directory ../garbage.
- The script moves files from newfiles first.

Caution: This script does not compare file content; it detects files that have the same size and name (this is OK for camera files). My other answer is based on content comparison (md5sum).

#!/bin/bash

# If a file from directory 'newfiles' has the same size & name
# as a file from another directory,
# then move the file from 'newfiles' to 'garbage'
find newfiles/ -type f -printf '%s %f\n' |
while read -r SIZE f   # -r keeps backslashes in file names intact
do
   find . -name "$f" -size "${SIZE}c" |
     grep -v 'newfiles' &&
     find . -name "$f" -size "${SIZE}c" -path '*newfiles*' -exec mv -v '{}' ../garbage ';' &&
     echo
done

# Detect all other duplicated files
# Keep the first occurrence and move all others to 'garbage'
find . -type f -printf '%s %f\n' |
  LC_ALL=C sort |  #LC_ALL=C disables locale => sort is faster
  uniq -dc      |  #keep duplicates and count number of occurrences
  while read -r n SIZE f
  do
    echo -e "\n_____ $n files\t$SIZE bytes\tname: $f"
    find . -name "$f" -size "${SIZE}c" |
       tail -n +2 |   #skip the first match so that one copy is kept
       xargs -r -d '\n' mv -v -t ../garbage   #GNU xargs: one file name per line
  done

我命由我不由天

Reply #3 · 2019-07-13 11:46

cmp can be used to compare binary files.

cmp file1.mp3 file2.mp3
if [[ $? -eq 0 ]]; then echo "Matched"; fi

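The cmp command returns 0 if the files are identical, 1 if they differ, and a value greater than 1 if an error occurred (for example, when a file cannot be read).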
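
To make the exit-status handling explicit, here is a minimal sketch. The function name compare_files is hypothetical and only serves the illustration; it relies on cmp's documented exit statuses (0 for identical files, 1 for different files, greater than 1 on trouble) and on the -s option, which suppresses normal output.

```shell
#!/bin/bash

# compare_files is a hypothetical helper name, not part of any tool.
# It prints "same", "different" or "error" for two file paths.
compare_files() {
  cmp -s -- "$1" "$2"      # -s: silent, only the exit status matters
  case $? in
    0) echo "same" ;;
    1) echo "different" ;;
    *) echo "error" ;;     # e.g. a file is missing or unreadable
  esac
}
```

It could be called as compare_files file1.mp3 file2.mp3.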

小情绪 Triste *

Reply #4 · 2019-07-13 11:46

If the files are really byte-for-byte identical, you can start by searching for files of the same size. If their sizes match, you can investigate further (e.g. compare their md5sum). If the files just contain the same song but use a different codec/compression/whatever, bash is probably not the right tool for the task.
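
As a rough illustration of the "size first, checksum second" idea for a single pair of files, something like the following might work. The function name same_content is hypothetical, and md5sum is assumed to be available (GNU coreutils).

```shell
#!/bin/bash

# same_content is a hypothetical helper name.
# Returns 0 if both files have the same size and the same MD5 checksum,
# 1 if they differ, and 2 if a file could not be read.
same_content() {
  local size1 size2 sum1 sum2
  size1=$(wc -c < "$1") || return 2
  size2=$(wc -c < "$2") || return 2
  # Cheap test first: files of different sizes cannot be identical
  [ "$size1" -eq "$size2" ] || return 1
  # Sizes match, so compare the checksums of the contents
  sum1=$(md5sum < "$1") || return 2
  sum2=$(md5sum < "$2") || return 2
  [ "$sum1" = "$sum2" ]
}
```

For example: same_content a.mp3 b.mp3 && echo "duplicate". The checksum is only computed when the sizes already match, which is what saves time on a large collection.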

疯言疯语

Reply #5 · 2019-07-13 11:47

This first command line lists all files under the current directory that have the same size and the same md5sum

find . -type f -printf '%11s ' -exec md5sum '{}' ';' | 
  sort | uniq -w44 --all-repeated=separate

The second command line is

- faster, because it calculates the md5sum only for files that have the same size
- more robust, because it handles filenames containing special characters such as spaces or newlines

It is therefore also more complex

find . -type f -printf '%11s %P\0' |
  LC_ALL=C sort -z |
  uniq -Dzw11 |
  while IFS= read -r -d '' line
  do
    md5sum "${line:12}"
  done |
  LC_ALL=C sort |  #group equal checksums: uniq only compares adjacent lines
  uniq -w32 --all-repeated=separate |
  tee duplicated.log

Some explanations

# Print file size/md5sum/name in one line (size aligned in 11 characters)
find . -type f -printf '%11s ' -exec md5sum '{}' ';'

# Print duplicated lines considering the first 44 characters only
# 44 characters = size (11 characters) + one space + md5sum (32 characters)
uniq -w44 --all-repeated=separate

# Print size and path/filename terminated by a null character
find . -type f -printf '%11s %P\0'

# Sort lines separated by a null character (-z) instead of a newline character
# based on native byte value (LC_ALL=C) instead of the locale's collation order
LC_ALL=C sort -z

# Read lines separated by null character
IFS= read -r -d '' line

# Skip the first 12 characters (size and space) 
# in order to obtain the rest: path/filename
"${line:12}"
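
Putting the pieces explained above together, the pipeline could be wrapped in a function that takes the directory to scan as an argument. This is only a sketch under a few assumptions: the name find_duplicates is hypothetical, GNU find, sort, uniq and md5sum are required, %p is used instead of %P so that the printed paths stay valid from any working directory, and a sort step is placed before the last uniq because uniq only compares adjacent lines.

```shell
#!/bin/bash

# find_duplicates is a hypothetical wrapper name.
# It prints groups of files with identical size and MD5 checksum
# found under the given directory (default: current directory).
find_duplicates() {
  local dir=${1:-.} line
  # Size padded to 11 characters, one space, then the path, null-terminated
  find "$dir" -type f -printf '%11s %p\0' |
    LC_ALL=C sort -z |
    uniq -Dzw11 |                   # keep only entries whose size occurs more than once
    while IFS= read -r -d '' line
    do
      md5sum "${line:12}"           # drop the 12-character size prefix
    done |
    LC_ALL=C sort |                 # bring equal checksums next to each other
    uniq -w32 --all-repeated=separate
}
```

For example: find_duplicates ~/Music | tee duplicated.log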
