How to find all files that don't have a matching fi

Posted 2019-09-09 17:03

I have a folder with over 1 million files. The files come in pairs that differ only by their extension (e.g. a1.ext1, a1.ext2, a2.ext1, a2.ext2, ...).

I need to scan this folder and make sure every file has its partner; if I find a file without a match, I should delete it.

I've already done it in Python, but it was very slow with a seven-figure number of files.

Is there a way to do this using a shell command or script?

3 answers
劫难
Reply #2 · 2019-09-09 17:18

Building on another answer, you could use a script like this. It is meant to live in the same directory as the files and to be run from there:

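#!/usr/bin/env bash
# Run from inside the directory that holds the files.
THRASH=../THRASH
mkdir -p "$THRASH"

# printf is a shell builtin, so the globs never hit the argument-length
# limit that "ls *.ext1" runs into with a very large folder.
printf '%s\n' *.ext1 *.ext2 | cut -d. -f1 | sort -u | while IFS= read -r name; do
    # Count the existing members of the pair, one per line.
    if [ $(ls -- "$name".{ext1,ext2} 2> /dev/null | wc -l) -lt 2 ]; then
        # Move whichever member exists; the error for the missing one is discarded.
        mv -- "$name".{ext1,ext2} "$THRASH" 2> /dev/null
    fi
done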

You can change where unpaired files are moved by modifying the THRASH variable.

On a dual-core 3.0 GHz Pentium with 2 GB of RAM, one run took 63.7 seconds (10000 pairs, with about 1500 of each member of the pair missing from the folder).
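Much of that time probably goes into starting a separate ls for every name, which would get expensive at a million files. As a rough sketch of an alternative, not a tested replacement for the script above, the stems can be streamed through sort and uniq -u once, so only the unpaired ones reach the loop. The function name list_unpaired is made up for illustration, the extensions are the ext1/ext2 placeholders from the question, and the sketch assumes a find that supports -maxdepth (GNU and BSD do) and file names without newlines.

```shell
# list_unpaired DIR (hypothetical helper): print every .ext1/.ext2 file
# in DIR whose partner is missing. Nothing is deleted here.
list_unpaired() {
    local dir=$1
    # find streams the names, so no command line ever holds a million arguments.
    find "$dir" -maxdepth 1 -type f \( -name '*.ext1' -o -name '*.ext2' \) |
        sed 's/\.ext[12]$//' |   # strip the extension, leaving the stem
        sort | uniq -u |         # a stem that appears only once has no partner
        while IFS= read -r stem; do
            # Exactly one of the two files exists; print that one.
            for f in "$stem.ext1" "$stem.ext2"; do
                if [ -e "$f" ]; then
                    printf '%s\n' "$f"
                fi
            done
        done
}
```

Running list_unpaired . first and reading the output is a cheap dry run. Once the list looks right, it can be piped into a loop such as while IFS= read -r f; do rm -- "$f"; done, or into mv to keep the THRASH behaviour.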

聊天终结者
Reply #3 · 2019-09-09 17:22

Python should be faster; however, if you want to try it in bash:

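# Read the base names line by line so names with spaces are not split.
ls | cut -d. -f1 | sort -u | while IFS= read -r file; do
    # Count the entries that share this base name.
    if [ $(ls -d -- "$file".* 2> /dev/null | wc -l) -ne 2 ]; then
        echo "wrong number of extensions for $file"
    fi
done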

This should print every base name that has more or fewer than two extensions.
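The same report can probably be produced in a single pass by letting awk count how many files share each base name, instead of calling ls once per name. This is only a sketch under a few assumptions: the function name count_extensions is invented here, the base name is taken as everything before the first dot (as cut -d. -f1 does above), find supports -maxdepth, and no file name contains a newline.

```shell
# count_extensions DIR (hypothetical helper): print "base count" for every
# base name in DIR that does not have exactly two files.
count_extensions() {
    find "$1" -maxdepth 1 -type f |
        sed 's|.*/||' |          # drop the directory part of each path
        awk -F. '
            { n[$1]++ }          # $1 is the text before the first dot
            END {
                for (s in n)
                    if (n[s] != 2)
                        print s, n[s]
            }' |
        sort                     # awk iterates in arbitrary order
}
```

For a folder holding a1.ext1, a1.ext2 and a2.ext1, calling count_extensions on it prints the single line "a2 1".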

冷血范
Reply #4 · 2019-09-09 17:30

Try this one:

#!/bin/bash

for file in *.ext1 *.ext2
do
  # skip a pattern that matched no files
  [ -e "$file" ] || continue
  # name is the substring before the last '.'
  name=${file%.*}
  # ext is the substring after the last '.'
  ext=${file##*.}
  case $ext in
    ext1) sibling="$name.ext2" ;;
    ext2) sibling="$name.ext1" ;;
  esac
  # does it have a sibling? test for it directly instead of grepping
  # the whole listing, which is slow and also matches partial names
  if [ ! -e "$sibling" ]
  then
    # no sibling: remove the file
    rm -- "$file"
  fi
done