Performance with bash loop when renaming files

Posted 2020-06-27 06:35

Question:

Sometimes I need to rename a number of files, for example to add a prefix or remove part of the name. At first I wrote a Python script. It works well, but I wanted a shell version, so I wrote something like this:

$1 is the directory to list, $2 is the pattern to replace, and $3 is the replacement.

echo "usage: dir pattern replacement"
for fname in $(ls "$1")   # note: still splits names that contain whitespace
do
  newName=$(echo "$fname" | sed "s/^$2/$3/")
  echo 'mv' "$1/$fname" "$1/$newName&&"
  mv "$1/$fname" "$1/$newName"
done

It works, but very slowly, probably because each iteration has to create a process (here sed and mv), destroy it, and then create the same process again just to pass a different argument. Is that true? If so, how can I avoid it and get a faster version?

I thought about computing all the new names at once (running sed a single time over the whole list), but that still needs mv in the loop.

Please tell me how you would do it. Thanks. If my question is hard to understand, please be patient; my English is not very good, sorry.

--- update ---

I am sorry for my unclear description. My core question is: "If we have to run a command inside a loop, will that lower performance?" In for i in {1..100000}; do ls 1>/dev/null; done, creating and destroying a process takes most of the time. So what I want to know is: "Is there any way to reduce that cost?"

Thanks to kev and S.R.I for pointing me to the rename solution for renaming files.

Answer 1:

Every time you call an external binary (ls, sed, mv), bash has to fork itself and exec the command, and that carries a significant performance cost.
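A rough way to see that cost for yourself is to time the same loop twice, once with a builtin in the body and once with an external binary. This is only a sketch: the function names and the iteration count are made up for illustration, and the absolute numbers will depend on the machine.

```bash
#!/usr/bin/env bash
# Same number of iterations, once with a builtin and once with an external binary.
# run_builtin / run_external are hypothetical names used only for this sketch.
run_builtin()  { local i; for ((i = 0; i < $1; i++)); do :; done; }
run_external() { local i; for ((i = 0; i < $1; i++)); do ls >/dev/null; done; }

n=1000                 # iteration count, picked arbitrarily
time run_builtin "$n"  # ':' is a builtin: no new process is created
time run_external "$n" # 'ls' is external: one fork + exec per iteration
```

The gap between the two timings is roughly the per-iteration price of creating and tearing down a process, which is what the question's loop pays twice per file (sed and mv).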

You can do everything you want in pure bash 4.x and only need to call mv:

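pat_rename(){
  if [[ ! -d "$1" ]]; then
    echo "Error: '$1' is not a valid directory" >&2
    return 1
  fi
  shopt -s globstar          # bash 4+: ** matches files recursively
  cd "$1" || return 1
  for file in **; do
    # Dry run: only prints the command; call mv here to actually rename.
    echo "mv $file ${file//$2/$3}"
  done
}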
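The function above only prints the mv commands, and ${file//$2/$3} replaces every occurrence of the pattern rather than just a leading one. Below is a sketch of a non-recursive variant that actually renames and anchors the match at the start of the name, closer to the sed expression in the question. The name prefix_rename is hypothetical, and note that the pattern here is a shell glob, not a regular expression.

```bash
#!/usr/bin/env bash
# prefix_rename DIR PATTERN REPLACEMENT   (hypothetical helper name)
# PATTERN is treated as a shell glob, not as a sed-style regular expression.
prefix_rename() {
  local dir=$1 pat=$2 rep=$3 path name new
  [[ -d $dir ]] || { echo "Error: '$dir' is not a valid directory" >&2; return 1; }
  for path in "$dir"/*; do
    [[ -e $path ]] || continue          # empty directory: the glob stays literal
    name=${path##*/}                    # strip the directory part
    new=${name/#$pat/$rep}              # '#' anchors the match at the start
    [[ $new == "$name" ]] && continue   # name unchanged, nothing to do
    mv -- "$path" "$dir/$new"           # the only external process per file
  done
}
```

Called as prefix_rename ./photos old_ new_, this would turn old_a.txt into new_a.txt and leave files without the prefix alone. It still starts one mv per renamed file, but no sed and no ls.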


Answer 2:

Simplest first. What's wrong with rename?

mkdir tstbin
for i in `seq 1 20`
do
   touch tstbin/filename$i.txt
done
rename .txt .html tstbin/*.txt

Or are you using an older *nix machine?



Answer 3:

To avoid re-executing sed for each file, you could instead set up two name streams, one with the original names and one with the transformed names, then read from both in step:

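exec 3< <(ls)
exec 4< <(ls | sed 's/from/to/')

# IFS= and -r keep each line intact (no trimming, no backslash processing)
while IFS= read -r -u3 orig && IFS= read -r -u4 to; do
    [[ $orig != "$to" ]] && mv -- "$orig" "$to"   # skip unchanged names
done
exec 3<&- 4<&-   # close both descriptors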


Answer 4:

I think you can store all of the file names in a file or a string and use awk or sed to process them once, instead of one by one.
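One way this idea might look in bash 4, as a sketch only: collect the names in an array, pipe the whole list through a single sed process, and read the results back into a second array. The helper name batch_rename is hypothetical, and the approach assumes file names contain no newlines.

```bash
#!/usr/bin/env bash
# batch_rename DIR SED_EXPR   (hypothetical helper name)
# Assumes no file name contains a newline, since names are passed line by line.
batch_rename() {
  local dir=$1 expr=$2 i
  local -a old new
  old=("$dir"/*)
  [[ -e ${old[0]} ]] || return 0        # empty directory: nothing to rename
  old=("${old[@]##*/}")                 # strip the directory part of each entry
  # A single sed process transforms every name.
  mapfile -t new < <(printf '%s\n' "${old[@]}" | sed "$expr")
  for i in "${!old[@]}"; do
    [[ ${old[i]} == "${new[i]}" ]] || mv -- "$dir/${old[i]}" "$dir/${new[i]}"
  done
}
```

For example, batch_rename ./photos 's/^img/photo/' would start sed once for the whole directory. mv is still started once per renamed file, which is the remaining cost the question already pointed out.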



Tags: bash shell unix