Need to crop+resize ~300000 files. Runtime = 4+ da

Posted 2019-05-17 20:21

Question:

I am working on creating a video timelapse. All the photos I took are .jpg images shot at 4:3 aspect ratio. 2592x1944 resolution. I want them all to be 16:9 at 1920x1080.

I have written a little script to do this, but the process is not very fast. It took about 17 minutes for me to crop and resize 750 images. I have a total of about 300,000 to deal with, and will probably be doing them in batches of about 50,000. That is roughly 19 hours per batch, and over 4.5 days of computing in total.

So does anyone know a way I can speed up this program?

Here is the bash script I have written:

#!/bin/bash  

mkdir cropped

for f in *.JPG
do
    convert "$f" -resize 1920x1440 -set filename:name '%t' cropped/'%[filename:name].JPG' # Resize photo, maintaining aspect ratio
    convert "cropped/$f" -crop 1920x1080+0+"$1" -set filename:name '%t' cropped/'%[filename:name].JPG' # Crop to 16:9; $1 is the vertical offset at which the crop begins
done

echo Cropping Complete!

Putting some echo commands before and after each line within the loop reveals that resizing takes much more time than cropping, which I guess is not surprising. I have tried using mogrify -path cropped -resize 1920x1440! $f in place of convert $f -resize, but there does not seem to be much of a difference in speed.

So, any way I can speed up the runtime on this?

BONUS POINTS if you can show me an easy way to give a simple indication of progress as the program runs (something like "421 of 750 files, 56.13% complete").

EXTRA BONUS POINTS if you can add a command to output an .mp4 file from the frames that can be edited in a software program like SONY Vegas. I have managed to make video files (.avi) using mencoder from these photos, but the resulting video won't work in any of the video editors I have tried.

Answer 1:

A few things spring to mind...

Firstly, don't start ImageMagick twice per image, once to resize it and once to crop it, when it should be possible to do both operations in one go. So, instead of your two convert commands, I would use just one:

convert image.jpg -resize 1920x1440 -crop 1920x1080+0+$1 cropped/image.jpg
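
For reference, the offset passed as $1 can be worked out from the two geometries: after resizing to 1920x1440 there are 1440 - 1080 = 360 spare rows, so a vertically centered crop starts at row 180. A minimal sketch of that arithmetic follows; center_offset is a hypothetical helper name, not something ImageMagick provides.

```shell
#!/bin/bash
# center_offset is a hypothetical helper: given the resized height and the
# target height, print the y offset that centers the crop vertically.
center_offset() {
    local resized_h=$1 target_h=$2
    echo $(( (resized_h - target_h) / 2 ))
}

center_offset 1440 1080   # prints 180
```

The result can be passed to the script as its first argument, which gives -crop 1920x1080+0+180.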

Secondly, I don't see what you are doing with the -set option. It seems to be something to do with the filename, but you can just do that in the shell.
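
As a rough illustration of handling the name in the shell, bash parameter expansion can strip the directory and the extension. The function name out_path is made up for this sketch, and the commented convert line assumes the same geometry as the command above.

```shell
#!/bin/bash
# out_path is a hypothetical helper: map an input file to its output path
# in the cropped/ directory, keeping the base name and forcing a .JPG suffix.
out_path() {
    local f=$1
    local base=${f##*/}             # drop any leading directory
    echo "cropped/${base%.*}.JPG"   # drop the extension, add .JPG
}

# Intended use inside the loop (not run here):
#   convert "$f" -resize 1920x1440 -crop 1920x1080+0+"$1" "$(out_path "$f")"
```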

Thirdly, I would suggest you use GNU Parallel (I regularly process upwards of 65,000 images per day with it). It is easy to install and will ensure all those lovely CPU cores you paid for are kept busy. The easiest way to use it is, instead of running commands, just echo them and pipe them into parallel

#!/bin/bash  
mkdir cropped

for f in *.jpg
do
   echo convert \"$f\" -resize 1920x1440 -crop 1920x1080+0+$1 cropped/\"$f\"
done  | parallel

echo Cropping Complete!

Finally, if you want a progress meter, or indication of how much is done and what is left to do, use the --eta option (eta=Estimated Time of Arrival) to parallel and it tells you how many jobs and how much time is remaining.

When you get confident with parallel you will maybe run your entire process like this:

parallel --eta convert {} -resize 1920x1440 -crop 1920x1080+0+32 cropped/{} ::: *.jpg

I created 750 images the same size as yours and ran them this way and it takes my medium spec iMac 55 seconds to resize and crop the lot - YMMV. Please add a comment and say how you got on - how long the processing time is with parallel.
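
With hundreds of thousands of files, expanding *.jpg on the command line of an external program may exceed the system's argument-length limit. One possible workaround, sketched below, is to let the shell builtin printf emit NUL-terminated names and have parallel read them from standard input with -0. list_images is a hypothetical helper name, and the parallel line is left as a comment because it assumes GNU Parallel and ImageMagick are installed.

```shell
#!/bin/bash
# list_images is a hypothetical helper: print every .jpg in the given
# directory as a NUL-terminated name. printf is a bash builtin, so the long
# file list never has to fit on an external command line.
list_images() {
    local dir=${1:-.}
    (
        cd "$dir" || exit 1
        shopt -s nullglob        # no matches -> empty array, not a literal *.jpg
        files=(*.jpg)
        if (( ${#files[@]} > 0 )); then
            printf '%s\0' "${files[@]}"
        fi
    )
}

# Intended use (not run here):
#   list_images . | parallel -0 --eta convert {} -resize 1920x1440 -crop 1920x1080+0+32 cropped/{}
```

The glob and the directory are assumptions; adjust the pattern if your files end in .JPG.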



Answer 2:

First, to speed things up, don't echo progress to the screen; write it to a file and, when you want to know the status, read that file (easily done with the tail command). Seriously, this alone will already be faster. However, it does not look like the real bottleneck of your program. The main thing I can recommend is to run it in parallel: is there any reason why you can't crop and resize picture #1000 before picture #4? If not, modify the script to take a parameter that specifies which files it should work on, and then run several copies with different parameters. This should cut the time by roughly a factor equal to the number of CPU cores you have (less some hard-drive I/O time). Regarding your first bonus question, you can use a variant of this code:

files=(*.JPG)        # the files to process (you can change this to the files parameter mentioned above)
TOTAL=${#files[@]}   # total number of files to process
SOFAR=0              # how many files you've done so far
for f in "${files[@]}"
do
    ((SOFAR++))
    echo "done so far $SOFAR out of $TOTAL"
done
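
To get a line in the format the question asks for ("421 of 750 files, 56.13% complete"), the percentage can be computed with integer arithmetic, since bash has no floating point. This is only a sketch: progress_line is a hypothetical helper name, and it assumes the total is greater than zero.

```shell
#!/bin/bash
# progress_line is a hypothetical helper: format "N of M files, P% complete"
# using only integer arithmetic. Assumes total > 0.
progress_line() {
    local sofar=$1 total=$2
    local hundredths=$(( sofar * 10000 / total ))   # percent x 100, truncated
    printf '%d of %d files, %d.%02d%% complete\n' \
        "$sofar" "$total" $(( hundredths / 100 )) $(( hundredths % 100 ))
}

progress_line 421 750   # prints: 421 of 750 files, 56.13% complete
```

Inside the loop it could be called as progress_line "$SOFAR" "$TOTAL" right after the counter is incremented.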


Answer 3:

Use the

-define jpeg:size=1920x1440

option along with -resize. If you have an older version of ImageMagick (sorry, I don't know exactly when the syntax changed), use the

-size 1920x1440

option along with -resize.
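
As a hedged sketch of where the option goes: the jpeg:size define is a hint for reading the JPEG, so it is normally placed before the input file name. The helper name print_convert_cmd is made up, the offset 180 is just the vertically centered value for these geometries, and the function only prints the command rather than running it. How much time the hint saves, if any, depends on the ratio between source and target size and was not measured here.

```shell
#!/bin/bash
# print_convert_cmd is a hypothetical helper: print (do not run) a one-pass
# convert command with the JPEG size hint placed before the input file.
print_convert_cmd() {
    local in=$1 out=$2
    printf 'convert -define jpeg:size=1920x1440 "%s" -resize 1920x1440 -crop 1920x1080+0+180 "%s"\n' \
        "$in" "$out"
}

print_convert_cmd IMG_0001.JPG cropped/IMG_0001.JPG
```

The printed lines can be inspected first and then piped into a shell or into parallel, in the same way as the echo loop in Answer 1.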