How Do I Make a Bash Script To Find Unused Images

2019-04-15 17:33发布

问题:

How do I make a Bash shell script that can identify all the .jpg, .gif, and .png files, and then identify which of these files are not linked via url(), href, or src in any text file in a folder?

Here's what I started, but I end up getting the inverse of what I want. I don't want to know referenced images, but unreferenced (aka "orphaned") images:

# Change MYPATH to the path where you have the project
find MYPATH -name *.jpg -exec basename {} \; > /tmp/patterns
find MYPATH -name *.png -exec basename {} \; >> /tmp/patterns
find MYPATH -name *.gif -exec basename {} \; >> /tmp/patterns

# Print a list of lines that reference these files
# The cat command simply removes coloring
grep -Rf /tmp/patterns MYPATH | cat

# great -- but how do I print the lines of /tmp/patterns *NOT* listed in any given
# *.php, *.css, or *.html?

回答1:

With drysdam's help, I created this Bash script, which I call orphancheck.sh and call with "./orphancheck.sh myfolder".

#!/bin/bash

MYPATH=$1

find "$MYPATH" -name *.jpg -exec basename {} \; > /tmp/patterns
find "$MYPATH" -name *.png -exec basename {} \; >> /tmp/patterns
find "$MYPATH" -name *.gif -exec basename {} \; >> /tmp/patterns

for p in $(cat /tmp/patterns); do
    grep -R $p "$MYPATH" > /dev/null || echo $p;
done


回答2:

I'm a little late to the party (I found this page while looking for the answer myself), but in case it's useful to someone, here is a slightly modified version that returns the path with the filename (and searches for a few more file types):

#!/bin/bash

if [ $# -eq 0 ]
  then
    echo "Please supply path to search under"
    exit 1
fi
MYPATH=$1

find "$MYPATH" -name *.jpg > /tmp/patterns
find "$MYPATH" -name *.png >> /tmp/patterns
find "$MYPATH" -name *.gif >> /tmp/patterns
find "$MYPATH" -name *.js >> /tmp/patterns
find "$MYPATH" -name *.php >> /tmp/patterns

for p in $(cat /tmp/patterns); do
    f=$(basename $p);
    grep -R $f "$MYPATH" > /dev/null || echo $p;
done

It's important to note, though, that you can get false positives just looking at the code statically like this, because code might dynamically create a filename that is then referenced (and expected to exist). So if you blindly delete all files whose paths are returned by this script, without some knowledge of your project, you might regret it.



回答3:

ls -R *jpg *gif *png | xargs basename > /tmp/patterns
grep -f /tmp/patterns *html

The first line (recursively--your problem is ill-specified, so I thought I'd be a little general) finds all images and strips off the directory portion using basename. Save that in a list of patterns. Then grep using that list in all the html files.