How do I write a Bash shell script that finds all the .jpg, .gif, and .png files and then reports which of them are not linked via url(), href, or src in any text file in a folder?
Here's what I started with, but I end up getting the inverse of what I want. I don't want the referenced images; I want the unreferenced (aka "orphaned") ones:
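# Change MYPATH to the path where you have the project
# Quote the globs so the shell passes them to find instead of expanding them
find MYPATH -type f -name '*.jpg' -exec basename {} \; > /tmp/patterns
find MYPATH -type f -name '*.png' -exec basename {} \; >> /tmp/patterns
find MYPATH -type f -name '*.gif' -exec basename {} \; >> /tmp/patterns
# Print a list of lines that reference these files
# -F treats each file name as a fixed string, so "." is not a regex wildcard
# The cat command simply removes coloring
grep -RFf /tmp/patterns MYPATH | cat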
# great -- but how do I print the lines of /tmp/patterns *NOT* listed in any given
# *.php, *.css, or *.html?
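One way to get the inverse is to have grep print only the names it actually matched, then subtract that set from the full pattern list with comm. This is a minimal sketch, assuming a grep that supports -r, -o, and --include (GNU and BSD grep both do); the function name unreferenced is hypothetical. Like the original grep, it treats a name as referenced if it appears anywhere in a matching file, not only inside url(), href, or src.

```shell
# unreferenced PATTERNS DIR   (hypothetical helper name)
# Print every line of PATTERNS (one image file name per line, as built by
# the find commands above) that does not occur in any .php, .css, or .html
# file under DIR.
unreferenced() {
  local patterns=$1 dir=$2 found
  found=$(mktemp) || return 1
  # -o prints only the matched text, -h omits file names, and -F treats each
  # name as a fixed string so the "." in "logo.png" is not a regex wildcard.
  grep -rhoF -f "$patterns" \
       --include='*.php' --include='*.css' --include='*.html' "$dir" |
    LC_ALL=C sort -u > "$found"
  # comm -23 keeps lines that appear only in the first sorted input,
  # i.e. names that were never matched. Both inputs use the same C-locale sort.
  LC_ALL=C sort -u "$patterns" | LC_ALL=C comm -23 - "$found"
  rm -f "$found"
}
```

After building /tmp/patterns as above, run `unreferenced /tmp/patterns MYPATH`. Searching the tree once with all patterns keeps this fast on large projects, at the cost of reporting bare file names rather than full paths.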
I'm a little late to the party (I found this page while looking for the answer myself), but in case it's useful to someone, here is a slightly modified version that returns the path with the filename (and searches for a few more file types):
It's important to note, though, that you can get false positives just looking at the code statically like this, because code might dynamically create a filename that is then referenced (and expected to exist). So if you blindly delete all files whose paths are returned by this script, without some knowledge of your project, you might regret it.
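The modified script itself isn't reproduced here. As an illustration of the same idea (printing each orphan's full path and searching a few more file types), here is a minimal sketch; the function name list_orphan_paths and the choice of .js as the extra type are assumptions, so adjust the --include globs for your project.

```shell
# list_orphan_paths DIR   (hypothetical helper name)
# Print the full path of every .jpg, .gif, or .png file under DIR whose file
# name does not occur in any .php, .css, .html, or .js file under DIR.
# The extension list is an assumption; add or remove --include globs as needed.
list_orphan_paths() {
  local dir=$1
  # read -r handles spaces in paths, but not newlines embedded in file names.
  find "$dir" -type f \( -name '*.jpg' -o -name '*.gif' -o -name '*.png' \) |
  while IFS= read -r img; do
    name=${img##*/}   # strip the directory part, like basename
    # -q stops at the first match; -e guards against names starting with "-".
    if ! grep -rqF --include='*.php' --include='*.css' \
         --include='*.html' --include='*.js' -e "$name" "$dir"; then
      printf '%s\n' "$img"
    fi
  done
}
```

This runs one grep per image, so it is slower than a single pass on big trees, but printing full paths makes it easier to review each candidate by hand before deleting anything, which matters given the false-positive caveat above.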
With drysdam's help, I created this Bash script, which I named orphancheck.sh and run with "./orphancheck.sh myfolder".
The first line (recursively; your problem is ill-specified, so I thought I'd be a little general) finds all images and strips off the directory portion using basename. Save that in a list of patterns. Then grep using that list in all the html files.