Finding unused images in a Rails app?

Posted 2019-03-19 07:32

Question:

I'm familiar with tools like Deadweight for finding CSS not in use in your Rails app, but does anything exist for images? I'm sitting in a project with a massive directory of assets from working with a variety of designers and I'm trying to trim the fat in this project. It's especially a pain when moving assets to our CDN.

Any thoughts?

Answer 1:

It depends greatly on the code using the images. It's always possible that a filename is computed (by concatenating two values, string substitution, etc.), so simply grepping by filename isn't necessarily enough.

You could try running wget (probably already installed if you're on a Linux machine; otherwise see http://users.ugent.be/~bpuype/wget/ ) to mirror your whole site. Run it on the same machine or network if you can. It will crawl your whole site and grab all the images:

# mirror mysite.com accepting only jpg, png and gif files
wget -A jpg,png,gif --mirror www.mysite.com

Once that's done, you'll have a second copy of your site's hierarchy containing every image that is linked from a page reachable by crawling your site. You can then back up your source image directory and replace it with wget's copy. Next, monitor your log files for 404s on gif/jpg/png files. Hope that helps.
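
Before swapping the directories, it can help to list what the crawl missed. Here is a minimal sketch, assuming the mirror ended up somewhere like www.mysite.com/images and the originals live in public/images. Both paths are placeholders for your own layout, and unmirrored_images is a made-up helper name, not part of wget or Rails. It needs Ruby 2.5 or later for the base: option of Dir.glob.

```ruby
require "set"

# Returns image paths (relative to source_dir) that have no counterpart
# under mirror_dir, i.e. files the crawl never fetched.
# Both directory arguments are assumptions about your layout.
def unmirrored_images(source_dir, mirror_dir, exts = %w[jpg png gif])
  pattern = "**/*.{#{exts.join(',')}}"
  mirrored = Dir.glob(pattern, base: mirror_dir).to_set
  Dir.glob(pattern, base: source_dir).reject { |rel| mirrored.include?(rel) }.sort
end

# Example (placeholder paths):
#   puts unmirrored_images("public/images", "www.mysite.com/images")
```

Anything it prints is only a candidate for removal, since images behind computed filenames or pages the crawler could not reach will show up here too.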



Answer 2:

Finding unused images should be easier than finding unused CSS.

Just find *.jpg, *.png and *.gif files with a glob, put those filenames into an array, and search for each one in your HTML, CSS and JS files. Any filename that is never found goes on the unused list. Then move those images to another folder with the same directory structure, which makes them easy to restore just in case.

Basically like this. Of course, it will not work for filenames that are encrypted, encoded or obfuscated.

require "fileutils"

# Candidate images, and the source files that may reference them
img  = Dir.glob("**/*.{jpg,png,gif}")
data = Dir.glob("**/*.{htm*,css,js}").select { |f| File.file?(f) }

puts "#{img.length} images found & #{data.length} files found to search against"

# Read every source file once; scrub replaces invalid byte sequences
# so the regex match below cannot raise on oddly encoded files
content = data.map { |f| File.read(f).scrub }.join("\n")

img.each do |m|
  # Escape the basename so "." matches a literal dot, not any character
  next if content =~ /\b#{Regexp.escape(File.basename(m))}\b/

  target = File.join("../unused", File.dirname(m))
  FileUtils.mkdir_p target
  FileUtils.mv m, File.join("../unused", m)
  puts "Image #{m} moved to #{target} folder"
end

PS: I used fileutils because the plain makedirs and mv did not work in my Windows version of Ruby.

I'm not very experienced with Ruby, so please double-check it before you use it.

Here is the sample output from running it in the root folder of a sample Rails app on Windows:

---\ruby>ruby img_coverage.rb
5 images found & 12 files found to search against
Image depot/public/images/test.jpg moved to ../unused/depot/public/images folder


Answer 3:

If your image URLs often come from many computed / concatenated strings and other stuff hard to track programmatically within your source code, and your application is in heavy use, you could try a soft "honeypot" approach like this:

  • Move all the assets to a different directory, e.g. /attic
  • Set up an empty /images directory (or what your asset directory is called)
  • Set up a .htaccess file (if you're on Apache of course) that, using the -f flag, redirects all requests to nonexistent image files to a script
  • The script copies the requested file from the /attic into the /images directory and displays it
  • The next request to that image will go directly to the image, because it exists now

After some time and sufficient usage, all needed images should have been copied to the assets directory.

It's a "soft" approach, of course, because some dialog or situation may not have been opened, entered or used by any user during that time (error message icons, for example). But it will recognize every file that was actually requested, no matter where the request came from, and it might help sort out many of the unneeded files.
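
The copy step of that script could look roughly like the sketch below. It only covers the "copy from the attic" part; the rewrite rule and the code that sends the image back to the browser are not shown. restore_from_attic, attic_dir and images_dir are hypothetical names, and the path check is there because the requested path comes straight from the URL. It needs Ruby 2.5 or later for delete_prefix.

```ruby
require "fileutils"

# Copies a requested image from the attic into the live images directory.
# `requested` is the path relative to the images root, e.g. "icons/err.png".
# Returns the restored path, or nil when the file is not in the attic or
# the request tries to escape it (e.g. "../../etc/passwd").
# All names here are hypothetical; adapt them to your own setup.
def restore_from_attic(requested, attic_dir, images_dir)
  attic_root = File.expand_path(attic_dir)
  prefix = attic_root + File::SEPARATOR
  # Drop leading slashes so the path is always resolved inside the attic
  source = File.expand_path(requested.sub(%r{\A/+}, ""), attic_root)
  return nil unless source.start_with?(prefix)
  return nil unless File.file?(source)

  target = File.join(File.expand_path(images_dir), source.delete_prefix(prefix))
  FileUtils.mkdir_p(File.dirname(target))
  FileUtils.cp(source, target)
  target
end
```

Copying rather than moving keeps the attic complete, so at the end you can compare the two directories to see what was never requested.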



Answer 4:

If your file manager supports it, try sorting your images directory by the files' "last accessed" date. Files that haven't been accessed in a long time most likely aren't used any longer.
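
If you'd rather script it than click through a file manager, something like this should do. It is only a sketch: stale_images is a made-up helper name, the directory and cutoff are up to you, and it is only meaningful if your filesystem actually records access times (volumes mounted with noatime do not, and relatime updates them only coarsely).

```ruby
# Lists image files whose last-access time is older than cutoff (a Time).
# Returns [path, atime] pairs, oldest first.
# stale_images is a hypothetical helper; results depend on the
# filesystem recording access times at all.
def stale_images(dir, cutoff, exts = %w[jpg png gif])
  Dir.glob(File.join(dir, "**", "*.{#{exts.join(',')}}"))
     .map { |path| [path, File.atime(path)] }
     .select { |_, atime| atime < cutoff }
     .sort_by { |_, atime| atime }
end

# Example (placeholder path, roughly six months back):
#   stale_images("public/images", Time.now - 180 * 86_400).each do |path, atime|
#     puts "#{atime}  #{path}"
#   end
```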

Along the same lines, you can also filter or grep through your web server's logs and make a list of the image files that it has served up in the last several months. Any images not in this list are likely unused.
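
For the log route, here is a rough sketch. It assumes your server writes the common or combined access log format, where the request shows up as "GET /path HTTP/1.1" followed by the status code; other formats need a different pattern. served_images and IMAGE_REQUEST are names I made up for the example.

```ruby
require "set"

# Matches an image request in a common/combined format log line and
# captures the path (without query string) and the status code.
# The log format is an assumption; adjust the pattern to your server.
IMAGE_REQUEST = %r{"(?:GET|HEAD) (\S+?\.(?:jpe?g|png|gif))(?:\?\S*)? HTTP/[\d.]+" (\d{3})}i

# Returns the set of image paths that were served successfully.
# 2xx and 304 (Not Modified) both count as "the image is in use".
def served_images(log_lines)
  log_lines.each_with_object(Set.new) do |line, seen|
    match = IMAGE_REQUEST.match(line)
    next unless match

    status = match[2]
    seen << match[1] if status.start_with?("2") || status == "304"
  end
end

# Example (placeholder paths; Dir.glob with base: needs Ruby 2.5+):
#   served  = served_images(File.foreach("log/access.log"))
#   on_disk = Dir.glob("**/*.{jpg,png,gif}", base: "public").map { |p| "/#{p}" }
#   puts on_disk.reject { |p| served.include?(p) }
```

Make sure the logs cover a long enough period, and remember that images served from a CDN or a browser cache may never appear in them.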