How to convert a PDF into an array of images, with

Posted 2020-07-23 09:33

Question:

I'm converting uploaded PDFs into images, with one image per page. I have figured out how to generate the images using MiniMagick::Tool::Convert, but I don't know how to write the version block for the Uploader, so that I can access an array of image URLs.

Here's my uploader so far:

class DocumentUploader < CarrierWave::Uploader::Base
  include CarrierWave::MiniMagick

  storage :file
  # storage :fog

  def store_dir
    "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
  end

  version :jpg do
    process :convert_to_images
    process :set_content_type_jpg

    def convert_to_images(*args)
      image = MiniMagick::Image.open(current_path)
      image.pages.each_with_index do |page, index|
        MiniMagick::Tool::Convert.new do |convert|
          convert.density 300        # must come before the input to affect rasterization
          convert << page.path       # page path carries a [N] page selector
          convert.background 'white'
          convert.flatten            # applied to the page that was just read
          convert.quality 95
          convert << "#{CarrierWave.root}/#{store_dir}/image-#{index}.jpg"
        end
      end
    end
  end

  def set_content_type_jpg(*args)
    # "image/jpeg" is the registered MIME type for JPEG files
    self.file.instance_variable_set(:@content_type, "image/jpeg")
  end

  # Add a white list of extensions which are allowed to be uploaded.
  def extension_white_list
    %w(jpg jpeg gif png doc docx pdf)
  end
end
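For reference, ImageMagick applies settings such as -density when it reads (rasterizes) the PDF, so they need to appear before the input, while operators such as -flatten act on the images already read. The following is a minimal sketch of the equivalent command-line arguments; the helper name page_convert_args is hypothetical, and the defaults simply mirror the values used above.

```ruby
# Hypothetical helper: builds the argument list for ImageMagick's `convert`
# so that -density is applied before the PDF page is read.
def page_convert_args(pdf_path, page_index, output_path, density: 300, quality: 95)
  [
    "-density", density.to_s,        # setting: must precede the input
    "#{pdf_path}[#{page_index}]",    # [N] selects one page (0-based)
    "-background", "white",          # colour used when flattening transparency
    "-flatten",                      # operator: applied to the page just read
    "-quality", quality.to_s,        # JPEG compression quality
    output_path                      # the .jpg extension selects the output format
  ]
end

args = page_convert_args("doc.pdf", 0, "image-0.jpg")
puts args.join(" ")
# prints: -density 300 doc.pdf[0] -background white -flatten -quality 95 image-0.jpg
# To actually run it (requires ImageMagick): system("convert", *args)
```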

This generates image-0.jpg, image-1.jpg, etc. in the correct directory. But now I have no way of referencing those images in my views, or even of knowing how many there are. It also won't work once I need to upload the images to S3. How can I get CarrierWave to handle the file storage for this collection of images, instead of a single image?
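With local :file storage only, one stopgap for counting and listing the generated pages is to glob the directory. A plain alphabetical sort would put image-10.jpg before image-2.jpg, so this sketch sorts on the page number instead. The helper name page_images is hypothetical, and this does nothing for S3, where the files would have to be tracked by the uploader itself.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: list the page images written into a directory,
# ordered by page number rather than alphabetically.
def page_images(dir)
  Dir.glob(File.join(dir, "image-*.jpg"))
     .sort_by { |path| File.basename(path)[/\d+/].to_i }
end

Dir.mktmpdir do |dir|
  # Simulate 12 generated pages: image-0.jpg .. image-11.jpg
  12.times { |i| FileUtils.touch(File.join(dir, "image-#{i}.jpg")) }
  pages = page_images(dir)
  puts pages.size                                          # 12
  puts pages.map { |p| File.basename(p) }.first(3).inspect # ["image-0.jpg", "image-1.jpg", "image-2.jpg"]
end
```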

It also looks like I'll need to add a database column to store the number of pages. Is there a way to make my uploader return an array of image URLs based on this count?
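If a page count is stored, the URL array can be rebuilt from it, as long as the pages follow the image-<index>.jpg naming above and share one base URL. A minimal sketch, assuming a hypothetical page_count column and that you can derive the base URL yourself (for example from the uploader's store_dir when serving :file storage); page_urls is a hypothetical helper, not a CarrierWave API.

```ruby
# Hypothetical helper: rebuild per-page URLs from a stored page count.
# Assumes pages are named image-0.jpg, image-1.jpg, ... under one base URL.
def page_urls(base_url, page_count)
  base = base_url.chomp("/")
  (0...page_count).map { |index| "#{base}/image-#{index}.jpg" }
end

urls = page_urls("/uploads/document/file/1/", 3)
puts urls
# /uploads/document/file/1/image-0.jpg
# /uploads/document/file/1/image-1.jpg
# /uploads/document/file/1/image-2.jpg
```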

I'm also willing to switch to another gem. Is this something that would be easier with Paperclip, Shrine, or Refile?

Answer 1:

With Shrine's versions and processing plugins (the Shrine 2.x approach), you can make each page a separate version:

require "mini_magick"
require "tempfile"

class ImageUploader < Shrine
  plugin :versions
  plugin :processing

  process(:store) do |io, context|
    pdf      = io.download
    versions = {}

    image = MiniMagick::Image.new(pdf.path)
    image.pages.each_with_index do |page, index|
      # The .jpg suffix tells ImageMagick to write JPEG output
      page_image = Tempfile.new(["page-#{index}", ".jpg"], binmode: true)
      MiniMagick::Tool::Convert.new do |convert|
        convert.density 300      # must come before the input page
        convert << page.path
        convert.background 'white'
        convert.flatten
        convert.quality 95
        convert << page_image.path
      end
      page_image.open # refresh updated file
      versions[:"page_#{index + 1}"] = page_image
    end

    versions
  end
end
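One detail worth checking in any variant of this: ImageMagick picks the output format from the output filename's extension, and a path with no extension typically makes it fall back to the input's format (here PDF). Ruby's Tempfile.new accepts a [basename, extension] pair, which is a simple way to get a .jpg path; this sketch only demonstrates the naming.

```ruby
require "tempfile"

# Give the page's tempfile a ".jpg" suffix so ImageMagick writes JPEG.
page_image = Tempfile.new(["page-1", ".jpg"], binmode: true)
begin
  puts File.extname(page_image.path)   # .jpg
ensure
  page_image.close!                    # close and delete the file
end

# Without the suffix the path has no extension at all:
bare = Tempfile.new("version-1", binmode: true)
puts File.extname(bare.path).empty?    # true
bare.close!
```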

With the versions plugin, the attachment then returns a hash of versions, so assuming you have a Document model with the PDF attached to its file attachment, you can retrieve an array of pages using Hash#values:

pages = document.file.values
pages #=> [...array of pages...]
pages.count #=> number of pages
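Hash#values returns the values in the hash's key order. If that order might not match page order (for example, if the versions hash is ever rebuilt with keys sorted as strings, where page_10 comes before page_2), sorting on the page number is safer. A minimal sketch with a hypothetical ordered_pages helper; plain strings stand in for the uploaded file objects.

```ruby
# Hypothetical helper: order page versions (:page_1, :page_2, ...)
# by their page number instead of relying on the hash's key order.
def ordered_pages(versions)
  versions
    .select { |name, _| name.to_s.match?(/\Apage_\d+\z/) }
    .sort_by { |name, _| name.to_s[/\d+/].to_i }
    .map { |_, file| file }
end

versions = { page_10: "p10", page_2: "p2", page_1: "p1" }
p ordered_pages(versions)   # ["p1", "p2", "p10"]
```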