Basic Powershell - batch convert Word Docx to PDF

Posted 2019-01-16 05:40

Question:

I am trying to use PowerShell to batch-convert Word .docx files to PDF, using a script found on this site: http://blogs.technet.com/b/heyscriptingguy/archive/2013/03/24/weekend-scripter-convert-word-documents-to-pdf-files-with-powershell.aspx

# Acquire a list of DOCX files in a folder
$Files=GET-CHILDITEM "C:\docx2pdf\*.DOCX"
$Word=NEW-OBJECT -COMOBJECT WORD.APPLICATION

Foreach ($File in $Files) {
    # open a Word document, filename from the directory
    $Doc=$Word.Documents.Open($File.fullname)

    # Swap out DOCX with PDF in the Filename
    $Name=($Doc.Fullname).replace("docx","pdf")

    # Save this File as a PDF in Word 2010/2013
    $Doc.saveas([ref] $Name, [ref] 17)  
    $Doc.close()
}

I keep getting this error and can't figure out why:

PS C:\docx2pdf> .\docx2pdf.ps1
Exception calling "SaveAs" with "16" argument(s): "Command failed"
At C:\docx2pdf\docx2pdf.ps1:13 char:13
+     $Doc.saveas <<<< ([ref] $Name, [ref] 17)
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : DotNetMethodException

Any ideas?

Also, how would I change it to convert .doc (not just .docx) files as well, and to use the files in the same folder as the script?

Sorry, I've never done any PowerShell scripting before.
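A likely cause of the "Command failed" error: the question's script builds the output name with replace("docx","pdf") on the full path, and that call replaces every occurrence of docx, not just the extension. Because the files live in C:\docx2pdf, the target becomes C:\pdf2pdf\..., a folder that presumably does not exist, which would make SaveAs fail. A minimal sketch of a safer way to build the name, using the .NET method System.IO.Path.ChangeExtension; Get-PdfPath is a hypothetical helper name, not part of the original script:

```powershell
# Hypothetical helper: derive the PDF path by changing only the file extension.
function Get-PdfPath {
    param([Parameter(Mandatory)][string]$DocPath)
    # ChangeExtension swaps only the part after the last '.' of the file name,
    # so folder names such as C:\docx2pdf are left untouched
    [System.IO.Path]::ChangeExtension($DocPath, '.pdf')
}

$docPath = 'C:\docx2pdf\report.docx'
'Replace():         ' + $docPath.Replace('docx', 'pdf')   # C:\pdf2pdf\report.pdf (folder renamed too)
'ChangeExtension(): ' + (Get-PdfPath $docPath)            # C:\docx2pdf\report.pdf
```

Inside the question's loop, $Name = Get-PdfPath $Doc.FullName could take the place of the replace(...) line. It also handles .doc files, since only the extension is swapped.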

Answer 1:

This will work for doc as well as docx files.

$documents_path = 'c:\doc2pdf'

$word_app = New-Object -ComObject Word.Application

# This filter will find .doc as well as .docx documents
Get-ChildItem -Path $documents_path -Filter *.doc? | ForEach-Object {

    $document = $word_app.Documents.Open($_.FullName)

    $pdf_filename = "$($_.DirectoryName)\$($_.BaseName).pdf"

    $document.SaveAs([ref] $pdf_filename, [ref] 17)

    $document.Close()
}

$word_app.Quit()
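Answer 1 relies on -Filter *.doc?, which is handed to the file-system provider, so its matching rules are those of the underlying file system rather than PowerShell's own wildcards. As an alternative, here is a sketch (assuming PowerShell 3.0 or later for -File and -in) that matches the two extensions explicitly. Get-WordDocument is a hypothetical helper; it also skips the ~$ owner files Word leaves next to documents that are currently open, which are not real documents and would likely make Documents.Open fail:

```powershell
# Hypothetical helper: list .doc and .docx files in a folder, ignoring Word's ~$ owner files.
function Get-WordDocument {
    param(
        [Parameter(Mandatory)][string]$Folder,
        [switch]$Recurse
    )
    Get-ChildItem -LiteralPath $Folder -File -Recurse:$Recurse |
        Where-Object {
            # Extension comparison with -in is case-insensitive, so .DOCX matches too
            ($_.Extension -in '.doc', '.docx') -and -not $_.Name.StartsWith('~$')
        }
}
```

With this in place, Get-WordDocument -Folder $documents_path | ForEach-Object { ... } could replace the Get-ChildItem line in Answer 1.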


Answer 2:

This works for me (Word 2007):

$wdFormatPDF = 17
$word = New-Object -ComObject Word.Application
$word.visible = $false

$folderpath = Split-Path -parent $MyInvocation.MyCommand.Path

Get-ChildItem -path $folderpath -recurse -include "*.doc" | % {
    $path =  ($_.fullname).substring(0,($_.FullName).lastindexOf("."))
    $doc = $word.documents.open($_.fullname)
    $doc.saveas($path, $wdFormatPDF) 
    $doc.close()
}

$word.Quit()
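Answer 2 locates the script's folder with Split-Path -parent $MyInvocation.MyCommand.Path. On PowerShell 3.0 and later, the automatic variable $PSScriptRoot holds that folder directly. A minimal sketch, assuming PowerShell 3.0+; the fallback to the current directory for code pasted into a console (where $PSScriptRoot is empty) is an addition, not part of the answer:

```powershell
# $PSScriptRoot (PowerShell 3.0+) is the folder of the running .ps1 file.
# It is empty when the code is pasted into a console, so fall back to the current directory.
$folderpath = if ($PSScriptRoot) { $PSScriptRoot } else { (Get-Location).Path }
"Looking for documents in: $folderpath"
```

The resulting $folderpath can be passed to Get-ChildItem -path exactly as in Answer 2.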


Answer 3:

The above answers all fell short for me, because I was running a batch job that converted around 70,000 Word documents this way. Doing this repeatedly eventually makes Word crash, presumably because of memory issues (the error was some COMException that I didn't know how to parse). My hack to keep it going was to quit and restart Word every 100 documents (an arbitrarily chosen number).
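The restart-every-N idea can be factored out of the conversion loop. A minimal sketch with hypothetical names: Invoke-WithPeriodicRestart takes script blocks that create a worker, process one item, and dispose of a worker, so the batching logic can be checked without Word installed. The try/finally is an addition so the last worker is cleaned up even if an item throws:

```powershell
# Hypothetical helper: run $ProcessItem over $Items, recreating the worker
# (for example a Word.Application instance) after every $BatchSize items.
function Invoke-WithPeriodicRestart {
    param(
        [Parameter(Mandatory)][object[]]$Items,
        [ValidateRange(1, 2147483647)][int]$BatchSize = 100,
        [Parameter(Mandatory)][scriptblock]$NewWorker,
        [Parameter(Mandatory)][scriptblock]$ProcessItem,
        [Parameter(Mandatory)][scriptblock]$DisposeWorker
    )

    $worker = & $NewWorker
    $sinceRestart = 0
    try {
        foreach ($item in $Items) {
            if ($sinceRestart -ge $BatchSize) {
                # Throw away the old worker and start a fresh one
                & $DisposeWorker $worker
                $worker = & $NewWorker
                $sinceRestart = 0
            }
            & $ProcessItem $worker $item
            $sinceRestart++
        }
    }
    finally {
        # Always clean up the current worker, even if processing an item threw
        & $DisposeWorker $worker
    }
}
```

To use it with Word, NewWorker could be { New-Object -ComObject Word.Application }, ProcessItem could open, save and close one document on the worker it receives, and DisposeWorker could call Quit() on the instance and then pass it to [System.Runtime.InteropServices.Marshal]::ReleaseComObject.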

Additionally, when Word did crash, it would leave behind malformed PDFs, generally 1-2 KB each. So when skipping already generated PDFs, I only skip those larger than 3 KB. If you don't want to skip already generated PDFs, you can delete that if statement.
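The size check can be wrapped in a small function so the threshold is easy to change. A sketch; Test-PdfAlreadyDone and its MinBytes parameter are hypothetical names, and the 3 KB default simply mirrors the heuristic described above. It uses -LiteralPath so file names containing [ or ] are not treated as wildcards:

```powershell
# Hypothetical helper: $true only when the PDF exists and is larger than $MinBytes,
# so tiny leftovers from a crashed conversion are regenerated.
function Test-PdfAlreadyDone {
    param(
        [Parameter(Mandatory)][string]$PdfPath,
        [long]$MinBytes = 3KB
    )
    if (-not (Test-Path -LiteralPath $PdfPath -PathType Leaf)) { return $false }
    return (Get-Item -LiteralPath $PdfPath).Length -gt $MinBytes
}
```

In the loop, if (Test-PdfAlreadyDone -PdfPath $Name) { continue } would then replace the inline check.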

Excuse me if my code doesn't look good; I don't generally use Windows, and this was a one-off hack. Here's the resulting code:

$Files=Get-ChildItem -path '.\path\to\docs' -recurse -include "*.doc*"

$counter = 0
$filesProcessed = 0
$Word = New-Object -ComObject Word.Application

Foreach ($File in $Files) {
    $Name="$(($File.FullName).substring(0, $File.FullName.lastIndexOf("."))).pdf"
    # Skip PDFs that already exist and are big enough not to be crash leftovers
    # (-LiteralPath so that file names containing [ ] are not treated as wildcards)
    if ((Test-Path -LiteralPath $Name) -And (Get-Item -LiteralPath $Name).length -gt 3kb) {
        echo "skipping $($Name), already exists"
        continue
    }

    echo "$($filesProcessed): processing $($File.FullName)"
    $Doc = $Word.Documents.Open($File.FullName)
    $Doc.SaveAs($Name, 17)
    $Doc.Close()
    $counter = $counter + 1
    $filesProcessed = $filesProcessed + 1

    # Restart Word after every 100 converted documents
    if ($counter -ge 100) {
        $counter = 0
        $Word.Quit()
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($Word)
        $Word = New-Object -ComObject Word.Application
    }
}

# Close the last Word instance
$Word.Quit()
[void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($Word)


Answer 4:

None of the solutions posted here worked for me on Windows 8.1 (I'm using Office 365, by the way). My PowerShell somehow doesn't like the [ref] arguments (I don't know why; I use PowerShell very rarely).

This is the solution that worked for me:

$Files=Get-ChildItem 'C:\path\to\files\*.docx'

$Word = New-Object -ComObject Word.Application

Foreach ($File in $Files) {
    $Doc = $Word.Documents.Open($File.FullName)
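    # Change only the extension; replace('docx', 'pdf') would also rewrite any folder name containing "docx"
    $Name = [System.IO.Path]::ChangeExtension($Doc.FullName, '.pdf')
    $Doc.SaveAs($Name, 17)
    $Doc.Close()
}

# Close Word once all files are converted
$Word.Quit()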