I'm using the following PowerShell script to open a few thousand HTML files and save them as Word documents.
param([string]$htmpath,[string]$docpath = $docpath)
$srcfiles = Get-ChildItem $htmPath -filter "*.htm*"
$saveFormat = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], "wdFormatDocument");
$word = new-object -comobject word.application
$word.Visible = $False
function saveas-document
{
    $opendoc = $word.documents.open($doc.FullName);
    $opendoc.saveas([ref]"$docpath\$doc.FullName.doc", [ref]$saveFormat);
    $opendoc.close();
}
ForEach ($doc in $srcfiles)
{
    Write-Host "Processing :" $doc.FullName
    saveas-document
    $doc = $null
}
$word.quit();
The content converts splendidly, but the file name is not what I expected.
$opendoc.saveas([ref]"$docpath\$doc.FullName.doc", [ref]$saveFormat);
saves foo.htm as foo.htm.FullName.doc instead of foo.doc.
$opendoc.saveas([ref]"$docpath\$doc.BaseName.doc", [ref]$saveFormat);
yields foo.htm.BaseName.doc.
How do I set up the Save As... file name variable so that it equals the concatenation of BaseName and .doc?
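Inside a double-quoted string, PowerShell expands only the simple variable reference `$doc`; the `.FullName` or `.BaseName` that follows is copied as literal text. Wrapping the property access in a subexpression, `$( )`, makes PowerShell evaluate it before inserting it. A minimal sketch, assuming a hypothetical `$docpath` and a `FileInfo` built from a bare file name so it runs without Word:

```powershell
# Hypothetical values for illustration only
$docpath = 'C:\out'
$doc = [System.IO.FileInfo]'foo.htm'

# Only $doc is expanded (via its ToString(), here 'foo.htm');
# ".BaseName.doc" is kept as literal text
$wrong = "$docpath\$doc.BaseName.doc"

# $( ... ) evaluates the property access first, then inserts the result
$right = "$docpath\$($doc.BaseName).doc"

$wrong
$right
```

The same subexpression form works inside `[ref]"..."` in the SaveAs call.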
Based on our comments above, it seems that moving the files is all you want to accomplish. The following works for me. In the current directory, it replaces .txt extensions with .py extensions. I found the command here.
PS C:\testing> dir *.txt | Move-Item -Destination {[IO.Path]::ChangeExtension($_.Name, "py")}
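You can also change *.txt to C:\path\to\file\*.txt so you don't need to run this line from the folder that holds the files. If you do, use $_.FullName instead of $_.Name inside the script block: a bare file name is resolved against the current location, so the renamed files would be moved there. You should be able to define a destination in a similar manner, so I'll report back if I find a simple way to do that.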
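A minimal sketch of the path-independent version, assuming a throwaway temp folder in place of C:\path\to\file. It passes $_.FullName to ChangeExtension so each file is renamed where it sits, and it removes the demo folder afterwards:

```powershell
# A temp folder stands in for C:\path\to\file (hypothetical path)
$folder = Join-Path ([IO.Path]::GetTempPath()) ('rename-demo-' + [guid]::NewGuid())
New-Item -ItemType Directory -Path $folder | Out-Null
foreach ($n in 'a.txt', 'b.txt') {
    New-Item -ItemType File -Path (Join-Path $folder $n) | Out-Null
}

# $_.FullName keeps each renamed file in its own folder; $_.Name would be
# resolved against the current location and move the file there instead
Get-ChildItem -Path $folder -Filter '*.txt' |
    Move-Item -Destination { [IO.Path]::ChangeExtension($_.FullName, 'py') }

$renamed = Get-ChildItem -Path $folder | ForEach-Object Name | Sort-Object
Remove-Item -Path $folder -Recurse   # clean up the demo folder
$renamed
```

Note that this only renames the files; it does not convert their contents.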
Also, I found Microsoft's TechNet Library while I was searching. It has many tutorials on scripting with PowerShell; the article "Files and Folders, Part 3: Windows PowerShell" has additional information on copying and moving files.
I was having problems just converting the file name from .html to .docx. I took your code above and changed it to this:
function Convert-HTMLtoDocx {
    param([string]$htmpath)
    # The WdSaveFormat enum lives in the Word interop assembly
    Add-Type -AssemblyName Microsoft.Office.Interop.Word
    $srcfiles = Get-ChildItem $htmpath -Filter "*.htm*"
    $saveFormat = [Microsoft.Office.Interop.Word.WdSaveFormat]::wdFormatXMLDocument
    $word = New-Object -ComObject Word.Application
    $word.Visible = $False
    ForEach ($doc in $srcfiles) {
        Write-Host "Processing :" $doc.FullName
        # <source folder>\<BaseName>.docx
        $name = Join-Path -Path $doc.DirectoryName -ChildPath ($doc.BaseName + ".docx")
        $opendoc = $word.Documents.Open($doc.FullName)
        # Wrap $name itself: member access binds tighter than a cast,
        # so [ref]$name.Value would be a reference to $null, not the path
        $opendoc.SaveAs([ref]$name, [ref]$saveFormat)
        $opendoc.Close()
    } #End ForEach
    $word.Quit()
} #End Function
The problem was the save format. To save a document as .docx, you need to specify wdFormatXMLDocument, not wdFormatDocument: wdFormatDocument is the legacy Word 97-2003 binary .doc format, while wdFormatXMLDocument is the Open XML .docx format.
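The enum lookups in these scripts only work once the Word interop assembly is loaded. If that is a nuisance, one option is to pass the numeric WdSaveFormat values directly. A sketch, using a hypothetical helper `Get-WdSaveFormat` that maps a target extension to the value listed for it in the WdSaveFormat enumeration:

```powershell
# Hypothetical helper: numeric WdSaveFormat values, so the save format can be
# chosen without loading the Word interop assembly just to read the enum
function Get-WdSaveFormat {
    param([Parameter(Mandatory = $true)][string]$Extension)

    # @{} keys are case-insensitive, so '.DOCX' and '.docx' both match
    $formats = @{
        '.doc'  = 0    # wdFormatDocument: Word 97-2003 binary .doc
        '.txt'  = 4    # wdFormatDOSText
        '.htm'  = 10   # wdFormatFilteredHTML
        '.docx' = 12   # wdFormatXMLDocument: Open XML .docx
        '.pdf'  = 17   # wdFormatPDF
    }
    if (-not $formats.ContainsKey($Extension)) {
        throw "No WdSaveFormat mapped for '$Extension'"
    }
    $formats[$Extension]
}

# Usage with an open document (requires Word):
# $format = Get-WdSaveFormat '.docx'
# $opendoc.SaveAs([ref]$name, [ref]$format)
```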
This recursively walks a root folder and saves each .doc file as filtered HTML (.htm):
$docpath = "\\sf-xyz-serverabc01\ChangeTheseDocuments"
$WdTypes = Add-Type -AssemblyName 'Microsoft.Office.Interop.Word, Version=14.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c' -Passthru
$srcfiles = get-childitem $docpath -filter "*.doc" -rec | where {!$_.PSIsContainer} | select-object FullName
$saveFormat = $WdTypes | Where {$_.Name -eq 'WdSaveFormat'}
$word = new-object -comobject word.application
$word.Visible = $False
function saveas-filteredhtml
{
    $opendoc = $word.Documents.Open($doc.FullName)
    # Swap only the extension; .Replace("doc","htm") would also rewrite
    # any "doc" elsewhere in the path (e.g. a folder named "docs")
    $Name = [IO.Path]::ChangeExtension($doc.FullName, ".htm")
    $opendoc.SaveAs([ref]$Name, [ref]$saveFormat::wdFormatFilteredHTML)
    $opendoc.Close()
}
ForEach ($doc in $srcfiles)
{
    Write-Host "Processing :" $doc.FullName
    saveas-filteredhtml
    $doc = $null
}
$word.quit();
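Why swap the extension with [IO.Path]::ChangeExtension rather than String.Replace: Replace rewrites every occurrence of the substring anywhere in the path, and for a .docx file it leaves a stray "x" behind. The comparison below uses made-up paths and needs no Word installation:

```powershell
# Illustrative paths only (hypothetical)
$paths = 'C:\docs\report.doc', 'C:\docs\notes.docx'

$results = foreach ($p in $paths) {
    [pscustomobject]@{
        Original        = $p
        # String.Replace touches every "doc", including the folder name
        Replace         = $p.Replace('doc', 'htm')
        # ChangeExtension swaps only the final extension
        ChangeExtension = [IO.Path]::ChangeExtension($p, '.htm')
    }
}
$results | Format-Table -AutoSize
```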
I know this is an older post, but I'm posting this code here so that I can find it in the future.
This recursively walks a root folder and converts .doc and .docx files to .txt.
Here is a LINK to the different formats you can save to.
$docpath = "C:\Temp"
$WdTypes = Add-Type -AssemblyName 'Microsoft.Office.Interop.Word, Version=14.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c' -Passthru
$srcfiles = Get-ChildItem $docpath -Include *.doc, *.docx -Recurse | Where-Object { -not $_.PSIsContainer } | Select-Object FullName
$saveFormat = $WdTypes | Where {$_.Name -eq 'WdSaveFormat'}
$word = new-object -comobject word.application
$word.Visible = $False
function saveas-text
{
    $opendoc = $word.Documents.Open($doc.FullName)
    # ChangeExtension handles both .doc and .docx without touching folder names
    $Name = [IO.Path]::ChangeExtension($doc.FullName, ".txt")
    $opendoc.SaveAs([ref]$Name, [ref]$saveFormat::wdFormatDOSText)  # plain text
    $opendoc.Close()
}
ForEach ($doc in $srcfiles)
{
    Write-Host "Processing :" $doc.FullName
    saveas-text
    $doc = $null
}
$word.quit();
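Picking up .docx files through -Filter "*.doc" depends on a legacy Windows matching quirk, where a three-letter extension pattern can also match longer extensions. Listing both extensions with -Include, which Get-ChildItem honors together with -Recurse, makes the intent explicit. A sketch, assuming a throwaway temp tree in place of C:\Temp:

```powershell
# A temp tree stands in for C:\Temp (hypothetical layout)
$root = Join-Path ([IO.Path]::GetTempPath()) ('convert-demo-' + [guid]::NewGuid())
$sub  = Join-Path $root 'sub'
New-Item -ItemType Directory -Path $sub -Force | Out-Null
foreach ($f in (Join-Path $root 'a.doc'), (Join-Path $sub 'b.docx'), (Join-Path $sub 'c.htm')) {
    New-Item -ItemType File -Path $f | Out-Null
}

# -Include lists both extensions; the PSIsContainer check drops any folder
# whose name happens to end in .doc or .docx
$srcfiles = Get-ChildItem -Path $root -Include '*.doc', '*.docx' -Recurse |
    Where-Object { -not $_.PSIsContainer }

$found = $srcfiles | ForEach-Object Name | Sort-Object
Remove-Item -Path $root -Recurse   # clean up the demo tree
$found
```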