I have to perform a large number of replacements in some documents, and I would like to automate the task. Some of the documents contain common strings, so automating this would be very useful. From what I have read so far, COM could be one way of doing this, but I don't know whether text replacement is supported.
I'd like to do this in Python. Is that possible? Could you post a code snippet showing how to access the document's text?
Thanks!
See if this gives you a start on Word automation using Python.
Once you have opened a document, you can run the following VBA code.
Afterwards, you can close the document and open another.
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
    .Text = "test"
    .Replacement.Text = "test2"
    .Forward = True
    .Wrap = wdFindContinue
    .Format = False
    .MatchCase = False
    .MatchWholeWord = False
    .MatchKashida = False
    .MatchDiacritics = False
    .MatchAlefHamza = False
    .MatchControl = False
    .MatchWildcards = False
    .MatchSoundsLike = False
    .MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
The above code replaces the text "test" with "test2" and does a "replace all".
You can set the other options to True or False depending on what you need.
The simplest way to learn this is to record a macro of the actions you want to perform, look at the generated code, and use it in your own script (with or without modified parameters).
EDIT: After looking at some code by Matthew, you could do the following
MSWord.Documents.Open(filename)
Selection = MSWord.Selection
And then translate the above VB code to Python.
Note: The following VB code is a shorthand way of assigning several properties without repeating the object name each time.
(VB)
With Selection.Find
    .Text = "test"
    .Replacement.Text = "test2"
End With
(Python)
find = Selection.Find
find.Text = "test"
find.Replacement.Text = "test2"
Pardon my limited Python knowledge, but I hope this gives you enough to move forward.
Remember to save and close the document after you are done with the find/replace operation.
At the end, call MSWord.Quit() to release the Word object from memory.
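The save, close, and quit steps can be wrapped so that they still run when a replacement fails halfway. The following is only a minimal sketch and has not been run against Word: `word_session` and `open_document` are hypothetical helper names, the `dispatch` parameter is an assumption added so the helpers can be exercised without Word, and the default path presumes pywin32 on Windows.

```python
from contextlib import contextmanager


@contextmanager
def word_session(dispatch=None):
    """Start Word, yield the Application object, and always call Quit()."""
    if dispatch is None:
        # Imported lazily so the helpers can be tried without pywin32.
        import win32com.client
        dispatch = win32com.client.DispatchEx
    app = dispatch("Word.Application")
    app.Visible = False
    try:
        yield app
    finally:
        app.Quit()  # release the Word process


@contextmanager
def open_document(app, path):
    """Open a document, yield it, then save and close it."""
    doc = app.Documents.Open(path)
    try:
        yield doc
    finally:
        doc.Close(SaveChanges=True)
```

Typical use would be `with word_session() as app, open_document(app, filename) as doc:` followed by the find/replace calls on `app.Selection`.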
I like the answers so far; here's a tested example (slightly modified from here) that replaces all occurrences of a string in a Word document:
import win32com.client


def search_replace_all(word_file, find_str, replace_str):
    ''' replace all occurrences of `find_str` w/ `replace_str` in `word_file` '''
    wdFindContinue = 1
    wdReplaceAll = 2

    # Dispatch() attempts to do a GetObject() before creating a new one.
    # DispatchEx() just creates a new one.
    app = win32com.client.DispatchEx("Word.Application")
    app.Visible = 0
    app.DisplayAlerts = 0
    app.Documents.Open(word_file)

    # expression.Execute(FindText, MatchCase, MatchWholeWord,
    #   MatchWildcards, MatchSoundsLike, MatchAllWordForms, Forward,
    #   Wrap, Format, ReplaceWith, Replace)
    app.Selection.Find.Execute(find_str, False, False, False, False, False,
                               True, wdFindContinue, False, replace_str, wdReplaceAll)
    app.ActiveDocument.Close(SaveChanges=True)
    app.Quit()


f = 'c:/path/to/my/word.doc'
search_replace_all(f, 'string_to_be_replaced', 'replacement_str')
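Because `Find.Execute` is called positionally in the example above, it is easy to put a flag in the wrong slot. One possible guard is to build the argument tuple from keyword options. This is just a sketch: `build_execute_args` and its parameter names are hypothetical, and the argument order is copied from the comment in the example above.

```python
WD_FIND_CONTINUE = 1  # Word's wdFindContinue
WD_REPLACE_ALL = 2    # Word's wdReplaceAll


def build_execute_args(find_str, replace_str, *, match_case=False,
                       whole_word=False, wildcards=False, sounds_like=False,
                       all_word_forms=False, forward=True,
                       wrap=WD_FIND_CONTINUE, use_format=False,
                       replace=WD_REPLACE_ALL):
    """Return the 11 positional arguments for Find.Execute, in order:
    FindText, MatchCase, MatchWholeWord, MatchWildcards, MatchSoundsLike,
    MatchAllWordForms, Forward, Wrap, Format, ReplaceWith, Replace.
    """
    return (find_str, match_case, whole_word, wildcards, sounds_like,
            all_word_forms, forward, wrap, use_format, replace_str, replace)
```

It would be used as `app.Selection.Find.Execute(*build_execute_args(find_str, replace_str, match_case=True))`, which keeps each option named at the call site.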
If this mailing list post is right, accessing the document's text is as simple as:
import win32com.client
MSWord = win32com.client.Dispatch("Word.Application")
MSWord.Visible = 0
doc = MSWord.Documents.Open(filename)
docText = doc.Content.Text  # Content is a Range object; its Text property holds the string
Also see How to: Search for and Replace Text in Documents. The examples use VB and C#, but the basics should apply to Python too.
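To make the text-access step a bit more concrete, here is a small sketch. It assumes that `Content` returns a Word `Range` whose `Text` property holds the string, and that Word ends each paragraph with a carriage return (`\r`) rather than a newline. `document_paragraphs` is a hypothetical helper, not part of pywin32.

```python
def document_paragraphs(doc):
    """Return the non-empty paragraphs of an open Word document as strings."""
    text = doc.Content.Text  # the whole document body as one string
    # Word marks paragraph ends with "\r" rather than "\n".
    return [p for p in text.split("\r") if p.strip()]
```

With a real document this would be called on the object returned by `MSWord.Documents.Open(filename)`.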
Check out this link: http://python.net/crew/pirx/spam7/
The links on the left side point to the documentation.
You can generalize this using the object model, which is found here:
http://msdn.microsoft.com/en-us/library/kw65a0we(VS.80).aspx
You can also achieve this using VBScript. Just type the code below into a file named script.vbs, then open a command prompt (Start -> Run -> Cmd), switch to the folder where the script is, and type:
cscript script.vbs
strFolder = "C:\Files"
Const wdFormatDocument = 0

'Select all files in strFolder
strComputer = "."
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")
Set colFiles = objWMIService.ExecQuery _
    ("ASSOCIATORS OF {Win32_Directory.Name='" & strFolder & "'} Where " _
    & "ResultClass = CIM_DataFile")

'Start MS Word
Set objWord = CreateObject("Word.Application")
Const wdReplaceAll = 2
Const wdOrientLandscape = 1

For Each objFile in colFiles
    If objFile.Extension = "doc" Then
        strFile = strFolder & "\" & objFile.FileName & "." & objFile.Extension
        strNewFile = strFolder & "\" & objFile.FileName & ".doc"
        Wscript.Echo "Processing " & objFile.Name & "..."
        Set objDoc = objWord.Documents.Open(strFile)
        objDoc.PageSetup.Orientation = wdOrientLandscape

        'Replace text - ^p in a string stands for new paragraph; ^m stands for page break
        Set objSelection = objWord.Selection
        objSelection.Find.Text = "String to replace"
        objSelection.Find.Forward = TRUE
        objSelection.Find.Replacement.Text = "New string"
        objSelection.Find.Execute ,,,,,,,,,,wdReplaceAll

        objDoc.SaveAs strNewFile, wdFormatDocument
        objDoc.Close
        Wscript.Echo "Ready"
    End If
Next

objWord.Quit
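For the Python route asked about in the question, the folder loop above could look roughly like this. It is a sketch under assumptions: `doc_files` and `replace_in_folder` are hypothetical names, and the `replace_one` callback stands in for whatever per-file routine you use (for example, a function that opens the file in Word and runs a replace-all). Only the file selection is shown concretely.

```python
from pathlib import Path


def doc_files(folder):
    """Return the .doc files directly inside `folder`, sorted by name."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() == ".doc")


def replace_in_folder(folder, find_str, replace_str, replace_one):
    """Call replace_one(path, find_str, replace_str) for every .doc file."""
    processed = []
    for path in doc_files(folder):
        # An absolute path avoids depending on Word's own working directory.
        replace_one(str(path.resolve()), find_str, replace_str)
        processed.append(path.name)
    return processed
```

Keeping the Word-specific part behind a callback means the file selection can be checked on its own before any document is modified.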