Replace Text in Word document using Open Xml

Posted 2020-02-21 07:54

I have created a .docx file from a Word template. I am now opening the copied .docx file and want to replace certain text with other data.

I cannot work out how to access the text in the document's main part.

Any help would be appreciated.

Below is my code so far.

private void CreateSampleWordDocument()
{
    // Requires: using System; using System.IO; using DocumentFormat.OpenXml.Packaging;
    //string sourceFile = @"D:\GeneralLetter.dot";
    //string destinationFile = @"D:\New.doc";
    string sourceFile = @"D:\GeneralWelcomeLetter.docx";
    string destinationFile = @"D:\New.docx";
    try
    {
        // Create a copy of the template file and open the copy
        File.Copy(sourceFile, destinationFile, true);
        using (WordprocessingDocument document = WordprocessingDocument.Open(destinationFile, true))
        {
            // Change the document type to Document
            document.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);
            // Get the main part of the document
            MainDocumentPart mainPart = document.MainDocumentPart;
            mainPart.Document.Save();
        }
    }
    catch (Exception ex)
    {
        // Report the failure instead of silently swallowing it
        Console.Error.WriteLine(ex.Message);
        throw;
    }
}

Now, how do I find certain text and replace it? I could not work it out via LINQ, so a code hint would be appreciated.

7 answers
小情绪 Triste *
#2 · 2020-02-21 08:31
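' "Chemin" stands for the path of the .docx file
Using doc As WordprocessingDocument = WordprocessingDocument.Open("Chemin", True, New OpenSettings With {.AutoSave = True})
    Dim d As Document = doc.MainDocumentPart.Document

    ' Find the first Text element whose content is exactly "txtNom"
    Dim txt As Text = d.Descendants(Of Text)().FirstOrDefault(Function(t) t.Text = "txtNom")

    If txt IsNot Nothing Then
        txt.Text = txt.Text.Replace("txtNom", "YASSINE OULARBI")
    End If
End Using ' Disposing the document saves the changes (AutoSave) and closes the file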
倾城 Initia
#3 · 2020-02-21 08:32

In addition to Flowerking's answer:

When your Word file contains text boxes, his solution will not work. A text box keeps its text inside a TextBoxContent element, so that text never shows up in the foreach loop over the paragraph's Runs.

But if you write

using (WordprocessingDocument doc =
       WordprocessingDocument.Open(@"yourpath\testdocument.docx", true))
{
    var document = doc.MainDocumentPart.Document;

    foreach (var text in document.Descendants<Text>()) // <<< Here
    {
        if (text.Text.Contains("text-to-replace"))
        {
            text.Text = text.Text.Replace("text-to-replace", "replaced-text");
        }
    }
}

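it loops over every Text element in the document (whether it is inside a text box or not), so the text gets replaced in both places.

Note that if the text to replace is split across several Runs or text boxes, this will not work either, because no single Text element contains the whole search string. You need a better solution for those cases.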
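
To give an idea of what handling split text could look like, here is a minimal sketch. It is not taken from any answer here and it makes some assumptions: it works on the paragraph XML with System.Xml.Linq rather than the SDK classes, it treats one paragraph at a time, and the names SplitRunReplaceSketch and ReplaceInParagraph are made up for illustration. It joins the text of all w:t elements in a paragraph, replaces in the joined string, writes the result into the first w:t, and empties the others.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SplitRunReplaceSketch
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces oldValue with newValue in one paragraph, even when oldValue
    // is spread over several <w:t> elements. Returns true if anything changed.
    public static bool ReplaceInParagraph(XElement paragraph, string oldValue, string newValue)
    {
        var texts = paragraph.Descendants(W + "t").ToList();
        if (texts.Count == 0) return false;

        string joined = string.Concat(texts.Select(t => t.Value));
        if (!joined.Contains(oldValue)) return false;

        // Put the whole result into the first text element and empty the rest.
        texts[0].Value = joined.Replace(oldValue, newValue);
        // Keep leading/trailing spaces of the merged text.
        texts[0].SetAttributeValue(XNamespace.Xml + "space", "preserve");
        foreach (var t in texts.Skip(1)) t.Value = string.Empty;
        return true;
    }

    public static void Main()
    {
        // The search string is split across two runs here.
        var p = XElement.Parse(
            "<w:p xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
            "<w:r><w:t>Dear text-to-</w:t></w:r><w:r><w:t>replace!</w:t></w:r></w:p>");
        ReplaceInParagraph(p, "text-to-replace", "replaced-text");
        Console.WriteLine(string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
        // prints: Dear replaced-text!
    }
}
```

The trade-off is that all text of the paragraph ends up in the first run, so it takes that run's formatting. With the SDK classes the same idea should carry over to paragraph.Descendants<Text>() and the Text property, but I have only sketched it at the XML level.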

我想做一个坏孩纸
#4 · 2020-02-21 08:39

Maybe this solution is easier:
1. a StreamReader reads the whole document XML as text,
2. a Regex replaces the old text with the new text, ignoring case,
3. a StreamWriter writes the modified text back into the document.

// findesReplaces is assumed to be an IDictionary<string, string> that maps
// each search pattern (Key) to its replacement text (Value).
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
{
    string docText = null;
    using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        docText = sr.ReadToEnd();

    foreach (var t in findesReplaces)
        docText = new Regex(t.Key, RegexOptions.IgnoreCase).Replace(docText, t.Value);

    using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        sw.Write(docText);
}
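
One thing to watch with this approach: the regex runs over the raw XML, so characters such as & and < in the replacement have to be written as entities or the part is no longer well-formed XML. The following is only a sketch under my own assumptions (the names RawXmlReplaceSketch, EscapeXmlText and ReplaceLiteral are hypothetical, and the search text is meant literally rather than as a regex pattern). It shows the three steps above applied to a string, with the escaping added.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RawXmlReplaceSketch
{
    // Escapes the characters that are stored as entities in XML text content.
    private static string EscapeXmlText(string s) =>
        s.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;");

    // Replaces findText (taken literally, ignoring case) in raw document XML.
    public static string ReplaceLiteral(string docXml, string findText, string replaceText)
    {
        // Escape for XML first, then for the regex engine.
        string pattern = Regex.Escape(EscapeXmlText(findText));
        string safeReplacement = EscapeXmlText(replaceText);

        // A MatchEvaluator inserts the replacement verbatim, so "$1" or "$&"
        // inside replaceText is not treated as a substitution pattern.
        return Regex.Replace(docXml, pattern, _ => safeReplacement, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string xml = "<w:t>Dear NAME, welcome to R&amp;D.</w:t>";
        Console.WriteLine(ReplaceLiteral(xml, "name", "Smith & Sons"));
        // prints: <w:t>Dear Smith &amp; Sons, welcome to R&amp;D.</w:t>
    }
}
```

Even with escaping, the pattern is matched against markup as well as text, so a short search word can also hit element or attribute names. Distinctive placeholders are safer here.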
爷、活的狠高调
#5 · 2020-02-21 08:49

Just to give you the idea of how to do it, please try:

using (WordprocessingDocument doc =
       WordprocessingDocument.Open(@"yourpath\testdocument.docx", true))
{
    var body = doc.MainDocumentPart.Document.Body;
    var paras = body.Elements<Paragraph>();

    // Walk paragraph -> run -> text and replace inside each Text element
    foreach (var para in paras)
    {
        foreach (var run in para.Elements<Run>())
        {
            foreach (var text in run.Elements<Text>())
            {
                if (text.Text.Contains("text-to-replace"))
                {
                    text.Text = text.Text.Replace("text-to-replace", "replaced-text");
                }
            }
        }
    }
}

Please note that the search is case sensitive. The text formatting is not changed by the replacement. Hope this helps.
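
If you need the match to ignore case, one option is a small helper used in place of text.Text.Replace(...). This is just a sketch and not part of the SDK; the names IgnoreCaseReplaceSketch and ReplaceIgnoreCase are made up for illustration. It searches with IndexOf and StringComparison.OrdinalIgnoreCase and rebuilds the string piece by piece.

```csharp
using System;
using System.Text;

public static class IgnoreCaseReplaceSketch
{
    // Replaces every occurrence of oldValue in input, ignoring case.
    public static string ReplaceIgnoreCase(string input, string oldValue, string newValue)
    {
        if (string.IsNullOrEmpty(oldValue)) return input;

        var result = new StringBuilder();
        int position = 0;
        while (true)
        {
            int hit = input.IndexOf(oldValue, position, StringComparison.OrdinalIgnoreCase);
            if (hit < 0) break;
            result.Append(input, position, hit - position); // text before the match
            result.Append(newValue);
            position = hit + oldValue.Length;                // continue after the match
        }
        result.Append(input, position, input.Length - position); // remaining tail
        return result.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceIgnoreCase(
            "Text-To-Replace and text-to-replace", "text-to-replace", "replaced-text"));
        // prints: replaced-text and replaced-text
    }
}
```

In the loop above you would then write text.Text = ReplaceIgnoreCase(text.Text, "text-to-replace", "replaced-text") and could drop the Contains check.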

Bombasti
#6 · 2020-02-21 08:49

Here is a solution that finds and replaces tags in an Open XML (Word) document even when a tag is split across text runs, including text inside text boxes.

namespace Demo
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text.RegularExpressions;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    public class WordDocumentHelper
    {
        class DocumentTag
        {
            public DocumentTag()
            {
                ReplacementText = "";
            }

            public string Tag { get; set; }
            public string Table { get; set; }
            public string Column { get; set; }
            public string ReplacementText { get; set; }

            public override string ToString()
            {
                return ReplacementText ?? (Tag ?? "");
            }
        }

        private const string TAG_PATTERN = @"\[(.*?)[.:](.*?)\]";
        private const string TAG_START = @"[";
        private const string TAG_END = @"]";

        /// <summary>
        /// Clones a document template into the temp folder and returns the newly created clone temp filename and path.
        /// </summary>
        /// <param name="templatePath"></param>
        /// <returns></returns>
        public string CloneTemplateForEditing(string templatePath)
        {
            var tempFile = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName()) + Path.GetExtension(templatePath);
            File.Copy(templatePath, tempFile);
            return tempFile;
        }

        /// <summary>
        /// Opens a given filename, replaces tags, and saves. 
        /// </summary>
        /// <param name="filename"></param>
        /// <returns>Number of tags found</returns>
        public int FindAndReplaceTags(string filename)
        {
            var allTags = new List<DocumentTag>();

            using (WordprocessingDocument doc = WordprocessingDocument.Open(path: filename, isEditable: true))
            {
                var document = doc.MainDocumentPart.Document;

                // text may be split across multiple text runs so keep a collection of text objects
                List<Text> tagParts = new List<Text>();

                foreach (var text in document.Descendants<Text>())
                {
                    // search for any fully formed tags in this text run
                    var fullTags = GetTags(text.Text);

                    // replace values for fully formed tags
                    fullTags.ForEach(t => {
                        t = GetTagReplacementValue(t);
                        text.Text = text.Text.Replace(t.Tag, t.ReplacementText);
                        allTags.Add(t);
                    });

                    // continue working on current partial tag
                    if (tagParts.Count > 0)
                    {
                        // working on a tag
                        var joinText = string.Join("", tagParts.Select(x => x.Text)) + text.Text;

                        // see if tag ends with this block
                        if (joinText.Contains(TAG_END))
                        {
                            var joinTag = GetTags(joinText).FirstOrDefault(); // should be just one tag (or none)
                            if (joinTag == null)
                            {
                                throw new Exception($"Malformed document tag in block '{string.Join("", tagParts.Select(x => x.Text)) + text.Text}' ");
                            }

                            joinTag = GetTagReplacementValue(joinTag);
                            allTags.Add(joinTag);

                            // replace first text run in the tagParts set with the replacement value. 
                            // (This means the formatting used on the first character of the tag will be used)
                            var firstRun = tagParts.First();
                            firstRun.Text = firstRun.Text.Substring(0, firstRun.Text.LastIndexOf(TAG_START));
                            firstRun.Text += joinTag.ReplacementText;

                            // replace trailing text runs with empty strings
                            tagParts.Skip(1).ToList().ForEach(x => x.Text = "");

                            // replace all text up to and including the first index of TAG_END
                            text.Text = text.Text.Substring(text.Text.IndexOf(TAG_END) + 1);

                            // empty the tagParts list so we can start on a new tag
                            tagParts.Clear();
                        }
                        else
                        {
                            // no tag end so keep getting text runs
                            tagParts.Add(text);
                        }
                    }

                    // search for new partial tags
                    if (text.Text.Contains(TAG_START))
                    {
                        if (tagParts.Any())
                        {
                            throw new Exception($"Malformed document tag in block '{string.Join("", tagParts.Select(x => x.Text)) + text.Text}' ");
                        }
                        tagParts.Add(text);
                        continue;
                    }

                }

                // save the temp doc before closing
                doc.Save();
            }

            return allTags.Count;
        }

        /// <summary>
        /// Gets a unique set of document tags found in the passed fileText using Regex
        /// </summary>
        /// <param name="fileText"></param>
        /// <returns></returns>
        private List<DocumentTag> GetTags(string fileText)
        {
            List<DocumentTag> tags = new List<DocumentTag>();

            if (string.IsNullOrWhiteSpace(fileText))
            {
                return tags;
            }

            // TODO: custom regex for tag matching 
            // this example looks for tags in the format "[table.column]" or "[table:column]" and captures the full tag, "table", and "column" into match Groups
            MatchCollection matches = Regex.Matches(fileText, TAG_PATTERN);
            foreach (Match match in matches)
            {
                // skip matches with an empty tag, table, or column group
                if (match.Groups.Count < 3
                    || string.IsNullOrWhiteSpace(match.Groups[0].Value)
                    || string.IsNullOrWhiteSpace(match.Groups[1].Value)
                    || string.IsNullOrWhiteSpace(match.Groups[2].Value))
                {
                    continue;
                }

                tags.Add(new DocumentTag
                {
                    Tag = match.Groups[0].Value,
                    Table = match.Groups[1].Value,
                    Column = match.Groups[2].Value
                });
            }

            return tags;
        }

        /// <summary>
        /// Sets the replacement value of the passed tag
        /// </summary>
        /// <returns></returns>
        private DocumentTag GetTagReplacementValue(DocumentTag tag)
        {
            // TODO: custom routine to update tag Replacement Value

            tag.ReplacementText = "foobar";

            return tag;
        }
    }
}
Melony?
#7 · 2020-02-21 08:51

Here is the solution from MSDN.

The example given there:

public static void SearchAndReplace(string document)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
    {
        string docText = null;
        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        Regex regexText = new Regex("Hello world!");
        docText = regexText.Replace(docText, "Hi Everyone!");

        using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        {
            sw.Write(docText);
        }
    }
}