I want to read data, such as strings, from a .docx file in C# code. I looked through some of the existing questions but didn't understand which approach to use.
I'm trying to use ApplicationClass Application = new ApplicationClass();
but I get this error:
The type 'Microsoft.Office.Interop.Word.ApplicationClass' has no constructors defined
I also want the full text of my .docx file as a single string, not as separate words!
foreach (FileInfo f in docFiles)
{
    Application wo = new Application();
    object nullobj = Missing.Value;
    object file = f.FullName;
    Document doc = wo.Documents.Open(ref file, ..., ref nullobj);
    doc.Activate();
    // doc.??? <- what goes here to get the whole text?
}
How can I get the whole text from a .docx file?
Try using the Word.Application interface instead of ApplicationClass. See:
Understanding Office Primary Interop Assembly Classes and Interfaces
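For the interop route, a minimal sketch might look like the following. It assumes Word is installed, the project references the Word PIA (Microsoft.Office.Interop.Word), and C# 4 or later, so the `ref`/`Missing.Value` arguments can be omitted. The names `WordInteropTextReader` and `GetText` are hypothetical. `doc.Content.Text` returns the whole main-document text as one string, with paragraphs separated by `\r`. Keep in mind that Microsoft does not support Office automation from unattended or server-side processes.

```csharp
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper: reads the main story of a document through Word automation.
public static class WordInteropTextReader
{
    public static string GetText(string path)
    {
        // Instantiate through the Application interface, not ApplicationClass.
        Word.Application word = new Word.Application();
        Word.Document doc = null;
        try
        {
            // C# 4+ allows omitting ref/Missing.Value for COM calls and using named arguments.
            doc = word.Documents.Open(path, ReadOnly: true, Visible: false);

            // Range.Text of the whole content: one string, paragraphs separated by '\r'.
            return doc.Content.Text;
        }
        finally
        {
            // Cast to the underscore interfaces to avoid the Close/Quit method vs. event ambiguity.
            if (doc != null)
            {
                ((Word._Document)doc).Close(SaveChanges: false);
                Marshal.ReleaseComObject(doc);
            }
            ((Word._Application)word).Quit(SaveChanges: false);
            Marshal.ReleaseComObject(word);
        }
    }
}
```

Usage: `string text = WordInteropTextReader.GetText(@"C:\path\to\file.docx");`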
This is how I extracted the whole text from a .docx file:
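// Uses the DotNetZip library (namespace Ionic.Zip) plus System.IO and System.Xml;
// filename is the path to the .docx file.
using (ZipFile zip = ZipFile.Read(filename))
using (MemoryStream stream = new MemoryStream())
{
    // The document body is stored in word/document.xml inside the package.
    zip["word/document.xml"].Extract(stream);
    stream.Seek(0, SeekOrigin.Begin);
    XmlDocument xmldoc = new XmlDocument();
    xmldoc.Load(stream);
    string PlainTextContent = xmldoc.DocumentElement.InnerText;
}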
The .docx format, like the other Microsoft Office formats whose extension ends in "x" (.xlsx, .pptx), is simply a ZIP package that you can open, modify, and recompress.
So you can read it with an Office Open XML library.
Enjoy.
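As one example of such a library (an assumption: the original answer's link may have pointed to a different one), Microsoft's Open XML SDK, NuGet package `DocumentFormat.OpenXml`, can open the package read-only and return the body text. The wrapper name `OpenXmlSdkTextReader` is hypothetical. Like the `XmlDocument` version above, `InnerText` does not insert separators between paragraphs.

```csharp
using DocumentFormat.OpenXml.Packaging;

// Hypothetical wrapper around the Open XML SDK.
public static class OpenXmlSdkTextReader
{
    public static string GetBodyText(string path)
    {
        // isEditable = false opens the package read-only.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            // Concatenated text of every element under w:body (no paragraph separators).
            return doc.MainDocumentPart.Document.Body.InnerText;
        }
    }
}
```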
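Make sure you are using .NET Framework 4.5 or later: the ZipFile class used below was added in 4.5 and needs a reference to System.IO.Compression.FileSystem.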
using NUnit.Framework;

[TestFixture]
public class GetDocxInnerTextTestFixture
{
    private string _inputFilepath = @"../../TestFixtures/TestFiles/input.docx";

    [Test]
    public void GetDocxInnerText()
    {
        string documentText = DocxInnerTextReader.GetDocxInnerText(_inputFilepath);
        Assert.IsNotNull(documentText);
        Assert.IsTrue(documentText.Length > 0);
    }
}
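using System.IO;
using System.IO.Compression; // ZipFile lives in System.IO.Compression.FileSystem.dll
using System.Xml;

public static class DocxInnerTextReader
{
    public static string GetDocxInnerText(string docxFilepath)
    {
        // Extract into a fresh temporary folder instead of an "extraction" folder next to the input.
        string extractionFolder = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        try
        {
            ZipFile.ExtractToDirectory(docxFilepath, extractionFolder);

            // The main document body of a .docx is stored in word/document.xml.
            string xmlFilepath = Path.Combine(extractionFolder, "word", "document.xml");
            var xmldoc = new XmlDocument();
            xmldoc.Load(xmlFilepath);

            // InnerText concatenates every text node, so paragraphs run together.
            return xmldoc.DocumentElement.InnerText;
        }
        finally
        {
            // Remove the extracted files.
            if (Directory.Exists(extractionFolder))
                Directory.Delete(extractionFolder, true);
        }
    }
}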
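Both ZIP-based answers use `InnerText`, which concatenates every text node, so the last word of one paragraph runs straight into the first word of the next. A possible variant, sketched below under the assumption of .NET 4.5+ (`DocxParagraphTextReader` is a hypothetical name), reads `word/document.xml` directly from the archive with `ZipArchive` instead of extracting to disk, and joins the `w:t` text runs of each `w:p` paragraph with a line break. It is deliberately rough: tabs, `w:br` breaks, headers, footers, and footnotes get no special handling.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: one output line per WordprocessingML paragraph.
public static class DocxParagraphTextReader
{
    // Namespace of the w: elements in word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string GetText(string path)
    {
        using (FileStream fs = File.OpenRead(path))
            return GetText(fs);
    }

    public static string GetText(Stream docxStream)
    {
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a .docx file?");

            XDocument xml;
            using (Stream entryStream = entry.Open())
                xml = XDocument.Load(entryStream);

            var lines = xml.Descendants(W + "p").Select(p =>
                string.Concat(p.Descendants(W + "t")
                    // Keep only text whose nearest w:p ancestor is this paragraph,
                    // so text in nested paragraphs is not counted twice.
                    .Where(t => t.Ancestors(W + "p").First() == p)
                    .Select(t => t.Value)));

            return string.Join(Environment.NewLine, lines);
        }
    }
}
```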