I'm trying to open a .doc file and read its contents, but I can't find any way to do this without launching MS Word.
This is the code I have now:
Microsoft.Office.Interop.Word.Application app = new Microsoft.Office.Interop.Word.Application();
object nullObject = System.Reflection.Missing.Value;
object file = @"C:\doc.doc";
Microsoft.Office.Interop.Word.Document doc = app.Documents.Open(ref file, ref nullObject, ref nullObject,
    ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject,
    ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject, ref nullObject,
    ref nullObject);
doc.ActiveWindow.Selection.WholeStory();
doc.ActiveWindow.Selection.Copy();
IDataObject data = Clipboard.GetDataObject();
string text = data.GetData(DataFormats.Text).ToString();
doc.Close(ref nullObject, ref nullObject, ref nullObject);
app.Quit(ref nullObject, ref nullObject, ref nullObject);
But this launches MS Word. Is there any way to do it without launching it?
Last time I did this (via COM from C++), I recall a 'Visible' property in the Application interface (true = visible).
If I remember correctly, the default was false, so you had to set it to true to make Word appear.
Regardless of whether or not the user can see Word, you will still see winword.exe (or whatever it's called today) in your task manager. I don't think there's a way to access Word through this interface without it launching Word (behind the scenes or not).
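For what it's worth, here is a minimal sketch of the hidden-window approach in C#. It assumes C# 4 or later (so the optional `ref` arguments of the COM calls can be omitted) and a reference to Microsoft.Office.Interop.Word; the class and method names are made up for illustration. It also swaps the clipboard round trip from the question for `doc.Content.Text`, which returns the document body as a string directly. Word still runs in the background here.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

class HiddenWordReader // hypothetical name, for illustration only
{
    // Returns the plain text of the document at 'path'.
    // winword.exe is still started; it just never shows a window.
    static string ReadText(string path)
    {
        Word.Application app = new Word.Application();
        app.Visible = false; // set explicitly rather than relying on the default

        Word.Document doc = null;
        try
        {
            // Open read-only so Word does not lock the file for editing.
            doc = app.Documents.Open(FileName: path, ReadOnly: true);

            // Content is a Range covering the main document body.
            return doc.Content.Text;
        }
        finally
        {
            // Close without saving, then shut down the background Word process.
            if (doc != null)
            {
                doc.Close(SaveChanges: false);
            }
            app.Quit(SaveChanges: false);
        }
    }

    static void Main(string[] args)
    {
        // Path taken from the question; pass another one as the first argument.
        string path = args.Length > 0 ? args[0] : @"C:\doc.doc";
        Console.WriteLine(ReadText(path));
    }
}
```

The try/finally is there so that a failed Open does not leave an invisible winword.exe behind in Task Manager.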
If you don't want Word to launch at all, you may have to find another solution.
Two possibilities: either use Microsoft's spec to write your own parser for the .doc format, or use an existing library for the purpose (e.g., from Aspose). Unless you have a couple of spare years to spend on the task, the latter is clearly the correct choice.
Add a reference to the library via Add Reference --> Browse --> Code7248.word_reader.dll
Download the DLL from this URL:
sourceforge.net/p/word-reader/wiki/Home
(A simple .NET Library compatible with .NET 2.0, 3.0, 3.5 and 4.0 for C#. It can currently extract only the raw text from a .doc or .docx file.)
The sample code is a simple console application in C#.
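A minimal sketch of such a console program follows. It assumes the library exposes a `TextExtractor` class with an `ExtractText()` method in the `Code7248.word_reader` namespace; those names are an assumption here, so check them against the DLL you download. The file path is just the one from the question.

```csharp
using System;
using Code7248.word_reader; // assumed namespace, matching the DLL name

class Program
{
    static void Main(string[] args)
    {
        // Example path reused from the question; replace with your own file.
        string path = @"C:\doc.doc";

        // Assumption: TextExtractor takes the file path and ExtractText()
        // returns the raw text of the .doc or .docx file as a string.
        TextExtractor extractor = new TextExtractor(path);
        string text = extractor.ExtractText();

        Console.WriteLine(text);
    }
}
```

Nothing in this sketch goes through COM, so no winword.exe process should be started.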
It is working fine.