Is it possible to get text (a line or a sentence) from a given line number in MS Word using Office automation? I mean, it's OK if I can get either the text on the given line or the sentence(s) that the line is part of.
I am not providing any code because I have absolutely no clue how an MS Word document is read using Office automation. I can open the file like this:
var wordApp = new ApplicationClass();
wordApp.Visible = false;
object file = path;
object misValue = Type.Missing;
Word.Document doc = wordApp.Documents.Open(ref file, ref misValue, ref misValue,
    ref misValue, ref misValue, ref misValue,
    ref misValue, ref misValue, ref misValue,
    ref misValue, ref misValue, ref misValue);
// ...and what is the rest of the code, given that I have line number = 3?
Edit: To address @Richard Marskell - Drackir's doubt: although the text in an MS Word document is one long string, Office automation still lets us find out the line number. In fact, I get the line number itself from another piece of code, like this:
Word.Revision rev = doc.Revisions[1]; // some revision, e.g. the first one in the document
object lineNo = rev.Range.get_Information(Word.WdInformation.wdFirstCharacterLineNumber);
For instance, say the Word file looks like this:
fix grammatical or spelling errors
clarify meaning without changing it correct minor mistakes add related resources or links
always respect the original author
Here there are 4 lines.
Fortunately, after some epic searching, I found a solution.
// Needs: using System.IO; using System.Reflection; using Word = Microsoft.Office.Interop.Word;
// (Application.ExecutablePath comes from WinForms.)
object file = Path.GetDirectoryName(Application.ExecutablePath) + @"\Answer.doc";
Word.Application wordObject = new Word.ApplicationClass();
wordObject.Visible = false;
object nullobject = Missing.Value;
Word.Document docs = wordObject.Documents.Open(
    ref file, ref nullobject, ref nullobject, ref nullobject,
    ref nullobject, ref nullobject, ref nullobject, ref nullobject,
    ref nullobject, ref nullobject, ref nullobject, ref nullobject,
    ref nullobject, ref nullobject, ref nullobject, ref nullobject);

String strLine;
bool bolEOF = false;

// Place the cursor on the first character of the document.
docs.Characters[1].Select();

int index = 0;
do
{
    // Extend the selection by one (layout) line and read its text.
    object unit = Word.WdUnits.wdLine;
    object count = 1;
    wordObject.Selection.MoveEnd(ref unit, ref count);
    strLine = wordObject.Selection.Text;
    richTextBox1.Text += ++index + " - " + strLine + "\r\n"; // for our understanding

    // Collapse the selection to its end, i.e. the start of the next line.
    object direction = Word.WdCollapseDirection.wdCollapseEnd;
    wordObject.Selection.Collapse(ref direction);

    // Stop once the cursor reaches the predefined \EndOfDoc bookmark.
    if (wordObject.Selection.Bookmarks.Exists(@"\EndOfDoc"))
        bolEOF = true;
} while (!bolEOF);

docs.Close(ref nullobject, ref nullobject, ref nullobject);
wordObject.Quit(ref nullobject, ref nullobject, ref nullobject);
docs = null;
wordObject = null;
Here's the genius behind the code. Follow the link for more explanation of how it works.
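The loop above walks the whole document line by line. If you only need one known line (line 3 in the question), Word's `Selection.GoTo` can jump there directly with `wdGoToLine` and `wdGoToAbsolute`, and `Selection.EndKey` with `wdExtend` then selects to the end of that line, much like pressing Shift+End. This is a minimal sketch, assuming the Word interop assembly is referenced under the alias `Word` and that Word is installed; `WordLineReader` and `GetLineText` are hypothetical names, and the line is counted from the start of the document. It has not been tested against your particular Word version.

```csharp
using System.Reflection;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper class, not part of the Word API.
static class WordLineReader
{
    // Returns the text of layout line `lineNumber` (1-based, counted from
    // the start of the document) in the file at `path`.
    public static string GetLineText(string path, int lineNumber)
    {
        var wordApp = new Word.Application();
        wordApp.Visible = false;
        object missing = Missing.Value;
        object file = path;
        object readOnly = true;
        Word.Document doc = wordApp.Documents.Open(
            ref file, ref missing, ref readOnly, ref missing,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing);
        try
        {
            // Move the insertion point to the start of line N.
            object what = Word.WdGoToItem.wdGoToLine;
            object which = Word.WdGoToDirection.wdGoToAbsolute;
            object count = lineNumber;
            wordApp.Selection.GoTo(ref what, ref which, ref count, ref missing);

            // Extend the selection to the end of that line.
            object unit = Word.WdUnits.wdLine;
            object extend = Word.WdMovementType.wdExtend;
            wordApp.Selection.EndKey(ref unit, ref extend);

            return wordApp.Selection.Text;
        }
        finally
        {
            // Close without saving; the casts avoid the method/event name clash on Close/Quit.
            object saveChanges = Word.WdSaveOptions.wdDoNotSaveChanges;
            ((Word._Document)doc).Close(ref saveChanges, ref missing, ref missing);
            ((Word._Application)wordApp).Quit(ref missing, ref missing, ref missing);
        }
    }
}
```

For example, `WordLineReader.GetLineText(someDocPath, 3)` would return the third line, where `someDocPath` stands for your own file path. Line breaks depend on Word's layout (page size, fonts), so the same document can report different lines on different machines; you may also want to `Trim()` the result.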
Use this if you want to read standard text (.txt) files.
Here is something you can use to read the file with one call:
List<string> strmsWord =
    new List<string>(File.ReadAllLines(Path.Combine(yourFilePath, YourwordDocName)));
If you want to loop through the list and see which items were returned, use something like this:
foreach (string strLines in strmsWord)
{
    Console.WriteLine(strLines);
}
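Building on the `ReadAllLines` approach, here is a sketch that fetches just one line by number from a plain-text file. It uses `File.ReadLines`, which streams the file lazily, so only the lines up to the requested one are read. `TextLineReader` and `GetLine` are hypothetical names. As noted below, this only makes sense for real text files, not binary .doc files.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper class for plain-text files.
static class TextLineReader
{
    // Returns line `lineNumber` (1-based) of the file at `path`,
    // or null if the file has fewer lines than that.
    public static string GetLine(string path, int lineNumber)
    {
        if (lineNumber < 1)
            throw new ArgumentOutOfRangeException(nameof(lineNumber), "Line numbers start at 1.");

        // Skip the preceding lines, then take the next one (if any).
        return File.ReadLines(path).Skip(lineNumber - 1).FirstOrDefault();
    }
}
```

Usage: `string line3 = TextLineReader.GetLine(Path.Combine(yourFilePath, YourwordDocName), 3);`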
or
I totally forgot about something: Word docs are probably in a binary format, so look at this and read the contents into a RichTextBox. From there you could either get at the line number you want or load it into a list afterwards. This link will show you how:
Reading from a Word Doc
If you want to read the XML formatting of the Word document,
here is a good link to check out as well:
ReadXML Format of a Word Document
This one is an even easier example; it reads the contents into the clipboard:
Load Word into ClipBoard
// Needs: using System.Reflection; using Word = Microsoft.Office.Interop.Word;
var word = new Word.Application();
object miss = Missing.Value;
object path = @"D:\viewstate.docx";
object readOnly = true;
var docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss,
    ref miss, ref miss, ref miss, ref miss, ref miss,
    ref miss, ref miss, ref miss, ref miss, ref miss,
    ref miss, ref miss);
string totaltext = "";

// The cursor is normally at the start of a freshly opened document,
// so extending the selection by one line selects the first line.
object unit = Word.WdUnits.wdLine;
object count = 1;
word.Selection.MoveEnd(ref unit, ref count);
totaltext = word.Selection.Text;
TextBox1.Text = totaltext;

docs.Close(ref miss, ref miss, ref miss);
word.Quit(ref miss, ref miss, ref miss);
docs = null;
word = null;
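The snippet above returns only the first line. Since the question also accepts the sentence(s) that a line belongs to, and the asker already holds a `Word.Range` (the revision's `rev.Range`), a sketch that reads the surrounding sentence and line straight from that range is shown below. It assumes the same `Word` interop alias and an already-open document; `WordRangeContext`, `GetSentence`, and `GetLine` are hypothetical names. Word's sentence detection is heuristic (abbreviations such as "e.g." can split a sentence), so check the results on your own documents.

```csharp
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper class, not part of the Word API.
static class WordRangeContext
{
    // Returns the whole sentence containing `range`.
    // Works on a duplicate so the caller's range is left unchanged.
    public static string GetSentence(Word.Range range)
    {
        Word.Range sentence = range.Duplicate;
        object unit = Word.WdUnits.wdSentence;
        sentence.Expand(ref unit);
        return sentence.Text;
    }

    // Returns the layout line on which `range` starts. Lines only exist in
    // the laid-out document, so this goes through the Selection, like the
    // loop in the first answer. Note that it moves the user's selection.
    public static string GetLine(Word.Application app, Word.Range range)
    {
        range.Select();

        // Collapse to the start of the range first.
        object start = Word.WdCollapseDirection.wdCollapseStart;
        app.Selection.Collapse(ref start);

        object unit = Word.WdUnits.wdLine;
        object move = Word.WdMovementType.wdMove;
        object extend = Word.WdMovementType.wdExtend;
        app.Selection.HomeKey(ref unit, ref move);  // go to the start of the line
        app.Selection.EndKey(ref unit, ref extend); // select to the end of the line
        return app.Selection.Text;
    }
}
```

For the revision from the question, that would be `WordRangeContext.GetSentence(rev.Range)` or `WordRangeContext.GetLine(wordApp, rev.Range)`, which avoids converting the range to a line number and back.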