I've got a little C# application that interops with Word to convert a batch of Word .doc files into text files, and for the most part this works fine.
However, if a document is corrupt, Word cannot open the file and a dialog box pops up. This means I cannot fully automate the conversion process, because someone has to watch for the dialogs.
Is there a way to test whether a Word .doc file is corrupt without opening it? Perhaps through Word interop, or maybe through a third-party tool.
One idea I've had is to spawn a thread that does the conversion and kill it if the conversion takes longer than n seconds, but I was wondering if there was a simpler way?
The only sure-fire way to determine whether Word will think that the file is corrupt is to get Word to open it :-). I don't think any 3rd-party application would be 100% reliable in this regard - after all, the document might in fact not be corrupt, but that doesn't help you if Word thinks that it is. However, clearly there are some situations you could detect, such as the file being zero-sized or suchlike.
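A minimal sketch of such a cheap pre-check, assuming the inputs are binary Word 97-2003 files: those are OLE2 compound files, which start with the 8-byte signature `D0 CF 11 E0 A1 B1 1A E1`. The class and method names are hypothetical. Passing the check proves nothing about the rest of the file; it only weeds out missing, empty, or obviously non-.doc files before Word sees them.

```csharp
using System;
using System.IO;

static class DocPreCheck
{
    // OLE2 compound file signature; binary .doc files begin with these 8 bytes.
    private static readonly byte[] OleSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Returns false for missing files, files shorter than the signature,
    // and files whose header does not match. Returns true otherwise,
    // which does NOT mean the document is intact.
    public static bool LooksLikeBinaryDoc(string path)
    {
        var info = new FileInfo(path);
        if (!info.Exists || info.Length < OleSignature.Length)
            return false;

        var header = new byte[OleSignature.Length];
        using (FileStream fs = File.OpenRead(path))
        {
            int read = 0;
            while (read < header.Length)
            {
                int n = fs.Read(header, read, header.Length - read);
                if (n == 0) return false; // unexpected end of file
                read += n;
            }
        }

        for (int i = 0; i < header.Length; i++)
        {
            if (header[i] != OleSignature[i]) return false;
        }
        return true;
    }
}
```

Note that Word will happily open RTF, HTML, or .docx content that has merely been renamed to .doc, and this check rejects all of those, so it is probably better to move rejected files to a "review manually" folder than to discard them.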
I don't come across many (any?) corrupt documents, so I do wonder if the corruption you're seeing might follow a pattern that you can detect? For example, are these documents downloaded from somewhere and usually missing the latter part of the file or something?
In any case, a corrupt file is not the only reason that Word might pop up a dialog box. Other reasons include:
- the file is password-protected
- the file contains links to other files
- the file contains macros (which may themselves pop up dialog boxes, or which may cause the security warning dialog to appear)
- etc.
You can circumvent some of these using Application.DisplayAlerts, etc., but not all of them (especially the security warning).
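A sketch of the "etc." part, assuming the Word primary interop assembly (`Microsoft.Office.Interop.Word`) plus `office.dll` are referenced, and Word 2010 or later for `SaveAs2`. It sets `DisplayAlerts` to `wdAlertsNone`, sets `AutomationSecurity` to force-disable macros (which may avoid the macro warning in some cases, though as noted not every dialog can be suppressed), and passes a dummy `PasswordDocument`, a commonly used trick that is ignored for unprotected files but should make Word throw instead of prompting for protected ones. Please verify this behaviour against your Word version. `QuietWord` and its methods are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;
using Office = Microsoft.Office.Core;

static class QuietWord
{
    // Starts an invisible Word instance configured to raise as few dialogs as possible.
    public static Word.Application Start()
    {
        var word = new Word.Application();
        word.Visible = false;
        word.DisplayAlerts = Word.WdAlertLevel.wdAlertsNone;
        // Disable macros in documents opened through automation.
        word.AutomationSecurity = Office.MsoAutomationSecurity.msoAutomationSecurityForceDisable;
        return word;
    }

    // Saves one document as plain text. Returns false if Word raised a COM error.
    public static bool TryConvertToText(Word.Application word, string docPath, string txtPath)
    {
        Word.Document doc = null;
        try
        {
            doc = word.Documents.Open(
                FileName: docPath,
                ConfirmConversions: false,
                ReadOnly: true,
                AddToRecentFiles: false,
                // Dummy password: assumed to be ignored for unprotected files,
                // and to cause an exception rather than a prompt for protected ones.
                PasswordDocument: "not-the-password",
                Visible: false);
            doc.SaveAs2(FileName: txtPath, FileFormat: Word.WdSaveFormat.wdFormatText);
            return true;
        }
        catch (COMException)
        {
            return false; // log docPath and move on to the next file
        }
        finally
        {
            if (doc != null)
            {
                // Cast to _Document to avoid the Close method/event name ambiguity.
                ((Word._Document)doc).Close(SaveChanges: Word.WdSaveOptions.wdDoNotSaveChanges);
                Marshal.ReleaseComObject(doc);
            }
        }
    }
}
```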
I've had some success with using a 2nd thread that detects dialogs owned by Office and (for those it recognizes) presses an appropriate button. It's hardly elegant, but it does work. And yes, my 2nd thread will also terminate the application if certain operations take too long.
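The timeout part of this approach (and the idea in the question of killing the conversion after n seconds) can be sketched as a small watchdog. Killing the .NET thread alone would not help, because the dialog lives inside WINWORD.EXE; the sketch therefore kills the Word process, which should make the blocked COM call fail with an exception and free the worker. The names are hypothetical, and `KillAllWord` is deliberately blunt: it terminates every Word process, so it only suits a dedicated conversion machine.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Threading.Tasks;

static class Watchdog
{
    // Runs 'work' on a thread-pool thread and waits up to 'timeout'.
    // Returns true if it finished in time (an exception thrown by 'work'
    // surfaces here as an AggregateException). Otherwise calls 'onTimeout'
    // and returns false.
    public static bool RunWithTimeout(Action work, TimeSpan timeout, Action onTimeout)
    {
        Task task = Task.Run(work);
        if (task.Wait(timeout))
        {
            return true;
        }
        onTimeout();
        return false;
    }

    // Hypothetical kill strategy: terminate every WINWORD.EXE process.
    public static void KillAllWord()
    {
        foreach (Process p in Process.GetProcessesByName("WINWORD"))
        {
            try
            {
                p.Kill();
            }
            catch (Exception ex) when (ex is InvalidOperationException || ex is Win32Exception)
            {
                // Process already exited, or we lack permission to kill it.
            }
            finally
            {
                p.Dispose();
            }
        }
    }
}
```

In use this would look something like `Watchdog.RunWithTimeout(() => Convert(word, docPath, txtPath), TimeSpan.FromSeconds(60), Watchdog.KillAllWord)`, where `Convert` stands for your existing conversion routine. After a timeout the old `Application` object is dead, so a fresh Word instance has to be started before the next file.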
If yours is a server-side application with no UI interaction, be aware that Office automation may cause problems there: Microsoft does not recommend or support automating Office from unattended, non-interactive processes (see http://support.microsoft.com/kb/257757).
If the files are Office 2007+ (.docx), the best way is to use Open XML. For older binary .doc files, a third-party library may be used, for example the Aspose API (Aspose.Words).
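For the .docx case, a minimal sketch assuming the Open XML SDK NuGet package (`DocumentFormat.OpenXml`) is referenced. Word is never started, so no dialog can appear; a damaged package shows up as an exception from `WordprocessingDocument.Open` (the exact exception type varies between SDK versions) that the caller can catch and log. This does not read binary .doc files. `DocxText` is a hypothetical name.

```csharp
using System;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class DocxText
{
    // Extracts the text of each paragraph (including those inside tables),
    // one paragraph per line. Formatting, tabs, and breaks are not preserved.
    public static string ExtractText(string path)
    {
        // 'false' opens the package read-only.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart?.Document?.Body;
            if (body == null) return string.Empty;

            var sb = new StringBuilder();
            foreach (Paragraph p in body.Descendants<Paragraph>())
            {
                sb.AppendLine(p.InnerText);
            }
            return sb.ToString();
        }
    }
}
```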