Does anybody know of a .NET API, SDK or IFilter that can read the subject ('title' metadata) and the text from the following file types:
.PDF
.DOC
.XLS
.PPT
.CSV
.TXT
.DOCX
.XLSX
.PPTX
+ the OpenOffice and OpenDocument formats.
Open source would be awesome... but commercial is OK too.
I can't find anything anywhere!
I don't think you will find a single IFilter that can access the contents of all of those file types. Typically, each IFilter handles one specific format.
For example, Adobe provides one for PDFs, and Microsoft provides one for Office that handles Word, Excel, PowerPoint and CSV (I believe it comes pre-installed with Windows).
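To show why you end up with per-format handling, here is a minimal sketch in Python (not .NET, and not an IFilter) that reads the title only from the ZIP/XML-based formats. OOXML files (.docx/.xlsx/.pptx) keep `dc:title` in `docProps/core.xml`, and OpenDocument files keep it in `meta.xml`. The function name `read_title` and the extension sets are hypothetical choices for illustration. PDF and the legacy binary .doc/.xls/.ppt formats are deliberately not handled, because they need their own format-specific reader.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

# Dublin Core namespace: both OOXML (docProps/core.xml) and
# OpenDocument (meta.xml) store the title as dc:title.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical extension groupings used only for this sketch.
OOXML_EXTS = {".docx", ".xlsx", ".pptx"}
ODF_EXTS = {".odt", ".ods", ".odp"}
PLAIN_TEXT_EXTS = {".txt", ".csv"}


def _title_from_xml(xml_bytes: bytes) -> Optional[str]:
    """Return the text of the first dc:title element, or None if absent or empty."""
    root = ET.fromstring(xml_bytes)
    elem = root.find(f".//{{{DC_NS}}}title")
    if elem is None or elem.text is None:
        return None
    return elem.text.strip() or None


def _title_from_zip_member(data: bytes, member: str) -> Optional[str]:
    """Open the file as a ZIP package and read dc:title from one XML member."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        try:
            return _title_from_xml(zf.read(member))
        except KeyError:
            # The metadata part is optional, so a missing member means "no title".
            return None


def read_title(filename: str, data: bytes) -> Optional[str]:
    """Dispatch on the file extension; each family needs its own strategy."""
    ext = Path(filename).suffix.lower()
    if ext in OOXML_EXTS:
        return _title_from_zip_member(data, "docProps/core.xml")
    if ext in ODF_EXTS:
        return _title_from_zip_member(data, "meta.xml")
    if ext in PLAIN_TEXT_EXTS:
        return None  # plain text and CSV carry no title metadata
    # PDF and legacy binary Office files are not ZIP/XML packages.
    raise NotImplementedError(f"{ext}: needs a format-specific reader or IFilter")
```

The same ZIP-and-XML approach ports to .NET (`System.IO.Compression` plus `System.Xml`), but PDF and the older binary Office formats would still need their own parsers or IFilters.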