Is there a way to remove the metadata information of MS Word files or Image files programmatically using C# or a Windows batch command?
The manual way to remove this information is to right-click a file in Windows Explorer and select 'Properties'>'Details'>'Remove Properties and Personal Information'.
It ain't easy, at least not to get it all.
You might look at the metadata removal package called Metadact by Litera (formerly Softwise).
There are several others out on the market too.
If you want to do it yourself, first, you'll need to decide on what you consider "metadata".
Some is pretty easy to get to using the Word Object Model (Interop from C# or VB).
Some can't be accessed via Word, so you'll need to use the Structured Storage API to get at it (the last 10 authors, for example).
If you're talking about DOCX files, you can use the OpenXML SDK to get at all the parts inside the package, then use XML to navigate and edit out the bits you don't want.
Going that way, though, it's MUCH harder to remove "metadata" in the content of the document, because you'll have to deal with internal Word structures like runs and tracked changes.
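To make the DOCX case concrete: a .docx file is a ZIP package, and the author, title, and similar fields live in the part docProps/core.xml. The sketch below is in Python rather than C#, and uses only the standard library instead of the OpenXML SDK, just to show the package layout. The function name strip_core_properties and the list of fields to blank are my own choices, not anything from the SDK. It does not touch docProps/app.xml (company, manager), custom properties, comments, or tracked changes, so treat it as a starting point and check the result in Word.

```python
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"

# Namespaces used by the core-properties part of an Open Packaging package.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Assumption: these are the fields you consider "metadata". Adjust as needed.
FIELDS_TO_CLEAR = (
    "dc:creator",
    "cp:lastModifiedBy",
    "dc:title",
    "dc:subject",
    "dc:description",
    "cp:keywords",
)


def strip_core_properties(src, dst):
    """Copy the package at src to dst, blanking selected core properties.

    src and dst may be file paths or binary file-like objects.
    """
    # Keep the usual prefixes when the XML is written back out.
    for prefix, uri in NAMESPACES.items():
        ET.register_namespace(prefix, uri)

    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == CORE_PART:
                root = ET.fromstring(data)
                for field in FIELDS_TO_CLEAR:
                    for element in root.findall(field, NAMESPACES):
                        element.text = ""  # keep the element, drop its value
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            # Every other part is copied through unchanged.
            zout.writestr(item, data)
```

Usage would be something like strip_core_properties("report.docx", "report-clean.docx"), where both file names are placeholders. Writing to a new file rather than editing in place means the original survives if the output turns out not to open.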
Thanks!
I think I found a way to remove (or add) meta information to Office documents. There is a Microsoft article here: "The Dsofile.dll files lets you edit Office document properties when you do not have Office installed" (KB 224351)
The Dsofile.dll sample file is an in-process ActiveX component for
programmers that use Microsoft Visual Basic .NET or the Microsoft .NET
Framework. You can use this in your custom applications to read and to
edit the OLE document properties that are associated with Microsoft
Office files, such as the following:
- Microsoft Excel workbooks
- Microsoft PowerPoint presentations
- Microsoft Word documents
- Microsoft Project projects
- Microsoft Visio drawings
- Other files that are saved in the OLE Structured Storage format
The Dsofile.dll sample file is
written in Microsoft Visual C++. The Dsofile.dll sample file
demonstrates how to use the OLE32 IPropertyStorage interface to access
the extended properties of OLE structured storage files. The component
converts the data to Automation friendly data types for easier use by
high level programming languages such as Visual Basic 6.0, Visual
Basic .NET, and C#. The Dsofile.dll sample file is given with full
source code and includes sample clients written in Visual Basic 6.0
and Visual Basic .NET 2003 (7.1).