C# or CMD: Remove Word file metadata

Posted 2019-08-06 08:25

Question:

Is there a way to remove the metadata information of MS Word files or Image files programmatically using C# or a Windows batch command?

The manual way to remove that information is to right-click a file in Windows Explorer and select 'Properties' > 'Details' > 'Remove Properties and Personal Information'.

Answer 1:

It ain't easy, at least not to get it all.

You might look at the metadata removal package called Metadact by Litera (formerly Softwise).

There are several others out on the market too.

If you want to do it yourself, first, you'll need to decide on what you consider "metadata".

Some is pretty easy to get to using the Word Object Model (Interop from C# or VB).

Some can't be accessed via Word, so you'll need to use the Structured Storage API to get at it (Like last 10 authors).

If you're talking about DOCX files, you can use the OpenXML SDK to get at all the parts inside the package, then use XML to navigate and edit out the bits you don't want.
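To make the DOCX route a bit more concrete: a DOCX is a ZIP package, and the core properties (author, last modified by, title, and so on) live in the part docProps/core.xml. Here is a minimal sketch of emptying that one part. It is written in Python with only the standard library rather than C# and the OpenXML SDK, purely to show the package layout; the function name strip_core_properties is made up for this example. It assumes the package follows the usual layout and it does not touch docProps/app.xml, custom properties, comments, or anything in the document body.

```python
import io
import zipfile

# A valid but empty core-properties part, used to overwrite the original.
EMPTY_CORE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    '<cp:coreProperties '
    'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:dcterms="http://purl.org/dc/terms/" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>'
)


def strip_core_properties(docx_bytes: bytes) -> bytes:
    """Return a copy of the DOCX package with docProps/core.xml emptied.

    Hypothetical helper for illustration; every other part is copied as-is.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "docProps/core.xml":
                # Replace the part instead of deleting it, so the
                # relationship in _rels/.rels still points at something.
                data = EMPTY_CORE.encode("utf-8")
            dst.writestr(item, data)
    return out.getvalue()
```

Usage would be along the lines of reading the file's bytes, calling strip_core_properties, and writing the result to a new file. The part is overwritten rather than removed because the package relationships still reference it.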

Going that way, though, it's MUCH harder to remove "metadata" in the content of the document, because you'll have to deal with internal Word structures like RUNs, and change tracking stuff.
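As a rough illustration of why the in-content part is harder: tracked insertions and deletions sit inside word/document.xml as w:ins and w:del elements wrapped around runs, each carrying a w:author attribute. The sketch below (Python, standard library only; the function name revision_authors is made up for this example) only counts those two element types per author. It does not remove anything, and it ignores other revision markup such as formatting changes, moves, and comments, so treat it as a way to see what is there rather than a cleaner.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, normally bound to the "w" prefix.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def revision_authors(document_xml: str) -> dict:
    """Count tracked insertions and deletions per author.

    Hypothetical helper for illustration; expects the text of
    word/document.xml and looks only at w:ins and w:del elements.
    """
    counts = {}
    root = ET.fromstring(document_xml)
    for tag in ("ins", "del"):
        for el in root.iter(f"{{{W}}}{tag}"):
            author = el.get(f"{{{W}}}author", "")
            counts[author] = counts.get(author, 0) + 1
    return counts
```

Actually stripping these means accepting or rejecting each change and rewriting the runs involved, which is where the real work is.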



Answer 2:

Thanks! I think I found a way to remove (or add) meta information in Office documents. There is a Microsoft article about it: "The Dsofile.dll files lets you edit Office document properties when you do not have Office installed" (KB 224351).

The Dsofile.dll sample file is an in-process ActiveX component for programmers that use Microsoft Visual Basic .NET or the Microsoft .NET Framework. You can use this in your custom applications to read and to edit the OLE document properties that are associated with Microsoft Office files, such as the following:

  • Microsoft Excel workbooks
  • Microsoft PowerPoint presentations
  • Microsoft Word documents
  • Microsoft Project projects
  • Microsoft Visio drawings
  • Other files that are saved in the OLE Structured Storage format

The Dsofile.dll sample file is written in Microsoft Visual C++. It demonstrates how to use the OLE32 IPropertyStorage interface to access the extended properties of OLE structured storage files. The component converts the data to Automation-friendly data types for easier use by high-level programming languages such as Visual Basic 6.0, Visual Basic .NET, and C#. The Dsofile.dll sample file comes with full source code and includes sample clients written in Visual Basic 6.0 and Visual Basic .NET 2003 (7.1).