Reading metadata with Python

Posted 2019-02-19 09:28

For the past two days I have been scanning the Internet to try to find the solution to my problem. I have a folder of different files. They run the gamut of file types. I am trying to write a Python script that will read the metadata from each file, if it exists. The intent is to eventually output the data to a file to compare with another program's metadata extraction.

I have found some examples that worked for only a few of the files in the directory. All the ways I have found have dealt with opening a Storage Container object. I am new to Python and am not sure what a Storage Container object is. I just know that most of my files error out when trying to use

pythoncom.StgOpenStorage(<File Name>, None, flags)

With the few that actually work, I am able to get the main metadata tags, like Title, Subject, Author, Created, etc.

Does anyone know a way other than Storage Containers to get to the metadata? Also, if there is an easier way to do this with another language, by all means, suggest it.

Thanks

2 Answers
(account banned)
#2 · 2019-02-19 09:59

The problem is that there are two ways that Windows stores file metadata. The approach you're using is suitable for files created by COM applications; this data is included inside the file itself. However, with the introduction of NTFS5, any file can contain metadata as part of an alternate data stream. So it's possible the files that succeed are COM-app created ones, and the ones that are failing aren't.

Here's a possibly more robust way of dealing with the COM-app created files: Get document summary information from any file.

With alternate data streams, it's possible to read them directly:

meta = open('myfile.ext:StreamName').read()

Update: okay, now I see none of this is relevant because you were after document metadata and not file metadata. What a difference clarity in a question can make :|

Try this: How to retrieve author of a office file in python?
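
For the newer Office formats (.docx, .xlsx, .pptx) no storage container is involved: these files are ZIP packages, and the core properties (title, creator, created, and so on) normally sit in a part named docProps/core.xml. Below is a minimal sketch using only the standard library, assuming that layout. The function name read_core_properties is made up for illustration, and older binary formats such as .doc or .xls will simply return an empty dict here.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_core_properties(path):
    """Return {tag: text} from docProps/core.xml, or {} if unavailable."""
    if not zipfile.is_zipfile(path):
        return {}  # not a ZIP package, e.g. an old .doc or a plain text file
    with zipfile.ZipFile(path) as package:
        try:
            xml_bytes = package.read("docProps/core.xml")
        except KeyError:
            return {}  # a ZIP archive without the core-properties part
    props = {}
    for element in ET.fromstring(xml_bytes):
        # Tags look like '{namespace-uri}title'; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if element.text and element.text.strip():
            props[local_name] = element.text.strip()
    return props
```

Calling read_core_properties on each file in the folder and skipping the empty results would cover the Office Open XML files without touching COM at all.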

可以哭但决不认输i
#3 · 2019-02-19 10:08

You can use the Shell COM objects to retrieve any metadata visible in Explorer:

import win32com.client

# Requires pywin32 on Windows.
sh = win32com.client.gencache.EnsureDispatch('Shell.Application', 0)
ns = sh.NameSpace(r'm:\music\Aerosmith\Classics Live!')

# Passing None as the item returns the column header; stop at the first empty name.
colnum = 0
columns = []
while True:
    colname = ns.GetDetailsOf(None, colnum)
    if not colname:
        break
    columns.append(colname)
    colnum += 1

# Print every non-empty detail for each item in the folder.
for item in ns.Items():
    print(item.Path)
    for colnum in range(len(columns)):
        colval = ns.GetDetailsOf(item, colnum)
        if colval:
            print('\t', columns[colnum], colval)
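
Since the goal in the question is to write the metadata to a file for comparison, the same loop could be reshaped to collect rows and dump them to CSV. This is a rough sketch, assuming ns is a folder object like the one returned by sh.NameSpace(...). The helper names collect_details and write_details_csv, the "ItemPath" column, and the cap of 512 probed columns are all invented here for illustration.

```python
import csv


def collect_details(ns, max_columns=512):
    """Return (columns, rows) for every item in a Shell folder object."""
    # Discover column headers; stop at the first empty name.
    columns = []
    for colnum in range(max_columns):
        colname = ns.GetDetailsOf(None, colnum)
        if not colname:
            break
        columns.append(colname)

    rows = []
    for item in ns.Items():
        row = {"ItemPath": item.Path}  # hypothetical key for the file path
        for colnum, colname in enumerate(columns):
            value = ns.GetDetailsOf(item, colnum)
            if value:
                row[colname] = value
        rows.append(row)
    return columns, rows


def write_details_csv(ns, out_path):
    """Write one CSV row per item; missing values are left blank."""
    columns, rows = collect_details(ns)
    fieldnames = ["ItemPath"] + columns
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

With the ns object from the snippet above, write_details_csv(ns, 'metadata.csv') would produce a file that can be diffed against the other program's output.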