Python finds a string in multiple files recursively

Posted 2019-09-10 06:40

I'm learning Python and would like to search for a keyword in multiple files recursively.

I have an example function which should find the *.doc extension in a directory. Then, the function should open each file with that file extension and read it. If a keyword is found while reading the file, the function should identify the file path and print it.

Otherwise, if the keyword is not found, Python should continue with the next file.

To do that, I have defined a function which takes two arguments:

def find_word(extension, word):
      # define the path for os.walk
      for dname, dirs, files in os.walk('/rootFolder'):
            #search for file name in files:
            for fname in files:
                  #define the path of each file
                  fpath = os.path.join(dname, fname)
                  #open each file and read it
                  with open(fpath) as f:
                        data=f.read()
                  # if data contains the word
                  if word in data:
                        #print the file path of that file  
                        print (fpath)
                  else: 
                        continue

Could you give me a hand to fix this code?

Thanks,

3 Answers
Fickle 薄情
#2 · 2019-09-10 07:00

If you are trying to read a .doc file in your code, then this won't work. You will have to change the part where you read the file.

Here are some links on reading a .doc file in Python.

extracting text from MS word files in python

Reading/Writing MS Word files in Python


成全新的幸福
#3 · 2019-09-10 07:08
import os


def find_word(extension, word):
    for root, dirs, files in os.walk('/DOC'):
        # filter files for the given extension:
        files = [fi for fi in files if fi.endswith(".{ext}".format(ext=extension))]
        for filename in files:
            path = os.path.join(root, filename)
            # open each file and read it; errors="ignore" skips bytes that
            # cannot be decoded as text instead of raising an exception
            with open(path, errors="ignore") as f:
                # split() creates a list of words and set() reduces it
                # to the unique words
                words = set(f.read().split())
                if word in words:
                    print(path)
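
The approach above can also be sketched in a more general form. This is only a minimal sketch under a few assumptions: the root folder is passed in as a parameter instead of being hard-coded, the matches are returned as a list rather than printed, and the reader is a hypothetical `extract` callable (here `read_plain_text`, a name made up for this example) so that a .doc-aware reader could be swapped in later. It does a substring test, as in the question, not a whole-word test.

```python
import os


def read_plain_text(path):
    # Hypothetical default reader: treats the file as UTF-8 text and
    # replaces undecodable bytes instead of raising an exception.
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read()


def find_word(root, extension, word, extract=read_plain_text):
    """Return the paths under root with the given extension whose text contains word."""
    suffix = "." + extension.lstrip(".").lower()
    matches = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            # Skip files that do not have the requested extension.
            if not name.lower().endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                text = extract(path)
            except OSError:
                # Unreadable file: skip it and keep walking.
                continue
            if word in text:
                matches.append(path)
    return matches
```

A call such as `find_word('/rootFolder', 'txt', 'keyword')` would then return the matching paths, which can be printed one per line.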
老娘就宠你
#4 · 2019-09-10 07:11

.doc files are rich-text files stored in a binary format, i.e. they won't open properly with a simple text editor or Python's built-in open(). In this case, you can use other Python modules such as python-docx (note that python-docx handles the newer .docx format).
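
To illustrate why the file format matters, here is a rough sketch that does not use python-docx at all. It relies only on the standard library and on the fact that a .docx file (unlike a legacy .doc file) is a zip archive whose main body is stored in word/document.xml. The function name `docx_to_text` is made up for this example, and the sketch ignores headers, footers, tables of contents and similar parts.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph is split into runs; join the text of all its <w:t> nodes.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

The returned string can then be checked with `word in docx_to_text(path)`. This does not work for old .doc files, which are not zip archives.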

Update

For .doc files (the format used before Word 2007) you can also use other tools such as catdoc or antiword. Try the following.

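import subprocess


def doc_to_text(filename):
    # Run catdoc (it must be installed) and return the extracted text.
    # Passing the arguments as a list avoids shell quoting problems
    # with unusual file names.
    return subprocess.run(
        ['catdoc', '-w', filename],
        stdout=subprocess.PIPE,
        check=True,
    ).stdout.decode(errors='replace')

print(doc_to_text('fixtures/doc.doc'))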