I'm learning Python and would like to search for a keyword in multiple files recursively.
I have an example function which should find the *.doc
extension in a directory.
Then, the function should open each file with that file extension and read it.
If a keyword is found while reading the file, the function should identify the file path and print it.
Else, if the keyword is not found, python should continue.
To do that, I have defined a function which takes two arguments:
def find_word(extension, word):
# define the path for os.walk
for dname, dirs, files in os.walk('/rootFolder'):
#search for file name in files:
for fname in files:
#define the path of each file
fpath = os.path.join(dname, fname)
#open each file and read it
with open(fpath) as f:
data=f.read()
# if data contains the word
if word in data:
#print the file path of that file
print (fpath)
else:
continue
Could you give me a hand to fix this code?
Thanks,
If you are trying to read .doc file in your code the this won't work. you will have to change the part where you are reading the file.
Here are some links for reading a .doc file in python.
extracting text from MS word files in python
Reading/Writing MS Word files in Python
Reading/Writing MS Word files in Python
.doc
files are rich text files, i.e. they wont open with a simple text editor or python open method. In this case, you can use other python modules such as python-docx.Update
For doc files (previous to Word 2007) you can also use other tools such as catdoc or antiword. Try the following.