Keyword Extraction in Python_RAKE

Posted 2019-02-20 14:06

Question:

I am a novice user and am puzzled by what should be a simple loop problem. I have a local directory with about 500 .txt files, and I would like to extract keywords from each file using RAKE for Python. I've reviewed the RAKE documentation; however, the code suggested in the tutorial extracts keywords from a single document only. Can someone please explain how to loop over all the files stored in my local directory? Here's the code from the tutorial, which works really well for a single document.

$ git clone https://github.com/zelandiya/RAKE-tutorial

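import rake
import operator

# Arguments as described in the tutorial: stoplist path, minimum characters
# per word (5), maximum words per phrase (3), minimum keyword frequency (4).
rake_object = rake.Rake("SmartStoplist.txt", 5, 3, 4)

# Read the document, closing the file as soon as the text is loaded.
with open("data/docs/fao_test/w2167e.txt", 'r') as sample_file:
    text = sample_file.read()
keywords = rake_object.run(text)
print("Keywords: %s" % (keywords,))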

Answer 1:

Create a list of filenames you want to process:

filenames = [
    'data/docs/fao_test/w2167e.txt',
    'some/other/folder/filename.txt',
    # ...and so on
]

If you don't want to hardcode all the names, you can use the glob module to collect filenames by wildcards.
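As a minimal sketch of the glob approach, assuming all the .txt files sit directly inside one directory. The helper name collect_txt_files and the directory path in the usage note are hypothetical, not part of RAKE or the tutorial:

```python
import glob
import os


def collect_txt_files(directory):
    """Return a sorted list of paths to the .txt files directly inside `directory`.

    Hypothetical helper for illustration; subdirectories are not searched.
    """
    pattern = os.path.join(directory, "*.txt")
    # glob.glob returns matches in arbitrary order, so sort for repeatable runs.
    return sorted(glob.glob(pattern))
```

With the tutorial layout, filenames = collect_txt_files("data/docs/fao_test") would then stand in for the hardcoded list.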

Create a dictionary for storing the results:

results = {}

Loop through each filename, reading the contents and storing the Rake results in the dictionary, keyed by filename:

for filename in filenames:
    with open(filename, 'r') as fp:
        results[filename] = rake_object.run(fp.read())
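If it helps, the same loop can be wrapped in a function and followed by a quick summary per file. This is only a sketch: extract_all and print_top_keywords are hypothetical names, the extractor is passed in as a plain callable (for example rake_object.run), and the summary assumes each result is a list whose best-scoring entries come first, which is how the tutorial's output appears to be ordered.

```python
def extract_all(filenames, extract):
    """Map each filename to extract(<file contents>).

    `extract` is any callable that takes the document text,
    e.g. rake_object.run from the tutorial (assumed, not imported here).
    """
    results = {}
    for filename in filenames:
        with open(filename, "r") as fp:
            results[filename] = extract(fp.read())
    return results


def print_top_keywords(results, top_n=5):
    """Print the first `top_n` entries of each file's result.

    Assumes each result is a list with the highest-scoring keywords first.
    """
    for filename in sorted(results):
        print("%s: %s" % (filename, results[filename][:top_n]))
```

Usage would then be results = extract_all(filenames, rake_object.run), followed by print_top_keywords(results).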