I have the paths of multiple text files stored in a dictionary 'd':
'd:/individual-articles/9.txt', 'd:/individual-articles/11.txt', 'd:/individual-articles/12.txt',...
and so on...
Now I need to read each file in the dictionary and count the occurrences of every word across all of those files.
My output should be of the form:
the-500
a-78
in-56
and so on, where 500 is the number of times the word "the" occurs in all the files in the dictionary.
I need to do this for all the words.
I am a Python newbie, so please help!
My code below doesn't work; it shows no output. There must be a mistake in my logic. Please help me fix it.
import collections
import itertools
import os
from glob import glob
from collections import Counter

folderpaths='d:/individual-articles'
counter=Counter()
filepaths = glob(os.path.join(folderpaths,'*.txt'))
folderpath='d:/individual-articles/'

# i am creating my dictionary here, can be ignored
d = collections.defaultdict(list)
with open('topics.txt') as f:
    for line in f:
        value, *keys = line.strip().split('~')
        for key in filter(None, keys):
            if key=='earn':
                d[key].append(folderpath+value+".txt")

for key, value in d.items() :
    print(value)

word_count_dict={}
for file in d.values():
    with open(file,"r") as f:
        words = re.findall(r'\w+', f.read().lower())
        counter = counter + Counter(words)
        for word in words:
            word_count_dict[word].append(counter)
for word, counts in word_count_dict.values():
    print(word, counts)
Your code should give you an error on the line that does word_count_dict[word][file]. Because your word_count_dict is empty, word_count_dict[word] doesn't exist, so you should get a KeyError there and you can't do [file] on it.

And I found another error: iterating over d.items() would make file a tuple. But then you do f = open(file,"r"), so you assume file is a string. This would also raise an error.

Since you see no error at all, this means that none of these lines are ever executed. That in turn means that either the loop while file in d.items(): has nothing to iterate over, or the loop for file in filepaths: has nothing to iterate over.

And to be honest, I don't understand why you have both of them, or what you are trying to achieve there. You have generated a list of filenames to parse. You should just iterate over them. I also don't know why d is a dict. All you need is a list of all the files. You don't need to keep track of which key in the topics list each file came from, do you?

Inspired by the Counter collection that you use:
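A minimal sketch of that idea, assuming the file paths end up in one flat list. The helper name count_words and the two throwaway demo files are hypothetical stand-ins, not part of the original code; in the real script the paths would come from d as built from topics.txt.

```python
import os
import re
import tempfile
from collections import Counter


def count_words(filepaths):
    """Return one Counter holding word totals across all given files."""
    totals = Counter()
    for path in filepaths:
        with open(path, "r") as f:
            # Same tokenisation as in the question: lowercase, then \w+ runs.
            words = re.findall(r"\w+", f.read().lower())
        # update() adds this file's counts in place; no per-word dict needed.
        totals.update(words)
    return totals


# Demo on two temporary files (hypothetical content, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    d = {"earn": []}
    samples = [("9.txt", "The cat and the dog"), ("11.txt", "A dog in the yard")]
    for name, text in samples:
        path = os.path.join(tmp, name)
        with open(path, "w") as f:
            f.write(text)
        d["earn"].append(path)

    # d maps a key to a list of paths, so flatten the values into one list.
    filepaths = [path for paths in d.values() for path in paths]
    totals = count_words(filepaths)

# Print in the requested word-count form, most frequent word first.
for word, count in totals.most_common():
    print("{}-{}".format(word, count))
```

Note that d.values() yields lists of paths, not single paths, which is why the values are flattened before opening anything. If only one key is ever used, a plain list of paths would do the same job without the dictionary.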