Converting Readability formula into python function

Posted 2019-03-04 10:38

I was given this formula called FRES (Flesch reading-ease test) that is used to measure the readability of a document:

FRES = 206.835 - 1.015 * (total words / total sentences) - 84.6 * (total syllables / total words)

My task is to write a Python function that returns the FRES of a text, so I need to convert this formula into Python code.

I have re-implemented my code based on an answer I got, to show what I have so far and the result it gives me:

import nltk
import collections
nltk.download('punkt')
nltk.download('gutenberg')
nltk.download('brown')
nltk.download('averaged_perceptron_tagger')
nltk.download('universal_tagset')

import re
from itertools import chain
from nltk.corpus import gutenberg
VC = re.compile('[aeiou]+[^aeiou]+', re.I)
def count_syllables(word):
    return len(VC.findall(word))

def compute_fres(text):
    """Return the FRES of a text.
    >>> emma = nltk.corpus.gutenberg.raw('austen-emma.txt')
    >>> compute_fres(emma) # doctest: +ELLIPSIS
    99.40...
    """

    for filename in gutenberg.fileids():
        sents = gutenberg.sents(filename)
        words = gutenberg.words(filename)
        num_sents = len(sents)
        num_words = len(words)
        num_syllables = sum(count_syllables(w) for w in words)
        score = 206.835 - 1.015 * (num_words / num_sents) - 84.6 * (num_syllables / num_words)
    return score

After running the code this is the result message I got:

Failure

Expected :99.40...

Actual   :92.84866041488623

File "C:/Users/PycharmProjects/a1/a1.py", line 60, in a1.compute_fres
Failed example:
    compute_fres(emma) # doctest: +ELLIPSIS

Expected:
    99.40...
Got:
    92.84866041488623

My function is supposed to pass the doctest and return 99.40... I'm also not allowed to edit the syllable function, since it came with the task:

import re
VC = re.compile('[aeiou]+[^aeiou]+', re.I)
def count_syllables(word):
    return len(VC.findall(word))
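
For reference, here is a small sketch of what this regex actually counts. It matches each run of vowels that is followed by at least one non-vowel character, so a vowel at the very end of a word is not counted and punctuation tokens count as zero. The sample words are arbitrary illustrations, not part of the task.

```python
import re

# Same pattern as the function supplied with the task.
VC = re.compile('[aeiou]+[^aeiou]+', re.I)

def count_syllables(word):
    # Number of vowel runs that are followed by one or more non-vowels.
    return len(VC.findall(word))

# Arbitrary sample tokens, chosen only to illustrate the behaviour.
for token in ["readability", "cat", "Emma", "the", ","]:
    print(token, VC.findall(token), count_syllables(token))
# "readability" matches 'ead', 'ab', 'il', 'ity' -> 4
# "Emma" matches only 'Emm' (the final 'a' has nothing after it) -> 1
# "the" and "," have no vowel run followed by a non-vowel -> 0
```

This is an approximation of syllable counting rather than a linguistic one, so scores built on it will not necessarily agree with other FRES implementations.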

This question has been very tricky, but at least the code now gives me a result instead of an error message. I'm not sure why the result is different, though.

Any help would be much appreciated. Thank you.

1 answer
祖国的老花朵
#2 · 2019-03-04 11:27

BTW, there's the textstat library.

from textstat.textstat import textstat
from nltk.corpus import gutenberg

for filename in gutenberg.fileids():
    # Score the document text itself, not the file ID string.
    print(filename, textstat.flesch_reading_ease(gutenberg.raw(filename)))

If you're set on coding up your own, you first have to

  • decide whether punctuation counts as a word
  • define how to count the number of syllables in a word.

If punctuation counts as a word and syllables are counted by the regex in your question, then:

import re
from itertools import chain
from nltk.corpus import gutenberg

def num_syllables_per_word(word):
    # re.I keeps this consistent with the case-insensitive regex in the question.
    return len(re.findall('[aeiou]+[^aeiou]+', word, re.I))

for filename in gutenberg.fileids():
    sents = gutenberg.sents(filename)
    words = gutenberg.words(filename) # i.e. list(chain(*sents))
    num_sents = len(sents)
    num_words = len(words)
    num_syllables = sum(num_syllables_per_word(w) for w in words)
    score = 206.835 - 1.015 * (num_words / num_sents) - 84.6 * (num_syllables / num_words)
    print(filename, score)
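
The two decisions listed earlier can also be made explicit in a reusable function. The following is only a sketch under assumptions: the name `fres_from_sentences` and the `count_punctuation` flag are hypothetical, and the input is assumed to be a list of tokenized sentences such as the one returned by `gutenberg.sents(filename)`. It is not claimed to reproduce the 99.40 expected by the doctest.

```python
import re

VC = re.compile('[aeiou]+[^aeiou]+', re.I)

def count_syllables(word):
    return len(VC.findall(word))

def fres_from_sentences(sents, count_punctuation=True):
    """Hypothetical helper: FRES for a list of sentences, each a list of tokens."""
    words = [w for sent in sents for w in sent]
    if not count_punctuation:
        # Keep only tokens containing at least one letter or digit.
        words = [w for w in words if any(ch.isalnum() for ch in w)]
    num_sents = len(sents)
    num_words = len(words)
    if num_sents == 0 or num_words == 0:
        raise ValueError("need at least one sentence and one word")
    num_syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (num_words / num_sents) - 84.6 * (num_syllables / num_words)

# Toy input standing in for gutenberg.sents(...); made up for illustration.
toy = [["The", "cat", "sat", "."], ["Emma", "was", "happy", "."]]
print(fres_from_sentences(toy))                           # punctuation counted as words
print(fres_from_sentences(toy, count_punctuation=False))  # punctuation excluded
```

With NLTK installed, the same function could be called as `fres_from_sentences(gutenberg.sents('austen-emma.txt'))`. Toggling `count_punctuation` shows how much the score depends on the tokenization choice alone.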