Flesch-Kincaid readability test in Python

Posted 2019-08-11 11:30

Question:

I need help with a problem I'm having. I need to write a function that returns the FRES (Flesch reading-ease score) of a text, given the formula:

FRES = 206.835 - 1.015 * (total words / total sentences) - 84.6 * (total syllables / total words)

In other words, my task is to turn this formula into a Python function.
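For reference, the formula maps directly onto a small function of the three raw counts. This is a minimal sketch: the name `fres` and its parameters are hypothetical, chosen for illustration, and tokenizing the text and counting syllables are left to the caller.

```python
def fres(num_words, num_sents, num_syllables):
    """Flesch reading-ease score from raw counts (hypothetical helper)."""
    # float() forces true division even on Python 2.
    words_per_sentence = num_words / float(num_sents)
    syllables_per_word = num_syllables / float(num_words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# 100 words, 5 sentences, 130 syllables: roughly 76.555
print(fres(100, 5, 130))
```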

This is the code from my previous question:

import nltk
import collections
nltk.download('punkt')
nltk.download('gutenberg')
nltk.download('brown')
nltk.download('averaged_perceptron_tagger')
nltk.download('universal_tagset')

import re
VC = re.compile('[aeiou]+[^aeiou]+', re.I)
def count_syllables(word):
    return len(VC.findall(word))

from itertools import chain
from nltk.corpus import gutenberg
def compute_fres(text):
    """Return the FRES of a text.
    >>> emma = nltk.corpus.gutenberg.raw('austen-emma.txt')
    >>> compute_fres(emma) # doctest: +ELLIPSIS
    99.40...
    """

    for filename in gutenberg.fileids():
        sents = gutenberg.sents(filename)
        words = gutenberg.words(filename)
        num_sents = len(sents)
        num_words = len(words)
        num_syllables = sum(count_syllables(w) for w in words)
        score = 206.835 - 1.015 * (num_words / num_sents) - 84.6 * (num_syllables / num_words)
    return score

And this is the result I get:

Failure
Expected :99.40...

Actual   :92.84866041488623

**********************************************************************
File "C:/Users/PycharmProjects/a1/a1.py", line 60, in a1.compute_fres
Failed example:
    compute_fres(emma) # doctest: +ELLIPSIS
Expected:
    99.40...
Got:
    92.84866041488623
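For context, the `# doctest: +ELLIPSIS` directive lets `...` in the expected output match any text, so `99.40...` accepts any result whose repr starts with `99.40`, while 92.84866041488623 fails. A minimal sketch with a hypothetical `approx_score` function (its return value is made up, not a real FRES) shows the directive passing:

```python
import doctest


def approx_score():
    """Hypothetical stand-in for compute_fres, used only to show the directive.

    >>> approx_score()  # doctest: +ELLIPSIS
    99.40...
    """
    return 99.4037  # illustrative value only


if __name__ == "__main__":
    # Runs every docstring example in this module and reports the results.
    print(doctest.testmod())
```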

My task is to pass the doctest, i.e. get a result of 99.40... I'm also not allowed to change the following code, since it was given to me as part of the task:

import re
VC = re.compile('[aeiou]+[^aeiou]+', re.I)
def count_syllables(word):
    return len(VC.findall(word))

I feel like I'm getting close, but I'm not sure why I get a different result. Any help would be much appreciated.

Answer 1:

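The three num_* variables are all of type int (integer). In Python 2 (as in C or Java), dividing one integer by another with / gives an integer result, rounded down; for example, 14 / 5 produces 2, not 2.8. (In Python 3, / always performs true division, so this only matters if you run the code under Python 2.)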
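A quick way to see the difference. The values in the comments assume Python 3; under Python 2 the first line prints 2.

```python
# Python 3: '/' is always true division, '//' is floor division.
print(14 / 5)         # 2.8
print(14 // 5)        # 2
# Converting one operand to float forces true division on Python 2 as well.
print(float(14) / 5)  # 2.8
```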

Change the calculation to

score = 206.835 - 1.015 * (float(num_words) / num_sents) - 84.6 * (num_syllables / float(num_words))

When one of the operands in a division is a float, the other is silently converted to a float as well, and floating-point division is performed. Try float(14) / 5.
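Separately from the division issue, note that compute_fres as posted never uses its `text` argument: it iterates over the Gutenberg corpus files instead. A sketch that scores the text actually passed in might look like the following. The regex tokenizers `simple_sent_tokenize` and `simple_word_tokenize` are hypothetical stand-ins so the example runs without NLTK data; in the original setup you would use `nltk.sent_tokenize` and `nltk.word_tokenize`. I haven't verified which tokenization reproduces the 99.40... in the doctest; for instance, `nltk.word_tokenize` returns punctuation marks as separate tokens, which then count as words with zero syllables and change the score.

```python
import re

# Syllable counter given with the task (unchanged).
VC = re.compile('[aeiou]+[^aeiou]+', re.I)


def count_syllables(word):
    return len(VC.findall(word))


# Hypothetical stand-in tokenizers so the sketch runs without NLTK data.
def simple_sent_tokenize(text):
    # Split after '.', '!' or '?' followed by whitespace; drop empty pieces.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def simple_word_tokenize(text):
    # Runs of letters, digits and apostrophes; punctuation is dropped.
    return re.findall(r"[A-Za-z0-9']+", text)


def compute_fres(text, sent_tokenize=simple_sent_tokenize,
                 word_tokenize=simple_word_tokenize):
    """Return the FRES of `text` itself, not of a fixed corpus file."""
    sents = sent_tokenize(text)
    words = word_tokenize(text)
    num_sents = len(sents)
    num_words = len(words)
    num_syllables = sum(count_syllables(w) for w in words)
    # float() forces true division on Python 2; on Python 3 '/' already does.
    return (206.835
            - 1.015 * (num_words / float(num_sents))
            - 84.6 * (num_syllables / float(num_words)))


print(compute_fres("The cat sat. The dog ran."))  # about 147.39
```

To keep the doctest's one-argument call compute_fres(emma), you would make the NLTK tokenizers the defaults instead of the regex stand-ins, assuming the tokenizer data is installed.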

Additionally, your regular expression VC does not include 'y' among the vowels, and it does not count a group of vowels at the end of a word as a syllable. Both issues undercount syllables; for example, count_syllables("myrtle") returns 0.
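The given counter can be checked on a few example words. Each match is a run of vowels followed by a run of non-vowels, so 'y' is treated as a consonant and a word-final vowel never counts (for comparison, "readability" has five syllables):

```python
import re

# The syllable counter given with the task, copied unchanged.
VC = re.compile('[aeiou]+[^aeiou]+', re.I)


def count_syllables(word):
    return len(VC.findall(word))


for word in ["myrtle", "the", "readability", "cat"]:
    print(word, count_syllables(word), VC.findall(word))
# myrtle 0 []
# the 0 []
# readability 4 ['ead', 'ab', 'il', 'ity']
# cat 1 ['at']
```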