I have imported nltk in Python to calculate the BLEU score on Ubuntu. I understand how the sentence-level BLEU score works, but I don't understand how the corpus-level BLEU score works.
Below is my code for corpus-level BLEU score:
```python
import nltk

hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']
BLEUscore = nltk.translate.bleu_score.corpus_bleu([reference], [hypothesis], weights=[1])
print(BLEUscore)
```
For some reason, the BLEU score is 0 for the above code. I was expecting a corpus-level BLEU score of at least 0.5.
Here is my code for the sentence-level BLEU score:
```python
import nltk

hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']
BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis, weights=[1])
print(BLEUscore)
```
Here the sentence-level BLEU score is 0.71, which is what I expect, taking into account the brevity penalty and the missing word "a". However, I don't understand how the corpus-level BLEU score works.
Any help would be appreciated.
Let's take a look:
You're in a better position than me to understand the description of the algorithm, so I won't try to "explain" it to you. If the docstring does not clear things up enough, take a look at the source itself. Or find it locally:
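One way to pull up both the docstring and the location of the source from a Python session, sketched with the standard library only. `describe` is a hypothetical helper name, and it is demonstrated on `json.dumps` so that it runs without NLTK; with NLTK installed, the same call should work for `"nltk.translate.bleu_score.corpus_bleu"`.

```python
import importlib
import inspect


def describe(dotted_name):
    """Return (docstring, source file path) for a dotted name like 'json.dumps'."""
    module_name, _, attribute = dotted_name.rpartition(".")
    obj = getattr(importlib.import_module(module_name), attribute)
    return inspect.getdoc(obj), inspect.getsourcefile(obj)


# Demonstrated on a standard-library function so the snippet runs anywhere.
doc, path = describe("json.dumps")
print(path)
print(doc.splitlines()[0])

# Assumption: with NLTK installed, the equivalent call would be
#   describe("nltk.translate.bleu_score.corpus_bleu")
```

In an interactive session, `help(obj)` shows the same docstring.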
TL;DR:
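A minimal sketch of how the inputs nest, using the sentences from the question. The variable names `references`, `list_of_references` and `list_of_hypotheses` are only illustrative, and the NLTK calls are left as comments so that the snippet runs without NLTK installed:

```python
hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']

references = [reference]            # all references for ONE sentence
list_of_references = [references]   # one list of references per sentence in the corpus
list_of_hypotheses = [hypothesis]   # one hypothesis per sentence in the corpus

# With NLTK installed, these would be passed as:
#   nltk.translate.bleu_score.corpus_bleu(list_of_references, list_of_hypotheses, weights=[1])
#   nltk.translate.bleu_score.sentence_bleu(references, hypothesis, weights=[1])
print(list_of_references)
```

Note the extra level of nesting for `corpus_bleu()` compared with the `[reference]` passed in the question.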
(Note: You have to pull the latest version of NLTK on the `develop` branch in order to get a stable version of the BLEU score implementation.)
In Long:
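To make the relationship between the two functions concrete, here is a pure-Python sketch of unigram-only BLEU (the `weights=[1]` case from the question). It is a simplified illustration, not NLTK's implementation: the names `unigram_corpus_bleu` and `unigram_sentence_bleu` are made up for this example, and smoothing and higher-order n-grams are left out.

```python
import math
from collections import Counter


def unigram_corpus_bleu(list_of_references, hypotheses):
    """Unigram-only corpus BLEU; a teaching sketch, not NLTK's code."""
    matches = hyp_total = ref_total = 0
    for references, hypothesis in zip(list_of_references, hypotheses):
        # Clip each hypothesis token count by its highest count in any reference.
        max_ref_counts = Counter()
        for reference in references:
            max_ref_counts |= Counter(reference)
        matches += sum((Counter(hypothesis) & max_ref_counts).values())
        # Use the reference length closest to the hypothesis length (ties: shorter).
        hyp_len = len(hypothesis)
        ref_total += min((len(r) for r in references),
                         key=lambda ref_len: (abs(ref_len - hyp_len), ref_len))
        hyp_total += hyp_len
    if matches == 0:
        return 0.0
    # Counts are pooled over the whole corpus before precision and penalty are computed.
    brevity_penalty = 1.0 if hyp_total > ref_total else math.exp(1 - ref_total / hyp_total)
    return brevity_penalty * matches / hyp_total


def unigram_sentence_bleu(references, hypothesis):
    """Sentence-level score, computed as a corpus of exactly one sentence."""
    return unigram_corpus_bleu([references], [hypothesis])


hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']

sentence_score = unigram_sentence_bleu([reference], hypothesis)
corpus_score = unigram_corpus_bleu([[reference]], [hypothesis])
# One level of nesting missing: every token string is treated as a whole reference,
# so the hypothesis tokens are compared against single characters.
misnested_score = unigram_corpus_bleu([reference], [hypothesis])
print(sentence_score, corpus_score, misnested_score)
```

In this sketch the first two scores are both exp(1 - 4/3), about 0.7165: the unigram precision is 3/3 and only the brevity penalty lowers the score. The mis-nested call gives 0, which is presumably what happened with `corpus_bleu([reference], [hypothesis])` in the question.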
Actually, if there's only one reference and one hypothesis in your whole corpus, both `corpus_bleu()` and `sentence_bleu()` should return the same value, as shown in the example above.
In the code, we see that `sentence_bleu` is actually a duck-type of `corpus_bleu`: it passes its arguments on as `corpus_bleu([references], [hypothesis], ...)`.
And if we look at the parameters for `sentence_bleu`, the input for its references is a `list(list(str))`.
So if you have a sentence string, e.g. `"This is a cat"`, you have to tokenize it to get a list of strings, `["This", "is", "a", "cat"]`, and since it allows for multiple references, it has to be a list of lists of strings. E.g. if you have a second reference, "This is a feline", your input to `sentence_bleu()` would be `[["This", "is", "a", "cat"], ["This", "is", "a", "feline"]]`.
When it comes to `corpus_bleu()`'s `list_of_references` parameter, it's basically a list of whatever `sentence_bleu()` takes as references, i.e. a `list(list(list(str)))` with one list of references per hypothesis.
Other than looking at the doctest within `nltk/translate/bleu_score.py`, you can also take a look at the unittest at `nltk/test/unit/translate/test_bleu_score.py` to see how to use each of the components within `bleu_score.py`.
By the way, since `sentence_bleu` is imported as `bleu` in [`nltk.translate.__init__.py`](https://github.com/nltk/nltk/blob/develop/nltk/translate/init.py#L21), using `from nltk.translate import bleu` would be the same as `from nltk.translate.bleu_score import sentence_bleu`,
and in code:
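A small check along those lines, assuming NLTK is installed. The `find_spec` guard is an addition for this sketch so that the snippet still runs (and says so) when NLTK is missing:

```python
import importlib.util

if importlib.util.find_spec("nltk") is None:
    # Nothing to compare without NLTK; report that rather than guessing.
    same_object = None
    print("NLTK is not installed; nothing to compare")
else:
    from nltk.translate import bleu
    from nltk.translate.bleu_score import sentence_bleu

    # Both names should refer to the very same function object.
    same_object = bleu is sentence_bleu
    print(same_object)
```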