I need to calculate a BLEU score to identify whether two sentences are similar. I have read some articles, but they are mostly about using BLEU to measure machine translation accuracy. What I need is a BLEU score that measures the similarity between two sentences in the same language, i.e. both sentences are in English. Thanks in anticipation.
For sentence level comparisons, use smoothed BLEU
The standard BLEU score used for machine translation evaluation (BLEU-4) is only really meaningful at the corpus level, since any sentence that does not have at least one 4-gram match will be given a score of 0.
This happens because, at its core, BLEU is really just the geometric mean of n-gram precisions that is scaled by a brevity penalty to prevent very short sentences with some matching material from being given inappropriately high scores. Since the geometric mean is calculated by multiplying together all the terms to be included in the mean, having a zero for any of the n-gram counts results in the entire score being zero.
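For reference, the standard BLEU definition (Papineni et al. 2002) over n-grams up to length $N$, with uniform weights $w_n = 1/N$, is:

$$
\text{BLEU} = \text{BP} \cdot \exp\!\left(\sum_{n=1}^{N} w_n \log p_n\right),
\qquad
\text{BP} = \min\!\left(1,\; e^{\,1 - r/c}\right)
$$

where $p_n$ is the modified n-gram precision, $r$ is the reference length, and $c$ is the candidate length. Since $\log p_n$ is undefined when $p_n = 0$, a single n-gram order with no matches zeroes out the whole score.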
If you want to apply BLEU to individual sentences, you're better off using smoothed BLEU (Lin and Och 2004 - see sec. 4), whereby you add 1 to each of the n-gram counts before you calculate the n-gram precisions. This will prevent any of the n-gram precisions from being zero, and thus will result in non-zero values even when there are not any 4-gram matches.
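As a concrete illustration, here is a minimal sketch using NLTK (an assumption on my part; the question doesn't mention NLTK, and the sentences are made up). NLTK's `SmoothingFunction().method2` implements exactly this Lin and Och add-1 smoothing:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Two English sentences to compare (hypothetical example data)
reference = "the quick brown fox jumps over the lazy dog".split()
candidate = "the fast brown fox jumped over the lazy dog".split()

# method2 is the Lin & Och (2004) add-1 smoothing: add 1 to the
# numerator and denominator of each n-gram precision, so a missing
# 4-gram match no longer zeroes out the whole score.
smoother = SmoothingFunction().method2

score = sentence_bleu([reference], candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=smoother)
print(f"Smoothed sentence BLEU: {score:.4f}")
```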
Java Implementation
You'll find a Java implementation of both BLEU and smooth BLEU in the Stanford machine translation package Phrasal.
Alternatives
As Andreas already mentioned, you might want to use an alternative scoring metric such as the Levenshtein string edit distance. However, one problem with using the traditional Levenshtein string edit distance to compare sentences is that it isn't explicitly aware of word boundaries.
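One simple workaround is to run the same edit-distance recurrence over word tokens instead of characters. A minimal sketch (plain Wagner-Fischer dynamic programming; `word_edit_distance` is a hypothetical helper name, not a library function):

```python
def word_edit_distance(a, b):
    """Levenshtein distance over word tokens rather than characters."""
    a, b = a.split(), b.split()
    # prev[j] holds the edit distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(word_edit_distance("the cat sat on the mat",
                         "the cat is on the mat"))  # -> 1
```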
Other alternatives include:
Maybe the Levenshtein edit distance is also an option, or the Hamming distance. Either way, the BLEU score is also appropriate for the job; it measures the similarity of one sentence against a reference, which only makes sense when both are in the same language, as in your problem.
Well, if you just want to calculate the BLEU score, it's straightforward. Treat one sentence as the reference translation and the other as the candidate translation.
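Note that BLEU is asymmetric, so swapping the two sentences can change the score. If you want a single symmetric similarity value, one option (my own suggestion, not part of the standard BLEU recipe) is to score in both directions and average; `bleu_similarity` below is a hypothetical helper:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu_similarity(sent_a, sent_b):
    """Symmetric sentence similarity: average smoothed BLEU in both directions."""
    smoother = SmoothingFunction().method2
    a, b = sent_a.split(), sent_b.split()
    forward = sentence_bleu([a], b, smoothing_function=smoother)
    backward = sentence_bleu([b], a, smoothing_function=smoother)
    return (forward + backward) / 2
```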
Here you go: http://code.google.com/p/lingutil/
You can use the Moses multi-bleu script, which also supports multiple references: https://github.com/moses-smt/mosesdecoder/blob/RELEASE-2.1.1/scripts/generic/multi-bleu.perl