Word-level edit distance of a sentence

2楼-- · 2020-02-09 02:01

You can use the same algorithms that are used for finding edit distance in strings to find edit distances in sentences. You can think of a sentence as a string drawn from an alphabet where each character is a word in the English language (assuming that spaces are used to mark where one "character" starts and the next ends). Any standard algorithm for computing edit distance, such as the standard dynamic programming approach for computing Levenshtein distance, can be adapted to solve this problem.

0人赞添加讨论(0) 举报

Lonely孤独者°

3楼-- · 2020-02-09 02:07

The implementation in D is generalized over any range, and thus array. So by splitting your sentences into arrays of strings they can be run through the algorithm and an edit number will be provided.

https://dlang.org/library/std/algorithm/comparison/levenshtein_distance.html

0人赞添加讨论(0) 举报

趁早两清

4楼-- · 2020-02-09 02:07

Here is the Java implementation of edit distance algorithm for sentences using dynamic programming approach.

public class EditDistance {

    public int editDistanceDP(String sentence1, String sentence2) {
        String[] s1 = sentence1.split(" ");
        String[] s2 = sentence2.split(" ");
        int[][] solution = new int[s1.length + 1][s2.length + 1];

        for (int i = 0; i <= s2.length; i++) {
            solution[0][i] = i;
        }

        for (int i = 0; i <= s1.length; i++) {
            solution[i][0] = i;
        }

        int m = s1.length;
        int n = s2.length;
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (s1[i - 1].equals(s2[j - 1]))
                    solution[i][j] = solution[i - 1][j - 1];
                else
                    solution[i][j] = 1
                            + Math.min(solution[i][j - 1], Math.min(solution[i - 1][j], solution[i - 1][j - 1]));
            }
        }
        return solution[s1.length][s2.length];
    }

    public static void main(String[] args) {
        String sentence1 = "first second third";
        String sentence2 = "second";
        EditDistance ed = new EditDistance();
        System.out.println("Edit Distance: " + ed.editDistanceDP(sentence1, sentence2));
    }
}

0人赞添加讨论(0) 举报

家丑人穷心不美

5楼-- · 2020-02-09 02:12

In general, this is called the sequence alignment problem. Actually it does not matter what entities you align - bits, characters, words, or DNA bases - as long as the algorithm works for one type of items it will work for everything else. What matters is whether you want global or local alignment.

Global alignment, which attempt to align every residue in every sequence, is most useful when the sequences are similar and of roughly equal size. A general global alignment technique is the Needleman-Wunsch algorithm algorithm, which is based on dynamic programming. When people talk about Levinstain distance they usually mean global alignment. The algorithm is so straightforward, that several people discovered it independently, and sometimes you may come across Wagner-Fischer algorithm which is essentially the same thing, but is mentioned more often in the context of edit distance between two strings of characters.

Local alignment is more useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context. The Smith-Waterman algorithm is a general local alignment method also based on dynamic programming. It is quite rarely used in natural language processing, and more often - in bioinformatics.

0人赞添加讨论(0) 举报

乱世女痞

6楼-- · 2020-02-09 02:27

Here is a sample implementation of the @templatetypedef's idea in ActionScript (it worked great for me), which calculates the normalized Levenshtein distance (or in other words gives a value in the range [0..1])

  private function nlevenshtein(s1:String, s2:String):Number {
     var tokens1:Array = s1.split(" ");
     var tokens2:Array = s2.split(" ");
     const len1:uint = tokens1.length, len2:uint = tokens2.length;
     var d:Vector.<Vector.<uint> >=new Vector.<Vector.<uint> >(len1+1);
     for(i=0; i<=len1; ++i)
        d[i] = new Vector.<uint>(len2+1);

     d[0][0]=0;

     var i:int;
     var j:int;

     for(i=1; i<=len1; ++i) d[i][0]=i; 
     for(i=1; i<=len2; ++i) d[0][i]=i;

     for(i = 1; i <= len1; ++i)
        for(j = 1; j <= len2; ++j)
           d[i][j] = Math.min( Math.min(d[i - 1][j] + 1,d[i][j - 1] + 1),
              d[i - 1][j - 1] + (tokens1[i - 1] == tokens2[j - 1] ? 0 : 1) );

     var nlevenshteinDist:Number = (d[len1][len2]) / (Math.max(len1, len2));

     return nlevenshteinDist;
  }

I hope this will help!

0人赞添加讨论(0) 举报

Word-level edit distance of a sentence

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间