How to detect language of user entered text? [clos

2020-01-24 03:54发布

I am dealing with an application that is accepting user input in different languages (currently 3 languages fixed). The requirement is that users can enter text and dont bother to select the language via a provided checkbox in the UI.

Is there an existing Java library to detect the language of a text?

I want something like this:

text = "To be or not to be thats the question."

// returns ISO 639 Alpha-2 code
language = detect(text);

print(language);

result:

EN

I dont want to know how to create a language detector by myself (i have seen plenty of blogs trying to do that). The library should provide a simple APi and also work completely offline. Open-source or commercial closed doesn't matter.

i also found this questions on SO (and a few more):

How to detect language
How to detect language of text?

7条回答
时光不老,我们不散
2楼-- · 2020-01-24 04:33
Just a working code from already available solution from cybozu labs:

package com.et.generate;

import java.util.ArrayList;
import com.cybozu.labs.langdetect.Detector;
import com.cybozu.labs.langdetect.DetectorFactory;
import com.cybozu.labs.langdetect.LangDetectException;
import com.cybozu.labs.langdetect.Language;

public class LanguageCodeDetection {

    public void init(String profileDirectory) throws LangDetectException {
        DetectorFactory.loadProfile(profileDirectory);
    }
    public String detect(String text) throws LangDetectException {
        Detector detector = DetectorFactory.create();
        detector.append(text);
        return detector.detect();
    }
    public ArrayList<Language> detectLangs(String text) throws LangDetectException {
        Detector detector = DetectorFactory.create();
        detector.append(text);
        return detector.getProbabilities();
    }
    public static void main(String args[]) {
        try {
            LanguageCodeDetection ld = new  LanguageCodeDetection();

            String profileDirectory = "C:/profiles/";
            ld.init(profileDirectory);
            String text = "Кремль россий";
            System.out.println(ld.detectLangs(text));
            System.out.println(ld.detect(text));
        } catch (LangDetectException e) {
            e.printStackTrace();
        }
    }

}

Output:
[ru:0.9999983255911719]
ru

Profiles can be downloaded from: https://language-detection.googlecode.com/files/langdetect-09-13-2011.zip

查看更多
登录 后发表回答