Detect language from string in PHP

Posted 2019-01-01 01:05

In PHP, is there a way to detect the language of a string? Suppose the string is in UTF-8 format.

16 answers
不流泪的眼
Reply #2 · 2019-01-01 01:33

You could do this entirely client-side with Google's AJAX Language API (now defunct).

With the AJAX Language API, you can translate and detect the language of blocks of text within a web page using only JavaScript. In addition, you can enable transliteration on any text field or textarea in your web page. For example, if you were transliterating to Hindi, the API lets users spell out Hindi words phonetically using English and have them appear in the Hindi script.

You can automatically detect a string's language:

var text = "¿Dónde está el baño?";
google.language.detect(text, function(result) {
  if (!result.error) {
    var language = 'unknown';
    for (var l in google.language.Languages) {
      if (google.language.Languages[l] == result.language) {
        language = l;
        break;
      }
    }
    var container = document.getElementById("detection");
    container.textContent = text + " is: " + language;
  }
});

You can also translate any string written in one of the supported languages (also defunct):

google.language.translate("Hello world", "en", "es", function(result) {
  if (!result.error) {
    var container = document.getElementById("translation");
    container.innerHTML = result.translation;
  }
});
深知你不懂我心
Reply #3 · 2019-01-01 01:33

I tried the Text_LanguageDetect library and the results I got were not very good (for instance, the text "test" was identified as Estonian and not English).

I recommend trying the Yandex Translate API, which is free for up to 1 million characters per 24 hours and up to 10 million characters per month. According to its documentation, it supports over 60 languages.

<?php
// Placeholders: replace with your own API key and the path to your CA certificate bundle.
const YANDEX_API_KEY = "YOUR_API_KEY";
const YOUR_CERT_PEM_FILE_LOCATION = "/path/to/cacert.pem";

// Returns the language code detected for $text, or "unknown" on failure.
function identifyLanguage($text)
{
    $baseUrl = "https://translate.yandex.net/api/v1.5/tr.json/detect?key=" . urlencode(YANDEX_API_KEY);
    $url = $baseUrl . "&text=" . urlencode($text);

    $ch = curl_init($url);

    curl_setopt($ch, CURLOPT_CAINFO, YOUR_CERT_PEM_FILE_LOCATION);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $output = curl_exec($ch);
    if ($output !== false)
    {
        $outputJson = json_decode($output);
        // isset() also covers invalid JSON, where json_decode() returns null.
        if (isset($outputJson->code) && $outputJson->code == 200 && !empty($outputJson->lang))
        {
            return $outputJson->lang;
        }
    }

    return "unknown";
}

// Returns $text translated into $targetLang, or $text unchanged on failure.
function translateText($text, $targetLang)
{
    $baseUrl = "https://translate.yandex.net/api/v1.5/tr.json/translate?key=" . urlencode(YANDEX_API_KEY);
    $url = $baseUrl . "&text=" . urlencode($text) . "&lang=" . urlencode($targetLang);

    $ch = curl_init($url);

    curl_setopt($ch, CURLOPT_CAINFO, YOUR_CERT_PEM_FILE_LOCATION);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $output = curl_exec($ch);
    if ($output !== false)
    {
        $outputJson = json_decode($output);
        if (isset($outputJson->code) && $outputJson->code == 200
            && !empty($outputJson->text) && strlen($outputJson->text[0]) > 0)
        {
            return $outputJson->text[0];
        }
    }

    return $text;
}

header("Content-Type: text/html; charset=UTF-8");

$sample = "エクスペリエンス";
echo htmlspecialchars(identifyLanguage($sample)), "<br>";
foreach (["en", "es", "zh", "he", "ja"] as $targetLang)
{
    echo htmlspecialchars(translateText($sample, $targetLang)), "<br>";
}
?>
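To check the response handling without calling the service, the JSON parsing can be split into pure functions. This is a minimal sketch that assumes only the response fields the code above already reads (`code`, `lang`, and `text`); the function names `parseDetectResponse` and `parseTranslateResponse` are hypothetical.

```php
<?php
// Hypothetical helpers: extract results from Yandex Translate v1.5 JSON bodies,
// assuming only the fields "code", "lang" (detect) and "text" (translate).

function parseDetectResponse(string $body): string
{
    $json = json_decode($body, true);
    // json_decode() returns null for invalid JSON; any code other than 200 is a failure.
    if (!is_array($json) || ($json['code'] ?? null) !== 200) {
        return "unknown";
    }
    $lang = $json['lang'] ?? '';
    return (is_string($lang) && $lang !== '') ? $lang : "unknown";
}

function parseTranslateResponse(string $body, string $fallback): string
{
    $json = json_decode($body, true);
    if (!is_array($json) || ($json['code'] ?? null) !== 200) {
        return $fallback;
    }
    // The translation is expected as the first element of the "text" array.
    $first = $json['text'][0] ?? '';
    return (is_string($first) && $first !== '') ? $first : $fallback;
}

// Canned bodies, so no network access or API key is needed.
echo parseDetectResponse('{"code":200,"lang":"ja"}'), "\n";                       // ja
echo parseTranslateResponse('{"code":200,"text":["Experience"]}', "エクスペリエンス"), "\n"; // Experience
echo parseDetectResponse('not json'), "\n";                                      // unknown
```

The network functions can then pass `curl_exec()`'s output straight to these helpers, which keeps the fallback logic in one place.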
残风、尘缘若梦
Reply #4 · 2019-01-01 01:35

One approach might be to break the input string into words and then look up those words in an English dictionary to see how many of them are present. This approach has a few limitations:

  • proper nouns may not be handled well
  • spelling errors can disrupt your lookups
  • abbreviations like "lol" or "b4" won't necessarily be in the dictionary
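A minimal sketch of the dictionary approach, assuming a small in-memory word list: the `$englishWords` set and the `dictionaryHitRatio` function are hypothetical stand-ins, and a real setup would load a full word-list file. It splits the string into words and reports the fraction found in the dictionary.

```php
<?php
// Returns the fraction (0.0 to 1.0) of words in $text that appear in $dictionary.
// $dictionary is used as a set: word => true.
function dictionaryHitRatio(string $text, array $dictionary): float
{
    // strtolower() only lower-cases ASCII, which is enough for an ASCII English word list.
    // Split on anything that is not a Unicode letter or apostrophe; /u makes PCRE read UTF-8.
    $words = preg_split("/[^\p{L}']+/u", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    if (empty($words)) {
        return 0.0;
    }
    $hits = 0;
    foreach ($words as $word) {
        if (isset($dictionary[$word])) {
            $hits++;
        }
    }
    return $hits / count($words);
}

// Hypothetical tiny dictionary.
$englishWords = array_fill_keys(['where', 'is', 'the', 'bathroom', 'hello', 'world', 'test'], true);

printf("%.2f\n", dictionaryHitRatio("Where is the bathroom?", $englishWords)); // 1.00
printf("%.2f\n", dictionaryHitRatio("¿Dónde está el baño?", $englishWords));   // 0.00
```

Where to draw the line between "English" and "not English" is a judgment call, and the limitations listed above still apply.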
回忆,回不去的记忆
Reply #5 · 2019-01-01 01:37

You can detect the language of a string in PHP with the Text_LanguageDetect PEAR package, or download it and use it on its own like a regular PHP library.
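As I understand it, Text_LanguageDetect ranks languages by comparing the character trigrams of the input with per-language trigram profiles. The sketch below is not that package's API. It is a self-contained, simplified illustration of the trigram idea: the two profiles are hypothetical toy examples built from single sentences (real profiles come from large corpora), and cosine similarity stands in for whatever distance measure the package actually uses.

```php
<?php
// Counts overlapping character trigrams in $text (UTF-8 aware), padded with spaces.
function trigramCounts(string $text): array
{
    // Collapse every run of non-letters to one space; strtolower() only affects ASCII.
    $normalized = ' ' . preg_replace('/[^\p{L}]+/u', ' ', strtolower($text)) . ' ';
    // Split into UTF-8 characters, not bytes.
    $chars = preg_split('//u', $normalized, -1, PREG_SPLIT_NO_EMPTY);
    $counts = [];
    for ($i = 0; $i + 2 < count($chars); $i++) {
        $tri = $chars[$i] . $chars[$i + 1] . $chars[$i + 2];
        $counts[$tri] = ($counts[$tri] ?? 0) + 1;
    }
    return $counts;
}

// Cosine similarity between two sparse trigram count vectors.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0;
    foreach ($a as $tri => $n) {
        $dot += $n * ($b[$tri] ?? 0);
    }
    $normA = sqrt(array_sum(array_map(fn($n) => $n * $n, $a)));
    $normB = sqrt(array_sum(array_map(fn($n) => $n * $n, $b)));
    return ($normA > 0 && $normB > 0) ? $dot / ($normA * $normB) : 0.0;
}

// Returns the profile name most similar to $text, or "unknown" if nothing overlaps.
function guessLanguage(string $text, array $profiles): string
{
    $input = trigramCounts($text);
    $best = 'unknown';
    $bestScore = 0.0;
    foreach ($profiles as $lang => $profile) {
        $score = cosineSimilarity($input, $profile);
        if ($score > $bestScore) {
            $best = $lang;
            $bestScore = $score;
        }
    }
    return $best;
}

// Hypothetical toy profiles.
$profiles = [
    'english' => trigramCounts('the quick brown fox jumps over the lazy dog where is the bathroom and this is the house'),
    'spanish' => trigramCounts('el rápido zorro marrón salta sobre el perro perezoso dónde está el baño y esta es la casa'),
];

echo guessLanguage('Where is the house?', $profiles), "\n";
echo guessLanguage('¿Dónde está la casa?', $profiles), "\n";
```

Short inputs yield only a few trigrams, so guesses for one- or two-word strings are unreliable with any profile set.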

孤独寂梦人
Reply #6 · 2019-01-01 01:39

You cannot detect the language from the character type alone, and there is no foolproof way to do this.

Whatever method you use, you are only making an educated guess. There are some math-related articles on the subject out there.
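The character type can still narrow the guess. The Unicode script of the letters cannot tell English from Spanish, but it can rule languages in or out. A minimal sketch, assuming a hand-picked, deliberately incomplete mapping from PCRE script names to candidate languages (the mapping and the function names are illustrative, not from any library):

```php
<?php
// Illustrative, incomplete mapping from PCRE Unicode script names to candidate languages.
const SCRIPT_CANDIDATES = [
    'Hiragana' => ['Japanese'],
    'Katakana' => ['Japanese'],
    'Hangul'   => ['Korean'],
    'Han'      => ['Chinese', 'Japanese'],
    'Hebrew'   => ['Hebrew', 'Yiddish'],
    'Cyrillic' => ['Russian', 'Ukrainian', 'Bulgarian', 'Serbian'],
    'Greek'    => ['Greek'],
    'Latin'    => ['English', 'Spanish', 'French', 'German', 'Italian', 'Portuguese'],
];

// Returns the listed script with the most characters in $text, or null if none match.
function dominantScript(string $text): ?string
{
    $best = null;
    $bestCount = 0;
    foreach (array_keys(SCRIPT_CANDIDATES) as $script) {
        // \p{Script} matches one character of that script; /u makes PCRE read UTF-8.
        $count = preg_match_all('/\p{' . $script . '}/u', $text);
        if ($count > $bestCount) {
            $best = $script;
            $bestCount = $count;
        }
    }
    return $best;
}

// Returns the candidate languages for the dominant script, or [] if no listed script matches.
function candidateLanguages(string $text): array
{
    $script = dominantScript($text);
    return $script === null ? [] : SCRIPT_CANDIDATES[$script];
}

echo implode(", ", candidateLanguages("エクスペリエンス")), "\n"; // Japanese
echo implode(", ", candidateLanguages("¿Dónde está el baño?")), "\n";
```

Text that mixes scripts (for example Japanese with many kanji) may be dominated by `Han`, so a real check would probably treat any Hiragana or Katakana as a strong hint for Japanese.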

无与为乐者.
Reply #7 · 2019-01-01 01:40

Perhaps submit the string to this language guesser:

http://www.xrce.xerox.com/competencies/content-analysis/tools/guesser
