In PHP, is there a way to detect the language of a string? Suppose the string is in UTF-8 format.
You can use the API of the Lang ID service: http://langid.net/identify-language-from-api.html
You can probably use the Google Translate API to detect the language and translate it if necessary.
I would take documents in various languages and record which Unicode characters each one uses. You could then use some Bayesian reasoning to determine the language of a string from just the Unicode characters it contains. This would separate French from English or Russian.
I am not sure what else could be done, except looking the words up in language dictionaries to determine the language (using a similar probabilistic approach).
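A minimal sketch of the character-based Bayesian idea, assuming a naive Bayes model over single letters with add-one smoothing and a uniform prior. The training sentences and the function names `letters`, `train`, and `classify` are hypothetical, and samples this small are only enough to illustrate the mechanics, not to give reliable results.

```php
<?php
// Hypothetical training samples; a usable model needs far larger corpora.
$samples = [
    'en' => 'the quick brown fox jumps over the lazy dog and then sleeps in the warm sun',
    'fr' => 'le renard brun saute par-dessus le chien paresseux, déjà fatigué, près de la forêt',
    'ru' => 'быстрая коричневая лиса прыгает через ленивую собаку и спит на солнце',
];

// Split UTF-8 text into its letters, dropping spaces, digits and punctuation.
function letters(string $text): array {
    preg_match_all('/\p{L}/u', $text, $m);
    return $m[0];
}

// Count how often each letter occurs in each language sample.
function train(array $samples): array {
    $model = [];
    foreach ($samples as $lang => $text) {
        $counts = array_count_values(letters($text));
        $model[$lang] = ['counts' => $counts, 'total' => array_sum($counts)];
    }
    return $model;
}

// Return the language whose letter distribution gives the text the highest
// log-likelihood (add-one smoothing, uniform prior over languages).
function classify(string $text, array $model): string {
    $vocab = [];
    foreach ($model as $m) {
        $vocab += $m['counts'];          // union of all letters seen in training
    }
    $v = count($vocab);

    $best = '';
    $bestScore = -INF;
    foreach ($model as $lang => $m) {
        $score = 0.0;
        foreach (letters($text) as $ch) {
            $c = $m['counts'][$ch] ?? 0;
            $score += log(($c + 1) / ($m['total'] + $v));
        }
        if ($score > $bestScore) {
            $best = $lang;
            $bestScore = $score;
        }
    }
    return $best;
}

$model = train($samples);
```

Calling `classify('привет мир', $model)` then scores the string against each language. Separating languages that share a script, such as English and French, depends heavily on how much training text is available.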
As the Google Translate API is closing down as a free service, you can try this free alternative, which is a replacement for the Google Translate API:
http://detectlanguage.com
The Text_LanguageDetect PEAR package produced terrible results for me: "luxury apartments downtown" is detected as Portuguese...
The Google API is still the best solution; they give $300 in free credit and warn you before charging anything.
Below is a super simple function that uses file_get_contents to fetch the language detected by the API, so there is no need to download or install any libraries.
Execute:
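A minimal sketch of such a function and its call, assuming the Cloud Translation v2 `detect` endpoint accepts a GET request with `key` and `q` parameters and returns the code under `data.detections[0][0].language`. The names `build_detect_url`, `parse_detect_response`, and `guess_lang` are illustrative, and `YOUR_API_KEY` is a placeholder.

```php
<?php
// Build the request URL. The endpoint and its `key`/`q` parameters are an
// assumption based on the Cloud Translation v2 "detect" method.
function build_detect_url(string $text, string $apiKey): string {
    return 'https://translation.googleapis.com/language/translate/v2/detect?'
        . http_build_query(['key' => $apiKey, 'q' => $text], '', '&');
}

// Pull the language code out of the JSON body, or null if it is missing.
function parse_detect_response(string $json): ?string {
    $data = json_decode($json, true);
    return $data['data']['detections'][0][0]['language'] ?? null;
}

// Ask the API for the language of $text; returns null on any failure.
function guess_lang(string $text, string $apiKey): ?string {
    $body = @file_get_contents(build_detect_url($text, $apiKey));
    return $body === false ? null : parse_detect_response($body);
}
```

It would be called as `echo guess_lang('luxury apartments downtown', 'YOUR_API_KEY');`, which needs `allow_url_fopen` enabled and a valid key.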
You can get your Google Translate API key here: https://console.cloud.google.com/apis/library/translate.googleapis.com/
This is a simple example for short phrases to get you going. For more complex applications you'll want to restrict your API key and use the official client library.
Try checking whether the string is ASCII. I use that approach to tell Russian from English in my social bot project.
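A minimal sketch of that idea, assuming the input is UTF-8 and only Russian or English is expected: ASCII letters count toward English and Cyrillic letters toward Russian. The function name `guess_ru_or_en` and the `'unknown'` fallback are hypothetical.

```php
<?php
// Compare the number of ASCII letters with the number of Cyrillic letters.
// Only meaningful when the text is known to be either Russian or English.
function guess_ru_or_en(string $text): string {
    $latin    = (int) preg_match_all('/[A-Za-z]/', $text);
    $cyrillic = (int) preg_match_all('/\p{Cyrillic}/u', $text);

    if ($latin === 0 && $cyrillic === 0) {
        return 'unknown';               // digits, punctuation, or empty input
    }
    return $cyrillic > $latin ? 'ru' : 'en';
}
```

Counting letters rather than testing for pure ASCII keeps a Russian message with a stray English word or URL classified as Russian.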