For instance, given the following strings:
let textEN = "The quick brown fox jumps over the lazy dog"
let textES = "El zorro marrón rápido salta sobre el perro perezoso"
let textAR = "الثعلب البني السريع يقفز فوق الكلب الكسول"
let textDE = "Der schnelle braune Fuchs springt über den faulen Hund"
I want to detect the language used in each of them.
Let's assume the signature for the implemented function is:
func detectedLanguage<T: StringProtocol>(_ forString: T) -> String?
It returns an optional string, which is nil when no language is detected.
Thus, the expected results would be:
let englishDetectedLanguage = detectedLanguage(textEN) // => English
let spanishDetectedLanguage = detectedLanguage(textES) // => Spanish
let arabicDetectedLanguage = detectedLanguage(textAR) // => Arabic
let germanDetectedLanguage = detectedLanguage(textDE) // => German
Is there an easy approach to achieve it?
I tried NSLinguisticTagger with a short input text like "hello", and it always recognizes it as Italian. Luckily, Apple recently added NLLanguageRecognizer, available on iOS 12, and it seems to be more accurate :D
Latest versions (iOS 12+)
Briefly:
You could achieve it by using NLLanguageRecognizer, as:
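A minimal sketch of what this could look like, reusing the signature from the question. It assumes the class-level `NLLanguageRecognizer.dominantLanguage(for:)` convenience from the NaturalLanguage framework and uses `Locale.current` to turn the language code into a display name, so the returned name depends on the device's locale:

```swift
import Foundation
import NaturalLanguage

// Sketch only: the function name and signature come from the question,
// the body is one possible way to fill it in.
@available(iOS 12.0, macOS 10.14, *)
func detectedLanguage<T: StringProtocol>(_ forString: T) -> String? {
    // Ask the recognizer for the most likely language of the text.
    guard let language = NLLanguageRecognizer.dominantLanguage(for: String(forString)) else {
        return nil
    }
    // NLLanguage wraps a language code (e.g. "en"); Locale turns it into a readable name.
    return Locale.current.localizedString(forLanguageCode: language.rawValue)
}
```

Calling `detectedLanguage(textEN)` with the sentence from the question should then give the localized name for English.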
Older versions (iOS 11+)
Briefly:
You could achieve it by using NSLinguisticTagger, as:
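A minimal sketch, again reusing the signature from the question. It relies on the class method `NSLinguisticTagger.dominantLanguage(for:)` (iOS 11+) and on `Locale.current` for the display name; note that, as far as I know, NSLinguisticTagger is marked deprecated in newer SDKs in favor of the NaturalLanguage framework, so expect a warning there:

```swift
import Foundation

// Sketch only: the function name and signature come from the question.
@available(iOS 11.0, macOS 10.13, *)
func detectedLanguage<T: StringProtocol>(_ forString: T) -> String? {
    // dominantLanguage(for:) returns a language code such as "en", or nil.
    guard let languageCode = NSLinguisticTagger.dominantLanguage(for: String(forString)) else {
        return nil
    }
    // Convert the code into a name localized for the current locale.
    return Locale.current.localizedString(forLanguageCode: languageCode)
}
```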
Details:
First of all, you should be aware that what you are asking about mainly relates to the world of natural language processing (NLP).
Since NLP covers much more than text language detection, the rest of the answer will not contain specific NLP information.
Obviously, implementing such functionality is not that easy, especially once you start to care about the details of the process, such as splitting text into sentences and even into words, and after that recognising names, punctuation, etc. I bet you would think "what a painful process! It is not even logical to do it by myself". Fortunately, iOS does support NLP (actually, the NLP APIs are available on all Apple platforms, not only iOS), which makes what you are aiming for easy to implement. The core component that you would work with is NSLinguisticTagger.
As mentioned in the class documentation, the method that you are looking for, under the "Determining the Dominant Language and Orthography" section, is dominantLanguage(for:).
You might notice that NSLinguisticTagger has existed since iOS 5. However, the dominantLanguage(for:) method is only supported on iOS 11 and above, because it has been developed on top of the Core ML framework.
The value returned from calling dominantLanguage(for:) with "The quick brown fox jumps over the lazy dog" would be the optional string "en". However, so far that is not the desired output; the expectation is to get "English" instead! Well, that is exactly what you should get by calling the localizedString(forLanguageCode:) method of the Locale structure and passing the obtained language code.
Putting it all together:
As mentioned in the "Quick Answer" code snippet, the function would be:
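Spelled out step by step, the combination might look like the sketch below, which keeps the intermediate language code visible next to the final name. `languageCodeAndName(for:)` is a hypothetical helper introduced for illustration, and the fixed `en_US` locale is an assumption made so that the name comes back in English regardless of device settings:

```swift
import Foundation

// Hypothetical helper: returns both the raw language code and its display name.
@available(iOS 11.0, macOS 10.13, *)
func languageCodeAndName(for text: String) -> (code: String?, name: String?) {
    // Step 1: get the language code; the answer reports "en" for the English sentence.
    let code = NSLinguisticTagger.dominantLanguage(for: text)

    // Step 2: map the code to a readable name.
    // Assumption: a fixed en_US locale instead of Locale.current.
    let displayLocale = Locale(identifier: "en_US")
    let name = code.flatMap { displayLocale.localizedString(forLanguageCode: $0) }

    return (code, name)
}
```

With `Locale.current` in place of the fixed locale, the name would instead follow the user's language settings, which is usually what you want when showing it in the UI.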
Output:
It would be as expected:
Note that:
There are still cases where you will not get a language name for a given string, like:
Or the result could even be nil:
I still find it not a bad result for providing useful output...
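If you would rather treat both situations the same way, one option is to map the undetermined code to nil as well. The sketch below assumes that `dominantLanguage(for:)` reports an undetermined language with the BCP-47 tag "und"; `displayName(forLanguageCode:in:)` and `detectedLanguageName(_:)` are hypothetical helper names:

```swift
import Foundation

// Hypothetical helper: turns a language code into a display name,
// treating both nil and the undetermined tag "und" as "no language".
func displayName(forLanguageCode code: String?, in locale: Locale = .current) -> String? {
    guard let code = code, code != "und" else { return nil }
    return locale.localizedString(forLanguageCode: code)
}

// Hypothetical wrapper around the tagger that never reports an "unknown" name.
@available(iOS 11.0, macOS 10.13, *)
func detectedLanguageName(_ text: String) -> String? {
    return displayName(forLanguageCode: NSLinguisticTagger.dominantLanguage(for: text))
}
```

The caller then only has to handle a single nil case instead of also checking for an "unknown language" string.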
Furthermore:
About NSLinguisticTagger:
Although I am not going to dive deep into NSLinguisticTagger usage, I would like to note that it has a couple of really cool features beyond simply detecting the language of a given text. As a pretty simple example: using the lemma when enumerating tags is very helpful when working with information retrieval, since you would be able to match the word "driving" when searching for the word "drive".
Official Resources
Apple Video Sessions:
How NSLinguisticTagger works: Natural Language Processing and your Apps.
Also, for getting familiar with Core ML:
You can use NSLinguisticTagger's tagAt method. It supports iOS 5 and later.
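A rough sketch of that approach, assuming "tagAt" refers to `tag(at:scheme:tokenRange:sentenceRange:)` used with the `.language` tag scheme. `languageCode(of:)` is a hypothetical helper name, and the result is a language code such as "en" rather than a display name:

```swift
import Foundation

// Hypothetical helper built on the older instance-based tagger API.
func languageCode(of text: String) -> String? {
    // Asking for a tag at index 0 of an empty string would be out of range.
    guard !text.isEmpty else { return nil }

    let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
    tagger.string = text

    // With the .language scheme, the tag's raw value is the language code.
    let tag = tagger.tag(at: 0, scheme: .language, tokenRange: nil, sentenceRange: nil)
    return tag?.rawValue
}
```

The code can then be passed to `Locale.current.localizedString(forLanguageCode:)` to get a readable name.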