String description="Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. البيانات الضخمة هي عبارة عن مجموعة من مجموعة البيانات الضخمة جداً والمعقدة لدرجة أنه يُصبح من الصعب معالجتها باستخدام أداة واحدة فقط من أدوات إدارة قواعد البيانات أو باستخدام تطبيقات معالجة البيانات التقليدية. "
I need a regex to extract only the Arabic words.
I checked this ticket; however, it is a PHP ticket, while I need a Java regex.
import java.util.regex.*;
Pattern p = Pattern.compile("#(?:[\x{0600}-\x{06FF}]+(?:\s+[\x{0600}-\x{06FF}]+)*)#u");
print(p.matcher(description).group(1));
It raises an error.
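To find one or more Arabic characters you can use
\p{InArabic}+
This class is not mentioned directly in the Pattern documentation, but that documentation does describe classes for Unicode scripts, blocks, categories and binary properties, and gives the example
\p{InGreek}
Encouraged by that example, we can read the part about blocks and find that blocks are specified with the prefix In, and that the block names supported by Pattern are the valid block names accepted and defined by UnicodeBlock.forName. That last sentence is the most important one for us. Now we need to see whether
Character.UnicodeBlock
supports a block of Arabic characters. So we visit its documentation, where we can find the field
public static final Character.UnicodeBlock ARABIC
which suggests that the Arabic characters block is supported.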
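The block lookup described above can also be checked directly in code. This is only a minimal sketch: the class name ArabicBlockCheck and the helper isInArabicBlock are made up for illustration, and the sample letter is written as a Unicode escape so the source file encoding does not matter.

```java
import java.lang.Character.UnicodeBlock;

// Hypothetical class name, used only for this illustration.
class ArabicBlockCheck {

    // Returns true if the code point belongs to the Unicode block ARABIC.
    static boolean isInArabicBlock(int codePoint) {
        return UnicodeBlock.of(codePoint) == UnicodeBlock.ARABIC;
    }

    public static void main(String[] args) {
        // forName resolves a block name to its UnicodeBlock constant;
        // Pattern accepts the same names after the "In" prefix.
        System.out.println(UnicodeBlock.forName("Arabic") == UnicodeBlock.ARABIC); // true

        System.out.println(isInArabicBlock('\u0628')); // Arabic letter BEH: true
        System.out.println(isInArabicBlock('b'));      // Latin letter: false
    }
}
```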
So to find single Arabic words your code can look like:
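A minimal sketch of that approach, assuming a hypothetical class ArabicWordFinder and a shortened sample string (two Arabic words from the question's text, written as Unicode escapes so the file encoding does not matter):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class ArabicWordFinder {

    // One or more consecutive characters from the Unicode Arabic block.
    private static final Pattern ARABIC_WORD = Pattern.compile("\\p{InArabic}+");

    // Collects every run of Arabic characters found in the text.
    static List<String> findArabicWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = ARABIC_WORD.matcher(text);
        while (m.find()) {          // find() must succeed before group() is called
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // Shortened sample: English text mixed with two Arabic words.
        String description = "Big data \u0627\u0644\u0628\u064A\u0627\u0646\u0627\u062A "
                + "\u0627\u0644\u0636\u062E\u0645\u0629 is large.";
        for (String word : findArabicWords(description)) {
            System.out.println(word);
        }
    }
}
```

Note that group() is only valid after a successful find(), which is one reason the code in the question fails in Java.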
output:
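If you want to find groups of Arabic words separated by one or more whitespace characters, you can use the same structure as the pattern in your question, with \p{InArabic} in place of the explicit character range.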
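A minimal sketch of such a pattern, assuming it keeps the shape of the regex from the question and only swaps the explicit range for \p{InArabic}. The class name ArabicPhraseFinder and the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class ArabicPhraseFinder {

    // An Arabic word, optionally followed by more Arabic words,
    // each preceded by one or more whitespace characters.
    private static final Pattern ARABIC_PHRASE =
            Pattern.compile("\\p{InArabic}+(?:\\s+\\p{InArabic}+)*");

    // Collects every whitespace-separated run of Arabic words in the text.
    static List<String> findArabicPhrases(String text) {
        List<String> phrases = new ArrayList<>();
        Matcher m = ARABIC_PHRASE.matcher(text);
        while (m.find()) {
            phrases.add(m.group());
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Two Arabic words, then Latin text, then one more Arabic word:
        // the Latin text splits the input into two separate phrases.
        String text = "ab \u0627\u0644 \u0628 cd \u062A";
        System.out.println(findArabicPhrases(text).size()); // 2
    }
}
```

Unlike the PHP version, the Java pattern has no surrounding # delimiters and no trailing u flag, and every backslash is doubled inside the string literal.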
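You may want to know that * represents zero or more, and + represents one or more.
So a regex for whitespace-separated Arabic words means: one or more Arabic characters, followed by zero or more repetitions of one or more whitespace characters and one or more Arabic characters.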
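The difference between the two quantifiers is easiest to see on an empty string. A small sketch, with a made-up class name QuantifierDemo and a made-up helper matchesWhole that just wraps Pattern.matches:

```java
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class QuantifierDemo {

    // True if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String twoLetters = "\u0645\u0646"; // two Arabic letters

        // * accepts zero occurrences, so the empty string matches.
        System.out.println(matchesWhole("\\p{InArabic}*", ""));         // true
        // + requires at least one occurrence, so the empty string does not match.
        System.out.println(matchesWhole("\\p{InArabic}+", ""));         // false
        // Both accept one or more Arabic characters.
        System.out.println(matchesWhole("\\p{InArabic}*", twoLetters)); // true
        System.out.println(matchesWhole("\\p{InArabic}+", twoLetters)); // true
    }
}
```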