I am trying to detect Arabic characters in a webpage's HTML using the Notepad++ Find dialog (CTRL+F) with regular expressions. I am entering the following as my search term, and it matches every character instead of only the Arabic ones.
[\u0600-\u06FF]
Sample block of random text I'm working with -
awr4tgagas
بqa4tq4twْq4tw4twtfwd
awfasfrw34جَ4tw4tg
دِيَّة عَرqaw4trawfَبِيَّ
Any ideas why this Regular Expression won't detect the Arabic characters properly and how I should go about this? I have the document encoded as UTF-8.
Thanks!
This is happening because Notepad++'s regex engine uses PCRE-style syntax, which doesn't support the \uNNNN notation you have provided.
To match a Unicode code point you have to use
\x{NNNN}
so your regular expression becomes:
[\x{0600}-\x{06FF}]

This happens because Notepad++'s implementation of regular expressions requires that you use the
\x{NNNN}
notation to match Unicode characters.
In your example, \x{0628} can be used to match the ب (bāʾ, bet, beth, vet) character.
In Notepad++, the \u symbol is used to match uppercase letters, so the original pattern is not read as a range of code points.
See http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Regular_Expressions#Ranges_or_kinds_of_characters for an explanation of Notepad++'s regex syntax.
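If you want to sanity-check the range outside Notepad++, here is a minimal sketch in Python. It is not Notepad++'s engine: Python's re module happens to accept the \uNNNN escape, so the pattern is written that way here, and the helper name find_arabic is made up for illustration. The sample string is one line from the question, spelled with escapes so the code points are explicit.

```python
import re

# Arabic block, U+0600 through U+06FF. Python's `re` accepts \uNNNN
# escapes in patterns; in Notepad++ the same range would be written
# [\x{0600}-\x{06FF}].
ARABIC = re.compile(r"[\u0600-\u06FF]")


def find_arabic(text):
    """Return every character of `text` that falls in U+0600..U+06FF.

    Hypothetical helper for illustration only.
    """
    return ARABIC.findall(text)


# One sample line from the question: Latin text with an Arabic letter
# (U+062C) followed by a combining vowel mark (U+064E).
sample = "awfasfrw34\u062c\u064e4tw4tg"
matches = find_arabic(sample)

# Code points of the matched characters, in Notepad++'s \x{NNNN} form.
as_notepad_escapes = ["\\x{%04X}" % ord(ch) for ch in matches]
print(as_notepad_escapes)
```

Note that combining marks such as U+064E sit inside the same block, so they are reported as separate matches alongside the base letters.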