I am trying to convert plain-text Arabic numerals into Eastern Arabic digits, i.e. taking 1 2 3... and converting them into ١ ٢ ٣.... My function converts all numbers, including any numbers contained within tags, e.g. the 1 in H1.
private void LoadHtmlFile(object sender, EventArgs e)
{
var htmlfile = "<html><body><h1>i was born in 1988</h1></body></html>".ToArabicNumber();
webBrowser1.DocumentText=htmlfile;
}
}
public static class StringHelper
{
public static string ToArabicNumber(this string str)
{
if (string.IsNullOrEmpty(str)) return "";
char[] chars;
chars = str.ToCharArray();
for (int i = 0; i < str.Length; i++)
{
if (str[i] >= '0' && str[i] <= '9')
{
chars[i] += (char)1728;
}
}
return new string(chars);
}
}
I also tried targeting only the numbers in InnerText, but that did not work either. The code below changes the numbers in tags as well.
private void LoadHtmlFile(object sender, EventArgs e)
{
var htmlfile = "<html><body><h1>i was born in 1988</h1></body></html>" ;
webBrowser1.DocumentText=htmlfile;
}
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
webBrowser1.Document.Body.InnerText = webBrowser1.Document.Body.InnerText.ToArabicNumber();
}
Any suggestions?
You can use a regular expression to find the parts of the HTML that are between '>' and '<' characters, and operate on those. This will prevent the code from processing the tag names and attributes (style, etc).
// Requires: using System.Linq; using System.Text.RegularExpressions;
// Convert all English digits in a string to Arabic digit equivalents
public static string ToArabicNums(string src)
{
const string digits = "۰۱۲۳۴۵۶۷۸۹";
return string.Join("",
src.Select(c => c >= '0' && c <= '9' ? digits[((int)c - (int)'0')] : c)
);
}
// Convert all English digits in the text segments of an HTML
// document to Arabic digit equivalents
public static string ToArabicNumsHtml(string src)
{
string res = src;
Regex re = new Regex(@">(.*?)<");
// get Regex matches
MatchCollection matches = re.Matches(res);
// process in reverse in case transformation function returns
// a string of a different length
for (int i = matches.Count - 1; i >= 0; --i)
{
Match nxt = matches[i];
if (nxt.Groups.Count == 2 && nxt.Groups[1].Length > 0)
{
Group g = nxt.Groups[1];
res = res.Substring(0, g.Index) + ToArabicNums(g.Value) +
res.Substring(g.Index + g.Length);
}
}
return res;
}
This isn't perfect: it doesn't check for HTML character references outside of the tags, such as the construct &#<digits>; (&#1777; for ۱, etc.), which specifies a character by Unicode value, and it will replace the digits inside these. It also won't process any text before the first tag or after the last tag.
Sample:
Calling: ToArabicNumsHtml("<html><body><h1>I was born in 1988</h1></body></html>")
Result: "<html><body><h1>I was born in ۱۹۸۸</h1></body></html>"
Use whatever code you prefer in ToArabicNums to do the actual transformation, or generalize it by passing in a transformation function.
Use regular expressions. Here is the JavaScript code I myself use:
function toIndic(n) {
var ns = ['۰', '۱', '۲', '۳', '۴', '۵', '۶', '۷', '۸', '۹'];
return n.toString().replace(/\d/g, function (m) {
return ns[m];
});
}
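To make sure you only convert standalone numbers, you can use a stricter regular expression: \b[0-9]+\b (the callback then receives the whole run of digits, so it has to convert each digit of the match).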
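A minimal sketch of that stricter variant, assuming the same digit table as toIndic. The name toIndicWholeNumbers is made up for illustration. Because \b needs a word boundary on both sides, the 1 in h1 is left alone, but a bare number inside an attribute value would still be converted.

```javascript
// Hypothetical helper: converts only digit runs that stand alone as words.
function toIndicWholeNumbers(text) {
  var ns = ['۰', '۱', '۲', '۳', '۴', '۵', '۶', '۷', '۸', '۹'];
  return String(text).replace(/\b[0-9]+\b/g, function (run) {
    // The match is the whole run (e.g. "1988"), so map it digit by digit.
    return run.replace(/[0-9]/g, function (d) {
      return ns[d];
    });
  });
}
```

For example, toIndicWholeNumbers('<h1>born in 1988</h1>') leaves both h1 tags untouched and converts only 1988.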
This function converts English digits to Arabic digits; for the Persian and Urdu forms, use an offset of 1728 instead of 1584.
function convertDigitIn(enDigit){ // PERSIAN, ARABIC, URDU
var newValue="";
for (var i=0;i<enDigit.length;i++)
{
var ch=enDigit.charCodeAt(i);
if (ch>=48 && ch<=57)
{
// european digit range
var newChar=ch+1584;
newValue=newValue+String.fromCharCode(newChar);
}
else
newValue=newValue+String.fromCharCode(ch);
}
return newValue;
}
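The offset decides which digit block you land in: 48 + 1584 = 1632 (U+0660, the Arabic-Indic zero), while 48 + 1728 = 1776 (U+06F0, the Extended Arabic-Indic zero used for Persian and Urdu). Here is a rough sketch of the same loop with the offset made selectable. The names DIGIT_OFFSETS and convertDigits are only illustrative and not part of the function above.

```javascript
// Offsets from ASCII '0' (code 48) to the zero of each digit block.
// Illustrative names, assumed for this sketch.
var DIGIT_OFFSETS = {
  arabic: 0x0660 - 48,  // 1584: Arabic-Indic digits U+0660..U+0669
  persian: 0x06F0 - 48  // 1728: Extended Arabic-Indic digits U+06F0..U+06F9
};

function convertDigits(text, script) {
  if (!Object.prototype.hasOwnProperty.call(DIGIT_OFFSETS, script)) {
    throw new Error('Unknown script: ' + script);
  }
  var offset = DIGIT_OFFSETS[script];
  var out = '';
  for (var i = 0; i < text.length; i++) {
    var ch = text.charCodeAt(i);
    // Shift only the European digits 0-9; copy everything else as is.
    out += String.fromCharCode(ch >= 48 && ch <= 57 ? ch + offset : ch);
  }
  return out;
}
```

Calling convertDigits('1988', 'arabic') gives the Arabic forms and convertDigits('1988', 'persian') gives the Persian/Urdu forms.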
Just add this at the end of your document (it needs jQuery) and it will work fine :-)
<script type="text/javascript">
$(document).ready(function() {
var map = ["&#1632;","&#1633;","&#1634;","&#1635;","&#1636;","&#1637;","&#1638;","&#1639;","&#1640;","&#1641;"];
document.body.innerHTML = document.body.innerHTML.replace(
/\d(?=[^<>]*(<|$))/g,
function($0) { return map[$0] }
);
});
</script>
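The lookahead is what keeps tags safe: a digit is replaced only if the next angle bracket after it is a < (or the string ends), which means the digit is not inside a tag. Below is a small sketch of the same regular expression as a plain function on an HTML string, so it can be tried without a page or jQuery. The name convertTextDigits is made up, and it writes the characters directly instead of entities.

```javascript
// Hypothetical helper: converts digits in text nodes, skips digits inside tags.
function convertTextDigits(html) {
  return html.replace(/\d(?=[^<>]*(<|$))/g, function (d) {
    // 1584 moves '0'..'9' to the Arabic-Indic block U+0660..U+0669.
    return String.fromCharCode(d.charCodeAt(0) + 1584);
  });
}
```

Like the script above, this is still a regex over markup, so it also converts digits in the contents of script and style elements if they are part of the string.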