How can I parse an Arabic Umm Al-Qura date string?

Posted 2019-02-22 02:46

Question:

I have the following Arabic date in the Umm Al-Qura calendar that I want to parse into a .NET DateTime object:

الأربعاء‏، 17‏ ذو الحجة‏، 1436

This date is equivalent to September 30th 2015 in the Gregorian calendar.

I've been trying the following "standard" C# code to parse this date, but without success:

var cultureInfo = new CultureInfo("ar-SA");
cultureInfo.DateTimeFormat.Calendar = new UmAlQuraCalendar(); // the default one anyway

var dateFormat = "dddd، dd MMMM، yyyy"; //note the ، instead of ,

var dateString = "‏الأربعاء‏، 17‏ ذو الحجة‏، 1436";
DateTime date;
DateTime.TryParseExact(dateString, dateFormat, cultureInfo.DateTimeFormat, DateTimeStyles.AllowWhiteSpaces, out date);

No matter what I do, the result of TryParseExact is always false. How do I parse this string properly in .NET?

By the way, if I start from a DateTime object, I can create the exact date string above using ToString()'s overloads on DateTime without problems. I just can't do it the other way around apparently.

Answer 1:

Your date string is 30 characters long and contains four U+200F RIGHT-TO-LEFT MARK characters (decimal 8207), but your format string contains none.

// This gives a string 26 characters long
var str = new DateTime(2015, 9, 30).ToString(dateFormat, cultureInfo.DateTimeFormat);

RIGHT-TO-LEFT MARK is not whitespace, so DateTimeStyles.AllowWhiteSpaces does not skip it.
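One way to confirm this diagnosis is to dump the code units of the input and ask .NET how it classifies U+200F. The sketch below is illustrative only: the `BidiMarkInspector` class and its method names are hypothetical helpers, not part of the framework, while `char.IsWhiteSpace` and `char.GetUnicodeCategory` are standard APIs.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper for inspecting a string; not part of .NET.
public static class BidiMarkInspector
{
    public const char RightToLeftMark = '\u200F';

    // Counts the U+200F characters hidden in the text.
    public static int CountRightToLeftMarks(string text)
    {
        return text.Count(c => c == RightToLeftMark);
    }

    // Renders every UTF-16 code unit as U+XXXX so invisible characters show up.
    public static string DescribeCodeUnits(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    // Returns the Unicode general category .NET reports for the character.
    public static UnicodeCategory CategoryOf(char c)
    {
        return char.GetUnicodeCategory(c);
    }
}
```

Calling `BidiMarkInspector.DescribeCodeUnits(dateString)` on the string from the question makes each mark visible as `U+200F`. `char.IsWhiteSpace('\u200F')` returns false because the character belongs to the Format category rather than to any separator category.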

If the string contains only the marks RLM, LRM, or ALM, you should probably just strip them out. The same goes for the isolates (LRI, RLI, FSI, and the closing PDI) and the embeddings (LRE, RLE). You may not want to do that with LRO, though. LRO is often used with legacy data where the RTL characters are stored in reverse, that is, in left-to-right order. In those cases you may actually want to reverse the characters.
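A minimal sketch of that stripping step follows. The `BidiControls` class is a hypothetical helper, and the exact set of characters removed is an assumption based on the list above: the three marks, the embedding controls together with their terminator PDF (U+202C), and the isolate controls. The override characters LRO and RLO are deliberately left in place and only detected.

```csharp
using System;
using System.Text;

// Hypothetical helper; the set of stripped characters is an assumption.
public static class BidiControls
{
    // True for marks, embeddings and isolates. Overrides (U+202D, U+202E) are excluded.
    public static bool IsStrippable(char c)
    {
        return c == '\u200E' || c == '\u200F' || c == '\u061C'   // LRM, RLM, ALM
            || c == '\u202A' || c == '\u202B' || c == '\u202C'   // LRE, RLE, PDF
            || (c >= '\u2066' && c <= '\u2069');                 // LRI, RLI, FSI, PDI
    }

    // Returns a copy of the text without the strippable controls.
    public static string Strip(string text)
    {
        var builder = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (!IsStrippable(c))
            {
                builder.Append(c);
            }
        }
        return builder.ToString();
    }

    // Reports whether the text uses LRO or RLO, which may need reordering instead.
    public static bool ContainsOverride(string text)
    {
        return text.IndexOf('\u202D') >= 0 || text.IndexOf('\u202E') >= 0;
    }
}
```

With this in place, the call from the question could be retried as `DateTime.TryParseExact(BidiControls.Strip(dateString), dateFormat, cultureInfo.DateTimeFormat, DateTimeStyles.AllowWhiteSpaces, out date)`. Because PDF also terminates overrides, it is probably safer to check `ContainsOverride` first and handle such input separately.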

Parsing dates from arbitrary sources is a hard problem. You need a layered solution: try one method first, then another, in priority order, until one succeeds. There is no 100% solution, though, because people can type whatever they like.
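The layered idea can be sketched as a list of parsing attempts tried in priority order. The `LayeredDateParser` class and its `ParseFirstMatch` and `Exact` members are hypothetical names introduced for illustration; only `DateTime.TryParseExact` is a framework API. The formats used in the usage note are placeholders, not a recommendation.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper illustrating "try each method until one succeeds".
public static class LayeredDateParser
{
    // Runs the parsers in order and returns the first successful result, or null.
    public static DateTime? ParseFirstMatch(string input, IEnumerable<Func<string, DateTime?>> parsers)
    {
        foreach (var parser in parsers)
        {
            DateTime? result = parser(input);
            if (result.HasValue)
            {
                return result;
            }
        }
        return null;
    }

    // Builds one layer: an exact parse with the given format and format provider.
    public static Func<string, DateTime?> Exact(string format, IFormatProvider provider)
    {
        return text =>
        {
            DateTime parsed;
            bool ok = DateTime.TryParseExact(text, format, provider,
                                             DateTimeStyles.AllowWhiteSpaces, out parsed);
            return ok ? parsed : (DateTime?)null;
        };
    }
}
```

A caller might build the list as `new List<Func<string, DateTime?>> { LayeredDateParser.Exact("yyyy-MM-dd", CultureInfo.InvariantCulture), LayeredDateParser.Exact("dd/MM/yyyy", CultureInfo.InvariantCulture) }` and pass it to `ParseFirstMatch`. For the Arabic input in this question, a layer could wrap another layer and remove the directional marks before delegating to it.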

See here for more information: http://www.unicode.org/reports/tr9/



Answer 2:

This is a Right-To-Left culture, which means that the year will be rendered first. For example, the following code:

var cultureInfo = new CultureInfo("ar-SA");
cultureInfo.DateTimeFormat.Calendar = new UmAlQuraCalendar(); 
Console.WriteLine(String.Format(cultureInfo,"{0:dddd، dd MMMM، yyyy}",DateTime.Now));

produces الأربعاء، 17 ذو الحجة، 1436. Parsing this string works without problem:

var dateFormat = "dddd، dd MMMM، yyyy"; // same format as in the question
var dateString = "الأربعاء، 17 ذو الحجة، 1436";
DateTime date;
var result = DateTime.TryParseExact(dateString, dateFormat, cultureInfo.DateTimeFormat,
                                    DateTimeStyles.AllowWhiteSpaces, out date);
Debug.Assert(result);

PS: I don't know how to write a format string that parses the original input, because changing the position of what looks to me like a comma changes the actual characters rendered in the string.