Thinking about my other problem, i decided I can't even create a regular expression that will match roman numerals (let alone a context-free grammar that will generate them)
The problem is matching only valid roman numerals. Eg, 990 is NOT "XM", it's "CMXC"
My problem in making the regex for this is that in order to allow or not allow certain characters, I need to look back. Let's take thousands and hundreds, for example.
I can allow M{0,2}C?M (to allow for 900, 1000, 1900, 2000, 2900 and 3000). However, If the match is on CM, I can't allow following characters to be C or D (because I'm already at 900).
How can I express this in a regex?
If it's simply not expressible in a regex, is it expressible in a context-free grammar?
Fortunately, the range of numbers is limited to 1..3999 or thereabouts. Therefore, you can build up the regex piece-meal.
Each of those parts will deal with the vagaries of Roman notation. For example, using Perl notation:
Repeat and assemble.
Added: The
<opt-hundreds-part>
can be compressed further:Since the 'D?C{0,3}' clause can match nothing, there's no need for the question mark. And, most likely, the parentheses should be the non-capturing type - in Perl:
Of course, it should all be case-insensitive, too.
You can also extend this to deal with the options mentioned by James Curran (to allow XM or IM for 990 or 999, and CCCC for 400, etc).
Steven Levithan uses this regex in his post which validates roman numerals prior to "deromanizing" the value:
For people who really want to understand the logic, please take a look at a step by step explanation on 3 pages on diveintopython.
The only difference from original solution (which had
M{0,4}
) is because I found that 'MMMM' is not a valid Roman numeral (also old Romans most probably have not thought about that huge number and will disagree with me). If you are one of disagreing old Romans, please forgive me and use {0,4} version.The problem of the solution from Jeremy and Pax is, that it does also match "nothing".
The following regex expects at least one roman numeral:
I would write functions to my work for me. Here are two roman numeral functions in PowerShell.