Any ideas for reducing boost::spirit compile time?
I have just ported a flex parser to boost::spirit. The EBNF has about 25 rules.
The result runs well and the runtime performance is fine.
The problem is that compile takes for ever! It takes about ten minutes and requires almost a gigabyte of memory. The original flex parser compiled in a few seconds.
I am using boost version 1.44.0 and Visual Studio 2008.
In Joel de Guzman's article 'Best Practices' it says
Rules with complex definitions hurt the compiler badly. We’ve seen rules that are more than a hundred lines long and take a couple of minutes to compile
Well I do not have anything near that long, but my compile still takes way more than a couple of minutes
Here is the most complex part of my grammar. Is it a candidate for being broken up into smaller parts, in some way?
rule
= ( tok.if_ >> condition >> tok.then_ >> *sequel ) [ bind( &cRuleKit::AddRule, &myRulekit ) ]
| ( tok.if_ >> condition >> tok.and_ >> condition >> tok.then_ >> *sequel ) [ bind( &cRuleKit::AddRule, &myRulekit ) ]
;
condition
= ( tok.identifier >> tok.oper_ >> tok.value ) [ bind( &cRuleKit::AddCondition, &myRulekit, _pass, _1, _2, _3 ) ]
| ( tok.identifier >> tok.between_ >> tok.value >> "," >> tok.value ) [ bind( &cRuleKit::AddConditionBetween, &myRulekit, _pass, _1, _3, _4 ) ]
;
sequel
= ( tok.priority_ >> tok.high_ ) [ bind( &cRuleKit::setPriority, &myRulekit, 3 ) ]
| ( tok.priority_ ) [ bind( &cRuleKit::setPriority, &myRulekit, 2 ) ]
| ( tok.interval_ >> tok.value ) [ bind( &cRuleKit::setInterval, &myRulekit, _2 ) ]
| ( tok.mp3_ >> tok.identifier ) [ bind( &cRuleKit::setMP3, &myRulekit, _2 ) ]
| ( tok.disable_ ) [ bind( &cRuleKit::setNextRuleEnable, &myRulekit, false ) ]
;
By commenting out parts of the grammar, I discovered which part the compiler was spending most time with.
set_reading
= tok.set_reading >> +attribute_reading
;
attribute_reading
= ( tok.name_ >> tok.identifier )
[
bind( &cPackage::Add, &myReadings, _pass, _2 )
]
| ( tok.nmea_ >> tok.identifier )
[
bind( &cPackage::setNextNMEA, &myReadings, _2 )
]
| ( tok.column_ >> tok.integer )
[
bind( &cPackage::setNextColumn, &myReadings, _2 )
]
| ( tok.precision_ >> tok.value )
[
bind( &cPackage::setNextPrecision, &myReadings, _2 )
]
| ( tok.unit_ >> tok.identifier )
[
bind( &cPackage::setNextUnit, &myReadings, _2 )
]
| ( tok.value_ >> tok.identifier )
[
bind( &cPackage::setNextValue, &myReadings, _2 )
]
| ( tok.qualifier_ >> tok.identifier >> tok.qual_col_ >> tok.integer )
[
bind( &cPackage::setNextQualifier, &myReadings, _2, _4 )
]
;
I wouldn't call it complex, but it is certainly the longest rule. So I thought I would try splitting it up, like this:
set_reading
= tok.set_reading >> +attribute_reading
;
attribute_reading
= attribute_reading_name
| attribute_reading_nmea
| attribute_reading_col
| attribute_reading_precision
| attribute_reading_unit
| attribute_reading_value
| attribute_reading_qualifier
;
attribute_reading_name
= ( tok.name_ >> tok.identifier ) [ bind( &cPackage::Add, &myReadings, _pass, _2 ) ]
;
attribute_reading_nmea
= ( tok.nmea_ >> tok.identifier ) [ bind( &cPackage::setNextNMEA, &myReadings, _2 ) ]
;
attribute_reading_col
= ( tok.column_ >> tok.integer ) [ bind( &cPackage::setNextColumn, &myReadings, _2 ) ]
;
attribute_reading_precision
= ( tok.precision_ >> tok.value ) [ bind( &cPackage::setNextPrecision, &myReadings, _2 ) ]
;
attribute_reading_unit
= ( tok.unit_ >> tok.identifier ) [ bind( &cPackage::setNextUnit, &myReadings, _2 ) ]
;
attribute_reading_value
= ( tok.value_ >> tok.identifier ) [ bind( &cPackage::setNextValue, &myReadings, _2 ) ]
;
attribute_reading_qualifier
= ( tok.qualifier_ >> tok.identifier >> tok.qual_col_ >> tok.integer ) [ bind( &cPackage::setNextQualifier, &myReadings, _2, _4 ) ]
;
This saves several minutes from the total compile time!!!
Strangely, the peak memory requirement remains the same, it is just required for less time
So, I am feeling a bit more hopeful that all my efforts in learning boost::spirit are going to be worthwhile.
I do think it is a bit strange that the compiler needs to be so carefully guided in this way. I would have thought a modern compiler would have noticed that this rule was just a list of independent OR rules.
I have spent the best part of seven days learning boost::spirit and porting a small, but real world, parser from flex. My conclusion is that it works and the code is very elegant. Unfortunately, naive use by simply expanding the tutorial example code for a real application quickly overburdens the compiler - memory and time taken for a compile becomes completely impractical. Apparently there are techniques for managing this problem but they require arcane knowledge which I do not have the time to learn. I guess I will stick to flex which may be ugly and old-fashioned but is relatively simple and lightning fast.