I'm working on a gettext JavaScript parser and I'm stuck on the parsing regex.
I need to catch every argument passed to a specific method call, _n( and _(. For example, if I have these in my JavaScript files:
_("foo") // want "foo"
_n("bar", "baz", 42); // want "bar", "baz", 42
_n(domain, "bux", var); // want domain, "bux", var
_( "one (optional)" ); // want "one (optional)"
apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples) // could have on the same line two calls..
This is based on this documentation: http://poedit.net/trac/wiki/Doc/Keywords
I'm planning to do it in two passes (with two regexes):
- catch all function arguments for _n( or _( method calls
- catch the stringy ones only
Basically, I'd like a regex that says "catch everything after _n( or _( and stop at the closing parenthesis ) where the function call actually ends". I don't know if this is possible with a regex and without a JavaScript parser.
What could also be done is "catch every "string" or 'string' after _n( or _( and stop at the end of the line OR at the beginning of a new _n( or _( call".
In everything I've tried, I get stuck either on _( "one (optional)" ); with its inner parentheses, or on apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples) with two calls on the same line.
Here is what I have implemented so far, with imperfect regexes: a generic parser, plus the JavaScript one and the Handlebars one.
We can do this in two steps:
1) Catch all function arguments for _n( or _( method calls.
See demo.
http://regex101.com/r/oE6jJ1/13
2) Catch the stringy ones only.
See demo.
http://regex101.com/r/oE6jJ1/14
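The patterns themselves are only in the linked demos. As a rough sketch of the same two-step idea in JavaScript, the following uses patterns that are assumptions written for this illustration, not the ones from the demos; the names CALL_ARGS, STRING_LITERAL and extractStrings are made up here. It does not handle nested calls inside the argument list.

```javascript
// Step 1 (assumed pattern): grab the raw argument list of each _( or _n( call.
// Quoted strings are consumed whole, so a ")" inside a string does not end
// the match. Nested calls inside the arguments are NOT handled.
const CALL_ARGS = /\b_n?\(((?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|[^)"'])*)\)/g;

// Step 2 (assumed pattern): keep only the string literals in that list.
const STRING_LITERAL = /"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'/g;

// Returns one array of string literals (quotes included) per call found.
function extractStrings(source) {
  const results = [];
  for (const call of source.matchAll(CALL_ARGS)) {
    results.push(call[1].match(STRING_LITERAL) || []);
  }
  return results;
}

const line = 'apples === 0 ? _( "No apples" ) : _n("%1 apple", "%1 apples", apples)';
console.log(extractStrings(line));
// [ [ '"No apples"' ], [ '"%1 apple"', '"%1 apples"' ] ]
```

Both problem cases from the question go through: the two calls on one line are reported separately, and _( "one (optional)" ) yields the whole string because the parentheses inside the quotes are skipped as part of the string literal.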
One bit of code (you can test this PHP code at http://writecodeonline.com/php/ to check):
You can see the initial pattern and how it works here: http://regex101.com/r/fR7eU2/1
Output is:
Part 1: match specific functions
Who said that regex can't be modular? Well PCRE regex to the rescue!
The s modifier makes the dot (.) match newlines, and the x modifier allows this fancy spacing and commenting of our regex.
Online regex demo | Online PHP demo
Part 2: getting rid of opening & closing brackets
Since our regex will also get the opening and closing brackets (), we might need to filter them. We will use preg_replace() on the results:
Online PHP demo
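The PHP for this step is in the linked demo. As a rough JavaScript counterpart of the same idea, the outer brackets can be dropped with a single replace; the function name stripOuterParens is invented for this sketch.

```javascript
// Remove one leading "(" and one trailing ")" from a matched group.
// Parentheses elsewhere in the text are left untouched.
function stripOuterParens(group) {
  return group.replace(/^\(|\)$/g, "");
}

console.log(stripOuterParens('("bar", "baz", 42)'));
// "bar", "baz", 42
```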
Part 3: extracting the arguments
So here's another modular regex, you could even add your own grammar:
We will loop and use preg_match_all(). The final code would look like this:
Online PHP demo
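The final PHP code is in the linked demo. For readers working in JavaScript, here is a minimal sketch of the argument-extraction step done with a small scanner instead of a modular regex. This is a swapped-in technique, not the grammar used above, and splitArguments is a hypothetical helper name. It assumes the input is the text between the call's outer parentheses.

```javascript
// Split the text between a call's parentheses into top-level arguments.
// Commas inside strings or inside nested (), [] or {} do not split.
function splitArguments(inner) {
  const args = [];
  let current = "";
  let depth = 0;    // nesting level of (), [] and {}
  let quote = null; // quote character of the string we are inside, if any

  for (let i = 0; i < inner.length; i++) {
    const ch = inner[i];
    if (quote) {
      current += ch;
      if (ch === "\\" && i + 1 < inner.length) {
        current += inner[++i]; // keep the escaped character verbatim
      } else if (ch === quote) {
        quote = null;          // end of the string literal
      }
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      current += ch;
    } else if (ch === "," && depth === 0) {
      args.push(current.trim());
      current = "";
    } else {
      if ("([{".includes(ch)) depth++;
      else if (")]}".includes(ch)) depth--;
      current += ch;
    }
  }
  if (current.trim() !== "") args.push(current.trim());
  return args;
}

console.log(splitArguments('domain, "a, b", fn(1, 2)'));
// [ 'domain', '"a, b"', 'fn(1, 2)' ]
```

Because nested brackets are only counted, not parsed, each nested call comes back as a single raw argument; building a full tree would mean calling the function again on that argument's contents.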
You might want to create a recursive function to generate a full tree. See this answer.
You might notice that the function(){} part isn't parsed correctly. I will leave that as an exercise for the readers :)
\(( |"(\\"|[^"])*"|'(\\'|[^'])*'|[^)"'])*?\)
This should get anything between a pair of parentheses, ignoring parentheses in quotes. Explanation:
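One possible piece-by-piece reading of the pattern, written as a JavaScript sketch: the regex is assembled from commented fragments so each part can be read on its own. The constant name PARENS is made up for this illustration, and the g flag is an addition so that every pair on a line is found.

```javascript
// The same pattern, assembled from annotated pieces.
const PARENS = new RegExp(
  [
    String.raw`\(`,             // an opening parenthesis
    String.raw`(`,              // then one item, which is either:
    String.raw` `,              //   a space,
    String.raw`|"(\\"|[^"])*"`, //   a double-quoted string (\" allowed inside),
    String.raw`|'(\\'|[^'])*'`, //   a single-quoted string (\' allowed inside),
    String.raw`|[^)"']`,        //   or any character other than ) " and '
    String.raw`)*?`,            // repeated, as few times as possible
    String.raw`\)`,             // up to the first closing parenthesis
  ].join(""),
  "g"
);

console.log('_( "one (optional)" );'.match(PARENS));
// [ '( "one (optional)" )' ]
```

The quoted-string alternatives are what let the match run past the ) inside "one (optional)": that parenthesis is consumed as part of the string, so it is never tested as the closing one.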
No matter how you slice it, regular expressions are going to cause problems. They're hard to read, hard to maintain, and highly inefficient. I'm unfamiliar with gettext, but perhaps you could use a for loop?
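A minimal sketch of what such a loop could look like in JavaScript, assuming the goal is to collect the text between each pair of outermost parentheses while ignoring parentheses inside quoted strings. The function name findParenGroups is hypothetical, and comments or regex literals in the scanned source are not handled.

```javascript
// Return the text between each pair of outermost parentheses in `source`.
// Parentheses inside single- or double-quoted strings are ignored.
function findParenGroups(source) {
  const groups = [];
  let depth = 0;    // how many "(" are currently open
  let start = -1;   // index just after the outermost "("
  let quote = null; // quote character of the string we are inside, if any

  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (quote) {
      if (ch === "\\") i++;                // skip the escaped character
      else if (ch === quote) quote = null; // string literal ends
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === "(") {
      if (depth === 0) start = i + 1;
      depth++;
    } else if (ch === ")" && depth > 0) {
      depth--;
      if (depth === 0) groups.push(source.slice(start, i));
    }
  }
  return groups;
}

console.log(findParenGroups('_n("a", f(1), 2)'));
// [ '"a", f(1), 2' ]
```

Counting depth is what a plain regex cannot do in JavaScript, so nested calls such as f(1) stay inside the outer group instead of ending it early.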
Note: I didn't cover how to determine if it's a function or just a grouping symbol (i.e., this will match a = (b * c)). That's complicated, as is covered in detail here. As your code gets more and more accurate, you get closer and closer to writing your own JavaScript parser. You might want to take a look at the source code for actual JavaScript parsers if you need that sort of accuracy.
Try this:
See live demo
The regex below should help you.
Check the demo here