I'm stumped about a "stack overflow" error--"out of stack space (application error code: 12246)--that I'm getting in BBEdit when I do a "replace all", searching for
(@article(((?!eprint|@article|@book).)*\r)*)pmid = {(.+)}((((?!eprint|@article|@book).)*\r)*(@|\r*\z))
and replacing with
\1eprinttype = {pubmed}, eprint = {\4}\5
I can use these same patterns manually, doing one-at-a-time find & replace, without any errors, even once the match no longer occurs. I can also avoid the error by working on smaller files.
I suspect that it's my inefficient and sloppy regex coding that's to blame, and would appreciate an expert's help in doing this more efficiently. I'm trying to locate all entries in a BibLaTeX bibliography that don't already have an eprint
field, but which have a pmid
field, and replace the pmid
field with a corresponding e-print specification (using eprint
and eprinttype
).
Update: After some experimentation, I've found that a different approach is the only thing I can get to work. Searching for
(?(?=@article(.+\r)+eprint = {(.+\r)+}\r*)(?!)|(@article(.+\r)+)pmid = {(.+)}((.+\r)+}\r*))
and replacing with
\3eprinttype = {pubmed}, eprint = {\5}\6
does the trick. The only problem with this is the backreferences are fragile, but I can't get named backreferences to work in BBEdit.
Obviously this is some bug. But you could try changing the expression a bit. It's difficult to optimize the expression without knowing the requirements, but here's a guess:
Replace with:
BBEdit seems to use PCRE, unless it's (very) outdated the above expression should be compatible.
It's probably catastrophic backtracking caused by this last part:
If you break that down and simplify it, you essentially have a
.*
, a\r*
, and another\r*
right next to each other. Now picture a string of\r
characters at the end of your input: How should each\r
be distributed? Which of those little clauses will soak up each\r
character? If you have\r\r\r\r\r
, you could eat all five\r
s with the.*
part and none at all with the\r*
parts...or, you can make up any number of permutations that will still match. Since the*
is greedy, it will try to fill the.*
up first, but if that fails, it has to keep trying permutations until one of them works. So it's probably hogging a bunch of your resources with unnecessary backtracking, until finally it crashes.I'm not an expert on optimization techniques for regex, but I'd start there if I were you.
Update:
Check out the Wikipedia article on PCRE:
So I think catastrophic backtracking is a good bet here. I'd try to solve it by tweaking your regex before changing the build options on PCRE.