RegexpError: Stack overflow in regexp matcher

Posted 2019-07-27 13:47

I have a small problem with a simple tokenizer regex:

def test_tokenizer_regex_limit
   string = '<p>a</p>' * 400
   tokens = string.scan(/(<\s*tag:.*?\/?>)|((?:[^<]|\<(?!\s*tag:.*?\/?>))+)/)
end

Basically it runs through the text and gets pairs of [ matched_tag , other_text ]. Here's an example: http://rubular.com/r/f88JBjfzFh
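To make the pair format concrete, here is a small sketch that runs the same regex on a short input. The sample string and the `TOKEN_REGEX` constant name are made up for illustration and are not part of the original test:

```ruby
# Same regex as in the test above, applied to a short, hypothetical sample.
TOKEN_REGEX = /(<\s*tag:.*?\/?>)|((?:[^<]|\<(?!\s*tag:.*?\/?>))+)/

sample = 'x <tag:foo/> y'
pairs = sample.scan(TOKEN_REGEX)

# Each element is [matched_tag, other_text]; the group that did not
# participate in the match is nil.
pairs.each { |pair| p pair }
# [nil, "x "]
# ["<tag:foo/>", nil]
# [nil, " y"]
```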

It works fine for smaller inputs. If you run it under Ruby 1.8.7, it blows up with the error in the title; Ruby 1.9.2 works fine.

Any ideas on how to simplify or improve this? My regex-fu is weak.

Tags: ruby regex

1 answer

#2 · 2019-07-27 14:15

This is slightly simpler, but not by much:

(<[^<]*:[^<]*>)|((?:[^<]|<[^:]*>)+)

~~(<.*?>|[^<>]+)~~
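Another option, which is not from the original answer, is to drop the repeated alternation group `(?:...|...)+` entirely, since that construct is the likely source of the deep recursion in the 1.8.7 matcher. The sketch below splits on the tag pattern instead and rebuilds the same [ matched_tag , other_text ] pairs. The names `TAG_PATTERN` and `tokenize` are hypothetical, and this has not been verified on Ruby 1.8.7:

```ruby
# Only the tag half of the original regex; the capture group makes
# String#split keep the matched tags in its result.
TAG_PATTERN = /(<\s*tag:.*?\/?>)/

# Returns [tag, nil] for each tag and [nil, text] for each run of other text,
# mirroring the shape that String#scan produced in the question.
def tokenize(string)
  tokens = []
  # With exactly one capture group, split alternates text and tag:
  # even indexes are text between tags, odd indexes are the tags themselves.
  string.split(TAG_PATTERN).each_with_index do |piece, index|
    next if piece.empty? # empty text between adjacent tags or at the start
    tokens << (index.odd? ? [piece, nil] : [nil, piece])
  end
  tokens
end

p tokenize('x <tag:foo/> y <b>z</b>')
# [[nil, "x "], ["<tag:foo/>", nil], [nil, " y <b>z</b>"]]
```

On the input from the question (`'<p>a</p>' * 400`) there is no `tag:` element at all, so this returns a single [nil, text] pair containing the whole string.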
