RegexpError: Stack overflow in regexp matcher

Posted 2019-07-27 13:47

I have a small problem with a simple tokenizer regex:

def test_tokenizer_regex_limit
   string = '<p>a</p>' * 400
   tokens = string.scan(/(<\s*tag:.*?\/?>)|((?:[^<]|\<(?!\s*tag:.*?\/?>))+)/)
end

Basically, it runs through the text and returns pairs of [matched_tag, other_text], where one element of each pair is nil. Here's an example: http://rubular.com/r/f88JBjfzFh
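To make the pair shape concrete, here is a small sketch that runs the same pattern over a short input. The method name scan_pairs and the sample string are made up for illustration; only the regex comes from the question.

```ruby
# Hypothetical wrapper around the regex from the question.
def scan_pairs(string)
  # Group 1 captures a <tag:...> element, group 2 captures everything else,
  # so each result is a two-element array with exactly one nil entry.
  string.scan(/(<\s*tag:.*?\/?>)|((?:[^<]|\<(?!\s*tag:.*?\/?>))+)/)
end

pairs = scan_pairs('Hello <tag:foo /> world <b>!</b>')
pairs.each { |tag, text| puts tag ? "tag:  #{tag}" : "text: #{text}" }
```

Markup that is not a tag: element (the <b>...</b> here) ends up inside the text part, because the negative lookahead only stops on <tag:...>.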

It works fine for smaller inputs. If you run it under Ruby 1.8.7, it blows up with the error above. Ruby 1.9.2 works fine.

Any ideas on how to simplify or improve this? My regex-fu is weak.

Tags: ruby regex
1 answer
虎瘦雄心在
#2 · 2019-07-27 14:15

This is slightly simpler, but not by much:

(<[^<]*:[^<]*>)|((?:[^<]|<[^:]*>)+)

(<.*?>|[^<>]+)
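One way to use the second, flat pattern is to scan for small pieces and rebuild the [tag, text] pairs in plain Ruby. This is only a sketch: the names TAG_START and tokenize are invented here, and it assumes (not verified on 1.8.7) that the overflow comes from the large repeated group with a lookahead, which this version avoids. It also adds a [<>] branch so stray brackets are kept, and it lets a <...> piece span newlines, which the original pattern does not.

```ruby
# Hypothetical check for the start of a <tag:...> element.
TAG_START = /\A<\s*tag:/

# Hypothetical helper: returns [tag, text] pairs like the original scan.
def tokenize(string)
  tokens = []
  # Each piece is one <...> chunk, a run of non-bracket text, or a stray bracket.
  string.scan(/<[^<>]*>|[^<>]+|[<>]/) do |piece|
    if piece =~ TAG_START
      tokens << [piece, nil]
    elsif tokens.last && tokens.last[0].nil?
      # Previous token is text: merge, so other markup stays inside the text.
      tokens.last[1] << piece
    else
      tokens << [nil, piece.dup]
    end
  end
  tokens
end

string = '<p>a</p>' * 400
tokens = tokenize(string)
puts tokens.length
```

Because every piece is matched without a repeated group, the regex work per match stays small no matter how long the input is; the merging is done by the loop instead.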
