
Question:

For each subtitle entry, I need to extract the entry number, the in and out timecodes, and all lines of the text.

9
00:09:48,347 --> 00:09:52,818
- Let's see... what else she's got?
- Yea... ha, ha.

10
00:09:56,108 --> 00:09:58,788
What you got down there, missy?

11
00:09:58,830 --> 00:10:00,811
I wouldn't do that!

12
00:10:03,566 --> 00:10:07,047
-Shit, that's not enough!
-Pull her back!

I'm currently using this pattern, but it fails to capture the text of entries that span two lines:

(?<Order>\d+)\r\n(?<StartTime>(\d\d:){2}\d\d,\d{3}) --> (?<EndTime>(\d\d:){2}\d\d,\d{3})\r\n(?<Sub>.+)(?=\r\n\r\n\d+|$)

Any help would be much appreciated.

Answer 1:

I think there are two problems with the regex. The first is that the dot near the end, in (?<Sub>.+), does not match newlines. You could change it to:

(?<Sub>(.|[\r\n])+?)

Alternatively, you could pass RegexOptions.Singleline as an option to the regex. The only thing that option does is make the dot match newlines.

The second problem is that .+ is greedy: once the dot can match newlines, it matches as many lines as it can. You can make it non-greedy:

(?<Sub>(.|[\r\n])+?(?=\r\n\r\n|$))

This matches the smallest amount of text that is followed by an empty line or the end of the string.
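
As a rough illustration, the two changes above can be dropped into the pattern from the question and tried out as follows. This is only a sketch: the class name SrtRegexDemo and the Parse method are made up for the example, and it assumes the input uses \r\n line endings and has no trailing newline after the last entry.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
public static class SrtRegexDemo
{
    // The question's pattern, with the Sub group replaced by the
    // non-greedy version followed by a lookahead for a blank line or end of input.
    private static readonly Regex EntryPattern = new Regex(
        @"(?<Order>\d+)\r\n" +
        @"(?<StartTime>(\d\d:){2}\d\d,\d{3}) --> (?<EndTime>(\d\d:){2}\d\d,\d{3})\r\n" +
        @"(?<Sub>(.|[\r\n])+?(?=\r\n\r\n|$))");

    // Returns one array per entry: { Order, StartTime, EndTime, Sub }.
    public static List<string[]> Parse(string srt)
    {
        var result = new List<string[]>();
        foreach (Match m in EntryPattern.Matches(srt))
        {
            result.Add(new[]
            {
                m.Groups["Order"].Value,
                m.Groups["StartTime"].Value,
                m.Groups["EndTime"].Value,
                m.Groups["Sub"].Value   // may contain \r\n between text lines
            });
        }
        return result;
    }
}
```

Calling SrtRegexDemo.Parse on the sample from the question should yield one item per entry, with the Sub value of entry 9 holding both text lines separated by \r\n.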

Answer 2:

If I were you, I'd step back from a regex-based implementation and look at a state machine, walking through the file line by line. Your format looks simple enough to handle with maybe 20-40 lines of easy-to-understand code, but too complex for a reasonable regex.
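
A minimal sketch of what such a line-by-line state machine might look like is given below. The type names SrtEntry and SrtStateMachine, the three states, and the decision to skip malformed entries are all assumptions made for illustration; the answer does not prescribe any of them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical container for one subtitle entry.
public sealed class SrtEntry
{
    public int Number;
    public string Start = "";
    public string End = "";
    public List<string> Lines = new List<string>();
}

// Hypothetical parser; names and error handling are illustrative only.
public static class SrtStateMachine
{
    private enum State { ExpectNumber, ExpectTiming, ReadText }

    public static List<SrtEntry> Parse(TextReader reader)
    {
        var entries = new List<SrtEntry>();
        var state = State.ExpectNumber;
        var current = new SrtEntry();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            switch (state)
            {
                case State.ExpectNumber:
                    // Blank or non-numeric lines between entries are skipped.
                    if (int.TryParse(line.Trim(), out int number))
                    {
                        current = new SrtEntry { Number = number };
                        state = State.ExpectTiming;
                    }
                    break;

                case State.ExpectTiming:
                    string[] parts = line.Split(new[] { " --> " }, StringSplitOptions.None);
                    if (parts.Length == 2)
                    {
                        current.Start = parts[0].Trim();
                        current.End = parts[1].Trim();
                        state = State.ReadText;
                    }
                    else
                    {
                        // Assumption: drop a malformed entry and look for the next number.
                        state = State.ExpectNumber;
                    }
                    break;

                case State.ReadText:
                    if (line.Trim().Length == 0)
                    {
                        // A blank line closes the current entry.
                        entries.Add(current);
                        state = State.ExpectNumber;
                    }
                    else
                    {
                        current.Lines.Add(line);
                    }
                    break;
            }
        }
        // The last entry may not be followed by a blank line.
        if (state == State.ReadText) entries.Add(current);
        return entries;
    }
}
```

It could be called with a StringReader over the file contents or a StreamReader over the file itself. Because ReadLine accepts \r\n and \n alike, this version does not depend on the line-ending style.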

Answer 3:

I would personally split the lines into an array and loop through it, examining each line and doing a regex match only for the StartTime --> EndTime lines. You can then use some fairly simple logic to grab Order from the previous line, and grab the text from the following lines (by searching ahead to find the next StartTime --> EndTime line and backtracking two lines).

I think this way chops the problem up a little so that you don't have a regex expression trying to do it all.
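
One possible reading of this approach is sketched below. The class name SrtLineScan, the tuple shape, and the choice to join text lines with \n are assumptions for the example. Instead of backtracking exactly two lines, the sketch stops one line before the next timing line (the next entry's number) and skips blank lines, which gives the same result when entries are separated by a single blank line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class SrtLineScan
{
    // Only the timing line is matched with a regex.
    private static readonly Regex Timing = new Regex(
        @"^(?<StartTime>(\d\d:){2}\d\d,\d{3}) --> (?<EndTime>(\d\d:){2}\d\d,\d{3})\s*$");

    public static List<(string Order, string Start, string End, string Text)> Parse(string srt)
    {
        string[] lines = srt.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);

        // Collect the indexes of all timing lines first.
        var timingIdx = new List<int>();
        for (int i = 0; i < lines.Length; i++)
        {
            if (Timing.IsMatch(lines[i])) timingIdx.Add(i);
        }

        var result = new List<(string Order, string Start, string End, string Text)>();
        for (int k = 0; k < timingIdx.Count; k++)
        {
            int t = timingIdx[k];
            Match m = Timing.Match(lines[t]);

            // The entry number is on the line just before the timing line.
            string order = t > 0 ? lines[t - 1].Trim() : "";

            // Text runs until the line holding the next entry's number, or to the end.
            int end = k + 1 < timingIdx.Count ? timingIdx[k + 1] - 1 : lines.Length;
            var text = new List<string>();
            for (int j = t + 1; j < end; j++)
            {
                if (lines[j].Trim().Length > 0) text.Add(lines[j]);
            }

            result.Add((order, m.Groups["StartTime"].Value, m.Groups["EndTime"].Value,
                        string.Join("\n", text)));
        }
        return result;
    }
}
```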

Answer 4:

I am using the following regular expression to parse .srt files:

@"(?<number>\d+)\r\n(?<start>\S+)\s-->\s(?<end>\S+)\r\n(?<text>(.|[\r\n])+?)\r\n\r\n"

Reference: Regular Expression Language - Quick Reference (.NET documentation)
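
One thing to watch for with this pattern is that it ends with a literal \r\n\r\n, so an entry that is not followed by a blank line (typically the last one) is not matched, and neither are files that use \n line endings. The sketch below shows one way to work around that by normalizing the input first. The class name SrtAnswer4Demo and both method names are hypothetical, and the normalization step is an addition for illustration, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern from the answer.
public static class SrtAnswer4Demo
{
    private static readonly Regex Entry = new Regex(
        @"(?<number>\d+)\r\n(?<start>\S+)\s-->\s(?<end>\S+)\r\n(?<text>(.|[\r\n])+?)\r\n\r\n");

    // Applies the pattern to the input exactly as given.
    public static int CountRaw(string srt)
    {
        return Entry.Matches(srt).Count;
    }

    // Assumption: convert all line endings to \r\n and make sure the input
    // ends with a blank line, so that the final entry can match as well.
    public static int CountNormalized(string srt)
    {
        string normalized = srt.Replace("\r\n", "\n").Replace("\n", "\r\n").TrimEnd() + "\r\n\r\n";
        return Entry.Matches(normalized).Count;
    }
}
```

For a two-entry input with \r\n line endings and no trailing blank line, CountRaw finds only the first entry, while CountNormalized finds both.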

Answer 5:

I used this regex in my Ruby parser:

slines.scan(/(^[0-9]+)\r?\n(.*? --> .*?)\r?\n(.*?)(?=^[0-9]+\r?\n|\s+\Z)/im).map{|z| [z[0],[z[1],z[2].strip]]}

where "slines" is the whole subtitle file read into memory.

Parse subtitle file using regex C#
