Regex to split on successions of newline characters

I'm trying to split a string on newline characters (catering for Windows, OS X, and Unix text file newline characters). If there is a succession of these, I want to split on the whole run and not include any empty strings in the result.

So, when splitting the following:

"Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"

The result would be:

['Foo', 'Double Windows', 'Double OS X', 'Double Unix', 'Windows', 'OS X', 'Unix']

What regex should I use?

Tags: python regex python-3.x

5 answers

我欲成王，谁敢阻挡

#2 · 2020-06-16 02:14

Put the two-character sequences first in the alternation, and use a non-capturing group so that split() does not return the separators themselves:

import re
pattern = re.compile(r'(?:\r\n|\n\r|\r|\n)+')
lines = pattern.split(text)
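
One thing to watch for with re.split is that any capturing group in the pattern is returned as part of the result. The following is a small sketch of that behaviour, not part of the answer above; the sample string and variable names are made up for illustration.

```python
import re

text = "Foo\r\n\r\nBar\nBaz"  # illustrative input, not from the question

# Capturing groups: the text matched by each group (or None when the
# group did not take part in the match) is inserted into the result.
with_groups = re.split(r'(\r\n)+|(\n)+', text)
print(with_groups)     # ['Foo', '\r\n', None, 'Bar', None, '\n', 'Baz']

# Non-capturing groups: only the pieces between separators come back.
without_groups = re.split(r'(?:\r\n|\r|\n)+', text)
print(without_groups)  # ['Foo', 'Bar', 'Baz']
```

Using (?:...) keeps the grouping needed for alternation and repetition without polluting the output.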


够拽才男人

#3 · 2020-06-16 02:19

re.split(r'[\n\r]+', line)


做个烂人

#4 · 2020-06-16 02:21

>>> s="Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"
>>> import re
>>> re.split("[\r\n]+",s)
['Foo', 'Double Windows', 'Double OS X', 'Double Unix', 'Windows', 'OS X', 'Unix']
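
If the input can begin or end with a line break, re.split returns an empty string at that edge. A minimal sketch of one way to handle that, assuming empty pieces are never wanted (split_lines is a hypothetical helper name, not something from the answer):

```python
import re

def split_lines(s):
    """Split on runs of CR/LF characters and drop empty pieces."""
    return [part for part in re.split(r'[\r\n]+', s) if part]

# A leading or trailing separator produces an empty string at that end.
raw = re.split(r'[\r\n]+', "\nFoo\r\n\r\nBar\n")
print(raw)    # ['', 'Foo', 'Bar', '']

clean = split_lines("\nFoo\r\n\r\nBar\n")
print(clean)  # ['Foo', 'Bar']
```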


男人必须洒脱

#5 · 2020-06-16 02:26

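If the lines themselves contain no whitespace at all, you can use line.split() with no arguments. It splits on any run of whitespace, so doubled newlines collapse, but so do the spaces inside a line such as "Double Windows". If the lines do contain spaces and the input uses only Windows newlines, you can use [a for a in line.split("\r\n") if a].

EDIT: the str type also has a method called "splitlines", which handles all three newline conventions but keeps an empty string for each blank line.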

"Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix".splitlines()
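
Since splitlines() yields an empty string wherever two line breaks are adjacent, its result needs a filter to match the output requested in the question. A minimal sketch (the variable names are illustrative):

```python
s = "Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"

# splitlines() recognises \r\n, \r and \n, but keeps blank lines.
with_blanks = s.splitlines()
print(with_blanks)
# ['Foo', '', 'Double Windows', '', 'Double OS X', '', 'Double Unix', 'Windows', 'OS X', 'Unix']

# Dropping the empty strings gives the list asked for in the question.
lines = [line for line in s.splitlines() if line]
print(lines)
# ['Foo', 'Double Windows', 'Double OS X', 'Double Unix', 'Windows', 'OS X', 'Unix']
```

Keep in mind that splitlines() also treats a few other characters as line boundaries (for example \v, \f and \u2028), so it is not strictly equivalent to splitting on [\r\n]+.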


手持菜刀，她持情操

#6 · 2020-06-16 02:30

The simplest pattern for this purpose is r'[\r\n]+' which you can pronounce as "one or more carriage-return or newline characters".
