I'm trying to split a string on newline characters (catering for Windows, OS X, and Unix text file newline characters). If there is any succession of these, I want to split on that too and not include any of them in the result.
For example, when splitting the following:
"Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"
The result would be:
['Foo', 'Double Windows', 'Double OS X', 'Double Unix', 'Windows', 'OS X', 'Unix']
What regex should I use?
Paying attention to the greediness rules for patterns:
If the lines contain no spaces or other whitespace, you can use
line.split()
with no arguments. It splits on any run of whitespace, so it also collapses doubled newlines. If not, and the text uses only Windows line endings, you can use
[a for a in line.split("\r\n") if a]
EDIT: the
str
type also has a method called "splitlines":
"Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix".splitlines()
Note that it returns an empty string for each blank line, so you still need to filter those out.
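A minimal sketch of splitlines applied to the sample string from the question, with a list comprehension to drop the empty entries. The variable names (text, raw, lines) are illustrative only.

```python
text = "Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"

# splitlines() treats \r\n, \r and \n (among other line boundaries) as
# separators, but consecutive separators leave '' entries in the result.
raw = text.splitlines()

# Keep only the non-empty pieces.
lines = [line for line in raw if line]
print(lines)
```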
The simplest pattern for this purpose is
r'[\r\n]+'
which you can pronounce as "one or more carriage-return or newline characters".
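As a rough sketch, this pattern can be used with re.split on the sample string from the question. The names NEWLINES, parts and lines are illustrative; the final filter is an added precaution, since a string that starts or ends with a newline would otherwise produce an empty string at that end.

```python
import re

text = "Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"

# One or more carriage-return or newline characters, in any mix.
NEWLINES = re.compile(r'[\r\n]+')

# Each run of \r and \n characters acts as a single separator.
parts = NEWLINES.split(text)

# Guard against '' at the ends when the text has leading/trailing newlines.
lines = [p for p in parts if p]
print(lines)
# ['Foo', 'Double Windows', 'Double OS X', 'Double Unix', 'Windows', 'OS X', 'Unix']
```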