Replace multiple newlines with single newlines dur

I have the next code which reads from multiple files, parses obtained lines and prints the result:

import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

for f in files:
    with open('path_to_dir_with_files'+str(f), 'r') as a:
       pars.append(re.sub('someword=|\,.*|\#.*','',a.read()))

for k in pars:
   print k

But I have problem with multiple new lines in output:

test1


test2

Instead of it I want to obtain the next result without empty lines in output:

 test1
 test2

and so on.

I tried playing with regexp:

pars.append(re.sub('someword=|\,.*|\#.*|^\n$','',a.read()))

But it doesn't work. Also I tried using strip() and rstrip() including replace. It also doesn't work.

Could you please help?

标签： python regex file

3条回答

Evening l夕情丶

2楼-- · 2020-07-06 09:26

You could use a second regex to replace multiple new lines with a single new line and use strip to get rid of the last new line.

import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

for f in files:
    with open('path_to_dir_with_files/'+str(f), 'r') as a:
        word = re.sub(r'someword=|\,.*|\#.*','', a.read())
        word = re.sub(r'\n+', '\n', word).strip()
        pars.append(word)

for k in pars:
   print k

0人赞添加讨论(0) 举报

一夜七次

3楼-- · 2020-07-06 09:42

Without changing your code much, one easy way would just be to check if the line is empty before you print it, e.g.:

import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

for f in files:
    with open('path_to_dir_with_files'+str(f), 'r') as a:
        pars.append(re.sub('someword=|\,.*|\#.*','',a.read()))

for k in pars:
    if not k.strip() == "":
        print k

*** EDIT Since each element in pars is actually the entire content of the file (not just a line), you need to go through an replace any double end lines, easiest to do with re

import os
import re

files=[]
pars=[]

for i in os.listdir('path_to_dir_with_files'):
    files.append(i)

for f in files:
    with open('path_to_dir_with_files'+str(f), 'r') as a:
        pars.append(re.sub('someword=|\,.*|\#.*','',a.read()))

for k in pars:
    k = re.sub(r"\n+", "\n", k)
    if not k.strip() == "":
        print k

Note that this doesn't take care of the case where a file ends with a newline and the next one begins with one - if that's a case you are worried about you need to either add extra logic to deal with it or change the way you're reading the data in

0人赞添加讨论(0) 举报

家丑人穷心不美

4楼-- · 2020-07-06 09:47

Just would like to point out: regexes aren't the best way to handle that. Replacing two empty lines by one in a Python str is quite simple, no need for re:

entire_file = "whatever\nmay\n\nhappen"
entire_file = entire_file.replace("\n\n", "\n")

And voila! Much faster than re and (in my opinion) much easier to read.

0人赞添加讨论(0) 举报

Replace multiple newlines with single newlines dur

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间