"How to separate tokens in line using Unix?" showed that a file can be tokenized using sed or xargs.
Is there a way to do the reverse?
[in:]
some
sentences
are
like
this.

some
sentences
foo
bar
that
[out:]
some sentences are like this.
some sentences foo bar that
The only delimiter between sentences is \n\n (an empty line). I could have done the following in Python, but is there a Unix way?
import codecs

def per_section(it):
    """Read a file and yield sections, using an empty line as the delimiter."""
    section = []
    for line in it:
        # a line that is non-empty after removing its newline belongs to the current section
        if line.strip('\n'):
            section.append(line)
        else:
            # an empty line closes the current section
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)

print ["".join(i).replace("\n"," ") for i in per_section(codecs.open('outfile.txt','r','utf8'))]
[out:]
[u'some sentences are like this. ', u'some sentences foo bar that ']
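For comparison, here is a shorter Python sketch of the same idea. It assumes the whole input fits in memory as one string; `join_sections` and `sample` are names made up for this illustration. Unlike the generator version, it joins tokens with single spaces, so the results carry no trailing space.

```python
import re

def join_sections(text):
    """Collapse each blank-line-delimited section of `text` onto one line."""
    # Split wherever one or more empty (or whitespace-only) lines separate sections
    sections = re.split(r"\n\s*\n", text.strip())
    # Within each section, join the whitespace-separated tokens with a single space
    return [" ".join(section.split()) for section in sections]

# Hypothetical sample mirroring the input shown in the question
sample = "some\nsentences\nare\nlike\nthis.\n\nsome\nsentences\nfoo\nbar\nthat\n"
for line in join_sections(sample):
    print(line)
```

This is only a restatement of what I want in Python; I am still looking for a way to do it with standard Unix tools.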