Reverse newline tokenization in one-token per line files? - Unix

An earlier question about separating tokens with Unix tools showed that a file can be tokenized using sed or xargs.

Is there a way to do the reverse?

[in:]

some
sentences
are
like
this.

some
sentences
foo
bar
that

[out:]

some sentences are like this.
some sentences foo bar that

The only delimiter between sentences is \n\n. I can do it in Python as follows, but is there a Unix way?

import codecs

def per_section(it):
  """ Read a file and yield sections using empty line as delimiter """
  section = []
  for line in it:
    if line.strip('\n'):
      section.append(line)
    else:
      yield ''.join(section)
      section = []
  # yield any remaining lines as a section too
  if section:
    yield ''.join(section)

print ["".join(i).replace("\n"," ") for i in per_section(codecs.open('outfile.txt','r','utf8'))]

[out:]

[u'some sentences are like this. ', u'some sentences foo bar that ']

Answer 1:

awk makes this kind of task easier:

awk -v RS="" '{$1=$1}7' file

If you want to preserve runs of multiple spaces within each line, you can use:

awk -v RS="" -F'\n' '{$1=$1}7' file

With your example:

kent$  cat f
some
sentences
are
like
this.

some
sentences
foo
bar
that

kent$  awk -v RS=""  '{$1=$1}7' f   
some sentences are like this.
some sentences foo bar that
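
For readers more at home in Python, here is a rough sketch of what the two awk variants above do. It assumes that RS="" (paragraph mode) splits records on runs of blank lines, and that `$1=$1` rebuilds each record by joining its fields with a single space. The function names are hypothetical and only serve the illustration.

```python
import re

def paragraphs(text):
    # Approximates awk's paragraph mode (RS=""): runs of empty lines
    # separate records, and leading/trailing newlines are ignored.
    return [p for p in re.split(r"\n{2,}", text.strip("\n")) if p]

def join_default_fs(text):
    # Approximates: awk -v RS="" '{$1=$1}7'
    # Fields are split on any whitespace run, so inner spaces collapse.
    return [" ".join(p.split()) for p in paragraphs(text)]

def join_newline_fs(text):
    # Approximates: awk -v RS="" -F'\n' '{$1=$1}7'
    # Fields are whole lines, so spaces inside a line are preserved.
    return [" ".join(p.split("\n")) for p in paragraphs(text)]

sample = "some\nsentences  are\nlike this.\n\nfoo\nbar\n"
print(join_default_fs(sample))
print(join_newline_fs(sample))
```

The only difference between the two results is the double space inside "sentences  are", which survives only when the newline is the field separator.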

Answer 2:

You can do it with an awk command like this:

awk -v RS="\n\n" '{gsub("\n"," ",$0);print $0}' file.txt

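Setting the record separator to \n\n means that each group of lines separated by a blank line is read as a single record. Each record is then printed after every \n in it has been replaced with a space. Note that POSIX leaves a multi-character RS unspecified; gawk and mawk accept it and treat it as a regular expression.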
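
As a rough illustration, the same two steps (split on "\n\n", then replace the remaining newlines) can be written in Python. This is a sketch of the logic, not of any particular awk implementation, and `join_on_double_newline` is a hypothetical name.

```python
def join_on_double_newline(text):
    # Split records on the literal "\n\n", mirroring RS="\n\n".
    records = text.split("\n\n")
    # Mirror gsub("\n"," ",$0): every remaining newline becomes a space.
    return [record.replace("\n", " ") for record in records]

sample = (
    "some\nsentences\nare\nlike\nthis.\n"
    "\n"
    "some\nsentences\nfoo\nbar\nthat\n"
)
for line in join_on_double_newline(sample):
    print(line)
```

In this sketch the last record ends with a space, because the file's final newline is not part of a "\n\n" separator and is replaced like any other newline.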

Answer 3:

sed -n --posix 'H;$ {x;s/\n\([^[:cntrl:]]\{1,\}\)/\1 /gp;}' YourFile

This separates on the blank line, so each string can also have a different length.
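
To make the sed command easier to follow, here is a Python sketch of the same idea: `H` appends every line to the hold space with a leading newline, and on the last line the substitution turns each newline followed by visible text into that text plus a space. The name `sed_like_join` is hypothetical, and the character class below assumes ASCII control characters stand in for `[:cntrl:]`.

```python
import re

def sed_like_join(text):
    lines = text.splitlines()
    # Mirror `H`: each line is appended to the hold space after a newline.
    hold = "".join("\n" + line for line in lines)
    # Mirror s/\n\([^[:cntrl:]]\{1,\}\)/\1 /g: a newline followed by at least
    # one non-control character becomes that text plus a trailing space.
    # The newline in front of an empty line has nothing to capture, so it stays.
    return re.sub(r"\n([^\x00-\x1f\x7f]+)", r"\1 ", hold)

sample = (
    "some\nsentences\nare\nlike\nthis.\n"
    "\n"
    "some\nsentences\nfoo\nbar\nthat\n"
)
print(sed_like_join(sample))
```

Each output line in this sketch ends with a trailing space, since every token is followed by one.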

Source: Reverse newline tokenization in one-token per line files? - Unix
