Match any character (including newlines) in sed

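I have a sed command that I want to run on a huge, terrible, ugly HTML file that was created from a Microsoft Word document. All it needs to do is remove any instance of the string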

style='text-align:center; color:blue;
exampleStyle:exampleValue'

The sed command I am trying to modify is

sed "s/ style='[^']*'//" fileA > fileB

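It works great, but whenever there is a newline inside the matched text, it doesn't match. Is there a modifier for sed, or something else I can do, to force it to match any character, including newlines?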

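I know that regular expressions are terrible for XML and HTML, blah blah blah, but in this case the string pattern is well formed: the style attributes always start with a single quote and end with a single quote. So if I could just solve this newline problem, I could cut the size of the HTML by more than 50% with just one command.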
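To see the problem concretely, here is a small reproduction sketch. The sample input is hypothetical, modeled on the attribute quoted above. The second command uses GNU sed's -z (--null-data) option, which is not mentioned in this thread and is a GNU extension that not every sed provides: it splits input on NUL bytes instead of newlines, so a text file is read as a single record and [^'] can cross the line break.

```shell
#!/bin/sh
# Hypothetical two-line input: the style attribute is split by a line break.
input="<p style='text-align:center; color:blue;
exampleStyle:exampleValue'>Hello</p>"

# The question's command: each sed cycle sees one line, and neither line
# holds both the opening and the closing quote, so nothing is removed.
line_by_line=$(printf '%s\n' "$input" | sed "s/ style='[^']*'//")
printf '%s\n' "$line_by_line"

# GNU sed only: -z reads NUL-separated records, so the whole input is one
# record and the bracket expression can match the embedded newline.
# tr strips any NUL byte that sed might write back out.
whole_file=$(printf '%s\n' "$input" | sed -z "s/ style='[^']*'//g" | tr -d '\000')
printf '%s\n' "$whole_file"
```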

In the end, it turned out that Sinan Ünür's Perl script worked best. It ran almost instantly and reduced the file size from 2.3 MB to 850 KB. Good ol' Perl...

Answer 1:

sed goes over the input file line by line, which means that, as I understand it, what you want is not possible in sed.

You could use the following Perl script (untested), though:

#!/usr/bin/perl

use strict;
use warnings;

{
    local $/; # slurp mode
    my $html = <>;
    $html =~ s/ style='[^']*'//g;
    print $html;
}

__END__

A one-liner would be:

$ perl -e 'local $/; $_ = <>; s/ style=\047[^\047]*\047//g; print' fileA > fileB

Answer 2:

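sed reads its input line by line, so processing text that spans more than one line is not simple... but it is not impossible either: you need to use sed's branching commands. The following works; I've commented it to explain what is going on (not the most readable syntax!):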

sed "# if the line matches 'style='', then branch to label, 
     # otherwise process next line
     /style='/b style
     b
     # the line contains 'style', try to do a replace
     : style
     s/ style='[^']*'//
     # if the replace worked, then process next line
     t
     # otherwise append the next line to the pattern space and try again.
     N
     b style
 " fileA > fileB

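The same branching idea can be condensed further. The variant below is a sketch, not part of the original answer: instead of retrying the substitution after each N, it keeps appending lines while the pattern space ends inside an unterminated style attribute, then removes every complete attribute at once with the g flag (the script above ends the cycle after its first successful substitution, so a second attribute in the same pattern space would survive). The sample input is hypothetical, and the sketch relies on $ matching the end of the whole pattern space and on [^'] matching embedded newlines, as GNU sed does.

```shell
#!/bin/sh
# Hypothetical input: one attribute split over three lines, plus a line
# carrying two complete attributes.
input="<p style='text-align:center;
color:blue;
exampleStyle:exampleValue'>Hello</p>
<b style='a'>x</b><i style='b'>y</i>"

# :a                 loop label
# / style='[^']*$/   pattern space ends inside an unterminated attribute
#   {N;ba}           so append the next line and test again
# s/ style=.../g     then delete every complete attribute
out=$(printf '%s\n' "$input" |
  sed -e ":a" -e "/ style='[^']*\$/{N;ba" -e "}" -e "s/ style='[^']*'//g")
printf '%s\n' "$out"
```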
Answer 3:

You could remove all CR/LF characters with tr, run sed, and then import the result into an editor that auto-formats it.
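A sketch of the tr step on hypothetical CRLF-terminated input: deleting every CR and LF leaves a single line, so the question's sed pattern (with g added) can see both quotes of each attribute.

```shell
#!/bin/sh
# Hypothetical CRLF-terminated sample; the attribute is split across lines.
# tr -d deletes every CR and LF, leaving one long line for sed.
out=$(printf '%s\r\n' "<p style='text-align:center; color:blue;" \
        "exampleStyle:exampleValue'>Hello</p>" '<p>Bye</p>' |
      tr -d '\r\n' |
      sed "s/ style='[^']*'//g")
printf '%s\n' "$out"
```

Because the line breaks are deleted rather than replaced, words separated only by a line break end up glued together; tr '\r\n' '  ' (translating both characters to spaces) avoids that at the cost of extra spaces. Re-indenting the result is left to the editor, as the answer suggests.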

Answer 4:

Another approach is this:

$ cat toreplace.txt 
I want to make \
this into one line

I also want to \
merge this line

$ sed -e 'N;N;s/\\\n//g;P;D;' toreplace.txt

Output:

I want to make this into one line

I also want to merge this line

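N appends the next input line to the pattern space, P prints the pattern space up to the first newline, and D deletes the pattern space up to the first newline and restarts the cycle with whatever remains, without reading a new line.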
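Because N;N;P;D reads a fixed number of lines per cycle, a continuation chain longer than that window is not fully joined. A widely circulated sed one-liner loops instead, appending lines for as long as the pattern space ends in a backslash. A sketch on hypothetical input:

```shell
#!/bin/sh
# Hypothetical input: the first logical line is continued twice.
input='one \
two \
three

four'

# :a          loop label
# /\\$/N      if the pattern space ends in a backslash, append the next line
# s/\\\n//    remove the backslash-newline pair
# ta          if that substitution happened, go back to :a and check again
out=$(printf '%s\n' "$input" | sed -e :a -e '/\\$/N; s/\\\n//; ta')
printf '%s\n' "$out"
```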

Answer 5:

You could try this:

awk '/style/&&/exampleValue/{
    gsub(/style.*exampleValue\047/,"")
}
/style/&&!/exampleValue/{     
    gsub(/style.* /,"")
    f=1        
}
f &&/exampleValue/{  
  gsub(/.*exampleValue\047 /,"")
  f=0
}
1
' file

Output:

# more file
this is a line
    style='text-align:center; color:blue; exampleStyle:exampleValue'
this is a line
blah
blah
style='text-align:center; color:blue;
exampleStyle:exampleValue' blah blah....

# ./test.sh
this is a line

this is a line
blah
blah
blah blah....
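For comparison, here is a sketch that does the job in awk without per-line flag bookkeeping: it buffers the whole input (held in memory, like the Perl slurp in Answer 1) and runs a single gsub at the end. The input is hypothetical, and the regex is the question's sed pattern built as a string, with the quote character passed in via -v q="'" so that no quote appears inside the shell-quoted awk program.

```shell
#!/bin/sh
# Hypothetical input; the attribute spans a line break.
input="<p style='text-align:center; color:blue;
exampleStyle:exampleValue'>Hello</p>
<p>Bye</p>"

out=$(printf '%s\n' "$input" | awk -v q="'" '
  # append each line plus the newline that awk removed from $0
  { buf = buf $0 "\n" }
  END {
    # dynamic regex, the same pattern as the sed command in the question
    re = " style=" q "[^" q "]*" q
    gsub(re, "", buf)
    printf "%s", buf
  }')
printf '%s\n' "$out"
```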

Source: Match any character (including newlines) in sed