How to remove new lines within double quotes?

Posted 2019-04-29 14:51

Question:

How can I remove newlines that occur inside double quotes (") in a file?

For example:

"one", 
"three
four",
"seven"

So I want to remove the \n between three and four. Should I use a regular expression, or do I have to read the file character by character with a program?

Answer 1:

To remove only the newlines inside double-quoted strings and leave those outside them alone, use GNU awk (needed for RT):

gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file

This works by splitting the file along " characters and removing newlines in every other block. With a file containing

"one",
"three
four",
12,
"seven"

this will give the result

"one",
"threefour",
12,
"seven"
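
For readers without gawk, the same split-on-quote idea can be sketched in Python. This is only an illustration, not part of the original answer: the function name strip_newlines_in_quotes is made up here, and like the gawk version it assumes that quotes are balanced and never escaped.

```python
def strip_newlines_in_quotes(text: str) -> str:
    """Remove newlines from every double-quoted segment of text."""
    # Splitting on '"' yields alternating segments: even indexes are
    # outside quotes, odd indexes are inside quotes.
    parts = text.split('"')
    for i in range(1, len(parts), 2):
        parts[i] = parts[i].replace("\n", "")
    return '"'.join(parts)


sample = '"one",\n"three\nfour",\n12,\n"seven"\n'
print(strip_newlines_in_quotes(sample), end="")
```

The odd indexes here play the role of the even record numbers (NR % 2 == 0) in the gawk command, because Python lists start at 0 while NR starts at 1.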

Note that it does not handle escape sequences. If strings in the input data can contain \", such as "He said: \"this is a direct quote.\"", then it will not work as desired.
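
If escaped quotes do occur, one possible workaround is to read the input character by character and track two bits of state. The sketch below is an assumption-laden illustration rather than a tested tool: the name strip_newlines_escaped is hypothetical, and it assumes that a backslash always escapes the next character, both inside and outside quotes.

```python
def strip_newlines_escaped(text: str) -> str:
    """Remove newlines inside double quotes, honoring backslash escapes."""
    out = []
    in_quotes = False
    escaped = False  # True right after a backslash that has not been consumed yet
    for ch in text:
        if escaped:
            # The character after a backslash is copied verbatim,
            # so an escaped quote does not toggle the quote state.
            escaped = False
            out.append(ch)
        elif ch == "\\":
            escaped = True
            out.append(ch)
        elif ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == "\n" and in_quotes:
            continue  # drop the newline inside a quoted string
        else:
            out.append(ch)
    return "".join(out)


sample = '"He said: \\"this is\na direct quote.\\"",\n"seven"\n'
print(strip_newlines_escaped(sample), end="")
```

Note that under this assumption a newline directly preceded by a backslash is kept, since it counts as an escaped character.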



Answer 2:

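You can treat every line starting with " as the beginning of a new record. Any other line is appended to the stored record (joined with FS, a space by default), and the stored record is printed when the next one begins: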

$ awk '/^"/ {if (f) print f; f=$0; next} {f=f FS $0} END {print f}' file
"one", 
"three four",
"seven"

Since we always print the previous block of text, the END block is needed to print the last stored value once the whole file has been processed.
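
The accumulate-then-print logic may be easier to follow in a longer form. The following Python generator is a rough equivalent, with a hypothetical name (join_continuation_lines) and a sep parameter standing in for awk's FS. It shares the awk command's limitation: an unquoted line such as 12, would be merged into the previous record.

```python
def join_continuation_lines(lines, sep=" "):
    """Yield records; a line not starting with '"' continues the previous one."""
    pending = None  # the record collected so far, not yet emitted
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith('"'):
            # A new record starts: emit the one collected before it.
            if pending is not None:
                yield pending
            pending = line
        elif pending is None:
            pending = line
        else:
            pending = pending + sep + line
    # Counterpart of the END block: emit the last stored record.
    if pending is not None:
        yield pending


for record in join_continuation_lines(['"one",\n', '"three\n', 'four",\n', '"seven"\n']):
    print(record)
```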



Answer 3:

You can use sed for that:

sed -r '/^"[^"]+$/{:a;N;/",/!ba;s/\n/ /g}' text

The command searches for lines which start with a double quote but don't contain another double quote: /^"[^"]+$/

When such a line is found, the label :a marks the start of a loop. The N command appends the next input line to the pattern space. As long as the pattern space does not yet contain the closing quote followed by a comma (/",/!), ba branches back to label a.

Once the closing quote is found, all newlines in the pattern space are replaced by spaces (s/\n/ /g) and sed prints the pattern space automatically.
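
To make the control flow of the sed script explicit, here is a loose Python transcription. It is a sketch only: join_like_sed is a made-up name, and the handling of input that ends before the closing quote is a simplification of what sed does.

```python
import re


def join_like_sed(lines):
    """Join lines from an opening quote up to the line that contains '",'."""
    out = []
    it = iter(lines)
    for line in it:
        buf = line.rstrip("\n")
        # /^"[^"]+$/ : starts with a quote and contains no other quote
        if re.fullmatch(r'"[^"]+', buf):
            # :a;N;/",/!ba : keep appending lines until '",' shows up
            while '",' not in buf:
                nxt = next(it, None)
                if nxt is None:
                    break  # input ended before the closing quote
                buf += "\n" + nxt.rstrip("\n")
            # s/\n/ /g : replace the embedded newlines with spaces
            buf = buf.replace("\n", " ")
        out.append(buf)
    return out


print("\n".join(join_like_sed(['"one",\n', '"three\n', 'four",\n', '"seven"\n'])))
```

As in the sed version, the closing condition looks for a quote followed by a comma, so a multi-line string in the last field of the input (with no trailing comma) is not detected as closed.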



Answer 4:

A simplistic solution:

#!/usr/bin/perl

use strict;
use warnings;

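my $first = 1;
while (<DATA>) {
    chomp;
    # A line starting with a quote begins a new record, so end the
    # previous one first (except before the very first line).
    if ( m/^"/ and not $first ) { print "\n"; }
    $first = 0;
    print;
}
print "\n";    # terminate the last record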


__DATA__
"one", 
"three
four",
"seven"

But for the specific case of CSV-style data, I'd suggest using a Perl module called Text::CSV, which parses CSV properly and treats the 'element with a linefeed' as part of the preceding row.

#!/usr/bin/perl

use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new( { binary => 1 } );

open( my $input, "<", "input.csv" ) or die $!;

while ( my $row = $csv->getline($input) ) {
    # Replace linefeeds inside each element with a space.
    s/\n/ /g for @$row;
    # Print the elements 'naked' (i.e. without quotes), comma-separated.
    print join( ",", @$row ), "\n";
}
close($input);
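
For comparison, Python's standard csv module also parses quoted fields that span several lines. The sketch below is not from the original answer; the function name flatten_csv_newlines and the sample data are made up for illustration, and it assumes well-formed CSV input.

```python
import csv
import io


def flatten_csv_newlines(text: str) -> str:
    """Parse CSV text and replace newlines inside fields with a space."""
    # newline="" leaves line endings untouched so the csv module sees them as-is.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([field.replace("\n", " ") for field in row])
    return out.getvalue()


sample = '"a","three\nfour",12\n"b","seven",13\n'
print(flatten_csv_newlines(sample), end="")
```

With the default quoting mode the writer only quotes fields that need it, so the fields come out without quotes here, much like the 'naked' output of the Perl script.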


Answer 5:

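Tested in bash.

Purpose: replace each newline inside double quotes with a literal \n.

This works for Unix newlines (\n), Windows newlines (\r\n) and the reversed sequence \n\r. Classic Mac OS newlines are a bare \r, which the pattern below does not match.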

echo -e '"line1\nline2"'

"line1
line2"

echo -e '"line1\nline2"' | gawk -v RS='"' 'NR % 2 == 0 { gsub(/\r?\n\r?/, "\\n") } { printf("%s%s", $0, RT) }'

"line1\nline2"
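
A Python variant of this escaping step might look as follows. It is a sketch with a hypothetical function name (escape_newlines_in_quotes). Unlike the gawk pattern, the regular expression here also matches a bare \r, and it again assumes balanced, unescaped quotes.

```python
import re

# Matches \r\n, \n\r, a bare \n, or a bare \r.
NEWLINE = re.compile(r"\r\n|\n\r|\n|\r")


def escape_newlines_in_quotes(text: str) -> str:
    """Replace each newline inside double quotes with a literal backslash-n."""
    parts = text.split('"')
    # Odd indexes are the segments that were inside quotes.
    for i in range(1, len(parts), 2):
        parts[i] = NEWLINE.sub(lambda m: "\\n", parts[i])
    return '"'.join(parts)


print(escape_newlines_in_quotes('"line1\r\nline2"\n'), end="")
```

The two-character alternatives come first in the pattern so that a \r\n pair is replaced once rather than twice.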