How to remove common lines between two files witho

I have two files not sortered which have some lines in common.

file1.txt

Z
B
A
H
L

file2.txt

S
L
W
Q
A

The way I'm using to remove common lines is the following:

sort -u file1.txt > file1_sorted.txt
sort -u file2.txt > file2_sorted.txt

comm -23 file1_sorted.txt file2_sorted.txt > file_final.txt

Output:

B
H
Z

The problem is that I want to keep the order of file1.txt, I mean:

Desired output:

Z
B
H

One solution I tought is doing a loop to read all the lines of file2.txt and:

sed -i '/^${line_file2}$/d' file1.txt

But if files are big the performance may suck.

Do you like my idea?
Do you have any alternative to do it?

标签： bash sorting optimization sed comm

3条回答

闹够了就滚

2楼-- · 2020-02-27 11:43

You can use just grep (-v for invert, -f for file). Grep lines from input1 that do not match any line in input2:

grep -vf input2 input1

Gives:

Z
B
H

0人赞添加讨论(0) 举报

仙女界的扛把子

3楼-- · 2020-02-27 11:44

I've written a little Perl script that I use for this kind of thing. It can do more than what you ask for but it can also do what you need:

#!/usr/bin/env perl -w
use strict;
use Getopt::Std;
my %opts;
getopts('hvfcmdk:', \%opts);
my $missing=$opts{m}||undef;
my $column=$opts{k}||undef;
my $common=$opts{c}||undef;
my $verbose=$opts{v}||undef;
my $fast=$opts{f}||undef;
my $dupes=$opts{d}||undef;
$missing=1 unless $common || $dupes;;
&usage() unless $ARGV[1];
&usage() if $opts{h};
my (%found,%k,%fields);
if ($column) {
    die("The -k option only works in fast (-f) mode\n") unless $fast;
    $column--; ## So I don't need to count from 0
}

open(my $F1,"$ARGV[0]")||die("Cannot open $ARGV[0]: $!\n");
while(<$F1>){
    chomp;
    if ($fast){ 
    my @aa=split(/\s+/,$_);
    $k{$aa[0]}++;   
        $found{$aa[0]}++;
    }
    else {
    $k{$_}++;   
        $found{$_}++;
    }
}
close($F1);
my $n=0;
open(F2,"$ARGV[1]")||die("Cannot open $ARGV[1]: $!\n");
my $size=0;
if($verbose){
    while(<F2>){
        $size++;
    }
}
close(F2);
open(F2,"$ARGV[1]")||die("Cannot open $ARGV[1]: $!\n");

while(<F2>){
    next if /^\s+$/;
    $n++;
    chomp;
    print STDERR "." if $verbose && $n % 10==0;
    print STDERR "[$n of $size lines]\n" if $verbose && $n % 800==0;
    if($fast){
        my @aa=split(/\s+/,$_);
        $k{$aa[0]}++ if defined($k{$aa[0]});
        $fields{$aa[0]}=\@aa if $column;
    }
    else{
        my @keys=keys(%k);
        foreach my $key(keys(%found)){
            if (/\Q$key/){
            $k{$key}++ ;
            $found{$key}=undef unless $dupes;
            }
        }
    }
}
close(F2);
print STDERR "[$n of $size lines]\n" if $verbose;

if ($column) {
    $missing && do map{my @aa=@{$fields{$_}}; print "$aa[$column]\n" unless $k{$_}>1}keys(%k);
    $common &&  do map{my @aa=@{$fields{$_}}; print "$aa[$column]\n" if $k{$_}>1}keys(%k);
    $dupes &&   do map{my @aa=@{$fields{$_}}; print "$aa[$column]\n" if $k{$_}>2}keys(%k);
}
else {
    $missing && do map{print "$_\n" unless $k{$_}>1}keys(%k);
    $common &&  do map{print "$_\n" if $k{$_}>1}keys(%k);
    $dupes &&   do map{print "$_\n" if $k{$_}>2}keys(%k);
}
sub usage{
    print STDERR <<EndOfHelp;

  USAGE: compare_lists.pl FILE1 FILE2

      This script will compare FILE1 and FILE2, searching for the 
      contents of FILE1 in FILE2 (and NOT vice versa). FILE one must 
      be one search pattern per line, the search pattern need only be 
      contained within one of the lines of FILE2.

    OPTIONS: 
      -c : Print patterns COMMON to both files
      -f : Search only the first characters of each line of FILE2
      for the search pattern given in FILE1
      -d : Print duplicate entries     
      -m : Print patterns MISSING in FILE2 (default)
      -h : Print this help and exit
EndOfHelp
      exit(0);
}

In your case, you would run it as

list_compare.pl -cf file1.txt file2.txt

The -f option makes it compare only the first word (defined by whitespace) of file2 and greatly speeds things up. To compare the entire line, remove the -f.

0人赞添加讨论(0) 举报

迷人小祖宗

4楼-- · 2020-02-27 11:52

grep or awk:

awk 'NR==FNR{a[$0]=1;next}!a[$0]' file2 file1

0人赞添加讨论(0) 举报

How to remove common lines between two files witho

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间