Bash Directory Sorting Issue - Removing Duplicates

Published 2019-09-10 22:10

Question:

I'm using this command to merge two directories that contain identically named files and to remove duplicate lines from each pair of corresponding files:

for f in app1/*; do 
   bn="$(basename "$f")"
   sort -u "$f" "app2/$bn" > "app/$bn"
done

Is there a way to edit this so that it checks the lines of all the files and removes all the duplicates as well? I do need to keep the existing file structure with individual files.

The end result is a directory of 300 text files that is no larger than 30 MB in total.

Example:

**Directory app1**
*1.txt*       
a
b
c

*2.txt*
d
e
f

**Directory app2**
*1.txt*
a
b
c
g

*2.txt*
a
b
c
d
e
f

**Results in Directory app**
*1.txt*
a
b
c
g

*2.txt*
a
b
c
d
e
f

**Desired Result in Directory app**
*1.txt*
a
b
c
g

*2.txt*
d
e
f

As you can see, it's not removing the duplicate "a b c" lines from 2.txt when they also appear in 1.txt. Each line should appear only once across all of the files, with every duplicate removed.

Answer 1:

This should probably be done with perl -i:

perl -i -n -e 'print unless $h{$_};++$h{$_}' app1/*

This seems to create .bak files in app1 (despite the man page saying it won't), which you may want to remove after verifying the result: rm app1/*.bak.
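
If you would rather leave app1 untouched and deduplicate the merged output instead, one option (a sketch only, assuming the merged files land in app/ as in the question's loop and that app/ already exists) is to run the same one-liner over app/* after the merge:

for f in app1/*; do
   # merge each pair of files and drop within-file duplicates, as in the question
   bn="$(basename "$f")"
   sort -u "$f" "app2/$bn" > "app/$bn"
done

# -n wraps all of app/* in a single read loop, so the %h hash persists
# across files; a line is printed only the first time it is seen anywhere.
perl -i -n -e 'print unless $h{$_}; ++$h{$_}' app/*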



Answer 2:

As you can see, it's not removing the duplicate "a b c" lines from 2.txt when they also appear in 1.txt. Each line should appear only once across all of the files, with every duplicate removed.

You can accomplish this by applying 7171u's answer to your other question, "Unix Bash Remove Duplicate Lines From Directory Files?", to the result of your command above (after changing the tmp/* in his script to app/*, which should be trivial).
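
That script isn't reproduced here, but the underlying idea (remember every line already written and skip it in later files) can also be sketched in plain bash. The following is a rough sketch, not the referenced script; it assumes bash 4 or newer for associative arrays and that the merged files already sit in app/:

# Walk the merged files in glob order, print a line only the first time it
# is seen in any file, and rewrite each file in place so the
# one-file-per-name structure is preserved.
declare -A seen
for f in app/*; do
   tmp="$(mktemp)"
   while IFS= read -r line; do
      key="x$line"   # prefix so empty lines don't produce an empty subscript
      if [[ -z ${seen[$key]+set} ]]; then
         printf '%s\n' "$line" >> "$tmp"
         seen[$key]=1
      fi
   done < "$f"
   mv "$tmp" "$f"
done

Because app/1.txt is processed before app/2.txt, the a, b and c lines survive in 1.txt and are dropped from 2.txt, matching the desired result above.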