Given two directory trees how to find which filena

Posted 2019-08-23 12:21

This answer tells me how to find the files with the same filename in two directories in bash:

diff -srq dir1/ dir2/ | grep identical

Now I want to consider files which satisfy a condition. If I use ls E*, I get back files starting with E. I want to do the same with the above command: give me the filenames which are different in dir1/ and dir2/, but consider only those starting with E.

I tried the following:

diff -srq dir1/E* dir2/E* | grep identical

but it did not work, I got this output:

diff: extra operand '/home/pal/konkoly/c6/elesbe3/1/EPIC_212291374- c06-k2sc.dat.flag.spline'
diff: Try 'diff --help' for more information.

(/home/pal/konkoly/c6/elesbe3/1/EPIC_212291374- c06-k2sc.dat.flag.spline is a file in what I call dir1 here, but there is no EPIC_212291374- c06-k2sc.dat.flag.spline in what I call dir2.)

How can I solve this?


I tried doing it in the following way, based on this answer:

DIR1=$(ls dir1)
DIR2=$(ls dir2)

for i in $DIR1; do
    for j in $DIR2; do
        if [[ $i == $j ]]; then
            echo "$i == $j"
        fi
    done
done

It works as written above, but if I write DIR1=$(ls path1/E*) and DIR2=$(ls path2/E*) it does not: I get no output.

2 answers
Summer. ? 凉城
#2 · 2019-08-23 12:41

The accepted answer works fine, but if someone needs a Python implementation, this also works:

import glob
import os

# Collect the bare filenames (directory part stripped) that start with "E".
dir1 = [os.path.basename(p) for p in glob.glob("path/to/dir1/E*")]
dir2 = [os.path.basename(p) for p in glob.glob("path/to/dir2/E*")]

# Report every name that occurs in both directories.
for name in sorted(set(dir1) & set(dir2)):
    print(name + " is in both directories")
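
Since the question also mentions names that differ between the two directories, the same idea can be extended to report those as well. The following is only a minimal sketch: compare_names is a hypothetical helper name, the E* pattern is taken from the question, and the directories are assumed to be flat. Note that os.listdir also returns subdirectories, so a directory whose name starts with E would be counted like a file.

```python
import fnmatch
import os


def compare_names(dir1, dir2, pattern="E*"):
    """Return (common, only_in_dir1, only_in_dir2) for names matching pattern.

    Only the top level of each directory is inspected, and only names are
    compared, not file contents.
    """
    names1 = {n for n in os.listdir(dir1) if fnmatch.fnmatchcase(n, pattern)}
    names2 = {n for n in os.listdir(dir2) if fnmatch.fnmatchcase(n, pattern)}
    common = sorted(names1 & names2)
    only1 = sorted(names1 - names2)
    only2 = sorted(names2 - names1)
    return common, only1, only2
```

Calling compare_names("path/to/dir1", "path/to/dir2") would then give three sorted lists: names found in both directories, names found only in the first, and names found only in the second.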
可以哭但决不认输i
#3 · 2019-08-23 12:48

This is untested, but I'd try something like:

comm -12 <(cd dir1 && ls E*) <(cd dir2 && ls E*)

Basic idea:

  • Generate a list of filenames in dir1 that satisfy our condition. This can be done with ls E* because we're only dealing with a flat list of files. For subdirectories and recursion we'd use find instead (e.g. find . -name 'E*' -type f).

  • Put the filenames in a canonical order (e.g. by sorting them). We don't have to do anything here because E* expands in sorted order anyway. With find we might have to pipe the output into sort first.

  • Do the same thing to dir2.

  • Only output lines that are common to both lists, which can be done with comm -12.

    comm expects to be passed two filenames on the command line, so we use bash's process substitution, <( ... ), to run each command in a subprocess and connect its output to a named pipe (or a /dev/fd entry, depending on the system); that name can then be given to comm.
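
For readers who prefer Python, the steps above (list, sort, keep the common lines) can be sketched for the recursive case too. This is an illustration under assumptions, not a tested replacement for the shell one-liner: matching_files and common_sorted are hypothetical helper names, files are identified by their path relative to each tree's root, and the merge loop imitates what comm -12 does on two sorted inputs.

```python
import fnmatch
import os


def matching_files(root, pattern="E*"):
    """Return sorted relative paths of non-directory entries under root
    whose file name matches pattern (roughly: find . -name 'E*' | sort)."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatchcase(name, pattern):
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, root))
    return sorted(found)


def common_sorted(a, b):
    """Yield the items present in both sorted lists, like comm -12."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            yield a[i]
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] occurs only in the first list
        else:
            j += 1  # b[j] occurs only in the second list
```

Iterating over common_sorted(matching_files("dir1"), matching_files("dir2")) and printing each item would list the relative paths present in both trees. Like comm, the merge step relies on both inputs being sorted the same way, which is why matching_files sorts its result.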
