How to produce cartesian product in bash?

2020-02-06 06:51发布

问题:

I want to produce such file (cartesian product of [1-3]X[1-5]):

1 1
1 2
1 3
1 4
1 5
2 1
2 2
2 3
2 4
2 5
3 1
3 2
3 3
3 4
3 5

I can do this using nested loop like:

for i in $(seq 3) 
do
  for j in $(seq 5)
  do
      echo $i $j
  done
done

is there any solution without loops?

回答1:

Combine two brace expansions!

$ printf "%s\n" {1..3}" "{1..5}
1 1
1 2
1 3
1 4
1 5
2 1
2 2
2 3
2 4
2 5
3 1
3 2
3 3
3 4
3 5

This works by using a single brace expansion:

$ echo {1..5}
1 2 3 4 5

and then combining with another one:

$ echo {1..5}+{a,b,c}
1+a 1+b 1+c 2+a 2+b 2+c 3+a 3+b 3+c 4+a 4+b 4+c 5+a 5+b 5+c


回答2:

A shorter (but hacky) version of Rubens's answer:

join -j 999999 -o 1.1,2.1 file1 file2

Since the field 999999 most likely does not exist it is considered equal for both sets and therefore join have to do the Cartesian product. It uses O(N+M) memory and produces output at 100..200 Mb/sec on my machine.

I don't like the "shell brace expansion" method like echo {1..100}x{1..100} for large datasets because it uses O(N*M) memory and can when used careless bring your machine to knees. It is hard to stop because ctrl+c does not interrupts brace expansion which is done by the shell itself.



回答3:

The best alternative for cartesian product in bash is surely -- as pointed by @fedorqui -- to use parameter expansion. However, in case your input that is not easily producible (i.e., if {1..3} and {1..5} does not suffice), you could simply use join.

For example, if you want to peform the cartesian product of two regular files, say "a.txt" and "b.txt", you could do the following. First, the two files:

$ echo -en {a..c}"\tx\n" | sed 's/^/1\t/' > a.txt
$ cat a.txt
1    a    x
1    b    x
1    c    x

$ echo -en "foo\nbar\n" | sed 's/^/1\t/' > b.txt
$ cat b.txt
1    foo
1    bar

Notice the sed command is used to prepend each line with an identifier. The identifier must be the same for all lines, and for all files, so the join will give you the cartesian product -- instead of putting aside some of the resultant lines. So, the join goes as follows:

$ join -j 1 -t $'\t' a.txt b.txt | cut -d $'\t' -f 2-
a    x    foo
a    x    bar
b    x    foo
b    x    bar
c    x    foo
c    x    bar

After both files are joined, cut is used as an alternative to remove the column of "1"s formerly prepended.



标签: bash seq