Finding overlapping data in arrays

Posted 2019-03-25 12:32

Question:

We are writing a C# application that will help remove unnecessary data repeaters. A repeater can be removed only if all the data it receives is also received by other repeaters. What we need as a first step is explained below.

I have a collection of int arrays, for example:

a. {1, 2, 3, 4, 5}

b. {2, 4, 6, 7}

c. {1, 3, 5, 8, 11, 100}

There may be thousands of such arrays. I need to find the arrays that can be removed. An array can be removed only if all of its numbers are included in other arrays. In the example above, array a can be removed because its numbers 2 and 4 are in array b and its numbers 1, 3 and 5 are in array c.

What is the best way to do this?

Answer 1:

This solution does not guarantee the minimal number of remaining arrays.

Build a dictionary that records, for each number, how many arrays contain it. For example:

1 => 2
2 => 2
3 => 2
4 => 2
5 => 2
6 => 1
7 => 1
...

Then check each array in turn: if the count of every one of its members is greater than 1, remove the array and decrement the count of each of its numbers in the dictionary.
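The counting approach described above can be sketched roughly as follows. This is only an illustration in Python, not the answerer's own code; the function name `find_removable` is made up for the example, and it assumes each array should be treated as a set (a number repeated inside one array counts once). The same structure should carry over to C# with a `Dictionary<int, int>` and `HashSet<int>`.

```python
from collections import Counter

def find_removable(arrays):
    """Return the indices of arrays removed by a single greedy pass."""
    # Assumption: duplicates inside one array are counted once.
    sets = [set(a) for a in arrays]
    # counts[x] = number of not-yet-removed arrays that contain x
    counts = Counter(x for s in sets for x in s)
    removed = []
    for i, s in enumerate(sets):
        # Removable only if every member also occurs in another remaining array.
        if all(counts[x] > 1 for x in s):
            removed.append(i)
            for x in s:
                counts[x] -= 1
    return removed

arrays = [[1, 2, 3, 4, 5], [2, 4, 6, 7], [1, 3, 5, 8, 11, 100]]
print(find_removable(arrays))  # [0], i.e. array a
```

The result depends on the order in which the arrays are visited, which is why this pass yields a set of arrays where nothing more can be removed rather than the smallest possible set.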



Answer 2:

Getting the minimum number of remaining arrays (as opposed to a subset of arrays where no more arrays can be removed) is the NP-hard set cover problem. Even with thousands of arrays, however, there's a good chance that, if you apply a mixed integer program solver to the formulation in the linked Wikipedia article, it will be able to find the optimal solution.
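For reference, the standard set cover integer program, adapted to this question, might be written as below. The notation is an assumption made for illustration: $A_1, \dots, A_n$ are the arrays, $U$ is the union of all their numbers, and $x_i = 1$ means array $i$ is kept.

```latex
\begin{aligned}
\text{minimize}   \quad & \sum_{i=1}^{n} x_i \\
\text{subject to} \quad & \sum_{i \,:\, e \in A_i} x_i \ge 1 && \text{for every } e \in U, \\
                        & x_i \in \{0, 1\}                   && \text{for } i = 1, \dots, n.
\end{aligned}
```

Arrays with $x_i = 0$ in an optimal solution are the ones that can be removed, since every number is still covered by at least one kept array.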