-->

Merging two partial (jointly overdetermined) sets

2020-03-04 02:54发布

问题:

I have a web app with data in a grid. The user can reorder the columns, and the server can change which columns exist. I would like to save the user's column order in a cookie and restore it on page load.

More formally, I have two arrays of unique IDs (strings) called user_columns and server_columns. I would like to reorder server_columns such that I respect all of the ordering information from user_columns, and as much from server_columns as possible. How do I do this? What's a reasonable formal definition of "as much as possible"?

My analysis so far:

One aspect of the problem is trivial: if the server removes some columns, delete the corresponding entries from user_columns. Any information about the ordering of columns which are no longer there is moot. The problem then becomes one of merging two potentially conflicting sets of ordering information.

This corresponds to a family of problems from voting theory: given a set of ballots, each of which contains a partial order between the candidates, produce a complete ordering of the candidates which in some sense reflects the ballots.

This leads me to think that I might get a workable solution by applying e.g. the Schulze Method or Ranked Pairs to a sufficiently rigged set of ballots based on user_columns and server_columns. For UX reasons, breaking ties by inserting new columns last (to the right) seems like a good idea to me.

Does this sound like it's on the right track?

Note also that we can consider three kinds of comparisons: A and B are both in user_columns, one of them is, or none of them are. The former and latter kinds are easily resolved (refer to user_columns and server_columns, respectively); the one in the middle, and its interactions with the latter, are the tricky parts.

回答1:

Let's say we have C columns, numbered from 1 to C. We have two sequences sequences of columns, U = u1, u2, ... un and S = s1, s2, ... sm. We want to find a permutation P of S, such that P has no inversions with regard to U and a minimal number of inversions with regard to S.

We can show that there is such an optimal P which is an interleaving of U ∩ S and S \ U. By "interleaving" I mean that P has no inversions with regard to U ∩ S or S \ U.

We can apply dynamic programming to find the optimal interleaving: Let A = (ai) = U ∩ S and B = (bj) = S \ U. Let f(i,j) be the number of inversions w.r.t. S of the optimal interleaving of the prefixes a1...i of A and b1...j of B. The idea is very similar to the longest common subsequence DP algorithm. We have the recurrence

f(0,j) = 0 for all j >= 0
f(i,0) = f(i-1, 0) + sum(k=1 to i-1, [1 if A[i] appears before A[k] in S])
f(i,j) = min(f(i-1, j) + sum(k=1 to i-1, [1 if A[i] appears before A[k] in S])
                       + sum(k=1 to j, [1 if A[i] appears before B[k] in S]),
             f(i, j-1) + sum(k=1 to i, [1 if B[j] appears before A[k] in S])
                       + sum(k=1 to j-1, [1 if B[j] appears before B[k] in S]))

I used the notation [1 if X] here to denote the value 1, if X is true and 0, if X is false.

The matrix f can be built in time O(|A|^2 * |B|^2). The minimal cost (number of inversions w.r.t. S) will be f(|A|, |B|).

We can reconstruct the optimal permutation using the DP matrix as well: We build it from back to front. We start with the tuple (i,j) = (|A|, |B|) and at every step depending on which of the two options is minimum in the DP transition, we know whether we need to put A[i] or B[j] to the front of the permutation. Then we proceed with (i-1, j) or (i, j-1) depending on which we choose.

Here is an implementation of the algorithm, please excuse my lack of JS skills.



回答2:

One definition that might be reasonable is to minimize the number of inversions relative to the server's order, subject to the constraint that the restriction to the columns in common of the two orders be equal. I don't know offhand an algorithm to minimize this objective.



回答3:

This relates to Niklas B's answer:

Theorem: Consider a sequence S = s₁, …, sₙ of some orderet set (e.g. integers). If i < j and sᵢ > sⱼ, then swapping sᵢ and sⱼ decreases the number of inversions—that is, let S' = s₁, …, sᵢ₋₁, sⱼ, sᵢ₊₁, …, sⱼ₋₁, sᵢ, sⱼ₊₁, …, sₙ; then S' has fewer inversions than S.

Intuitively stated: if two elements are out of order and you swap them, you're closer to having a sorted list.

Proof: Observe that the only elements that have a different relative order in S and S' are (sᵢ, sⱼ), (sᵢ, sₖ) and (sⱼ, sₖ) for each k where i < k < j. We know that (sᵢ, sⱼ) is an inversion in S but not S', so consider sₖ for some such k.

Either sₖ < sⱼ < sᵢ or sⱼ < sₖ < sᵢ or sⱼ < sᵢ < sₖ (we assume the elements of S to be unique).

In the first case, (sᵢ, sₖ) is an inversion in S and (sⱼ, sₖ) is an inversion in S'. In the second case, (sᵢ, sₖ) and (sⱼ, sₖ) are inversions in S but not in S'. In the third case, (sⱼ, sₖ) is an inversion in S and (sᵢ, sₖ). These are all the changes in inversions.

In each case, the number of inversions in S' is either the same as that in S or it is smaller. Recall that (sᵢ, sⱼ) got fixed from S to S', and we get the desired result. ■

Thus, if we have a₁, bᵢ, …, bⱼ, a₂ with each aS \ U and each bUS and a₁ > a₂ and we swap a₁ and a₂, getting a₂, bᵢ, …, bⱼ, a₁, the inversion count is lower. Since such swaps only rearrange elements of S \ U and not those of US, any solution which has zero inversions on US and (subject to that) a minimal number of inversions on S \ U must make all such swaps.

Ergo: the elements of S \ U must occur in order, and thus the solution is an interleaving of US and S \ U.