Minimum repetitions in merged array of characters

2020-07-23 09:02发布

问题:

Suppose I have two arrays and I want to merge them so that the merged array has the minimum amount of repetitions. For example [ 'x', 'x' ] is a repetition.

arr1 = [ 'x', 'd', 'd', 'm', 'f', 'm' ]
arr2 = [ 'd', 'd', 'x', 'f', 'f', 'm' ]

The only condition is that in the merged array, the elements from arr1 and arr2 must appear in their respective orders within arr1 and arr2. Below is an example of the merged array with 0 repetitions while maintaining this condition.

merged = [ 'd', 'x', 'd', 'x', 'd', 'f', 'd', 'm', 'f', 'm', 'f', 'm' ]

I'm trying to relate this problem to popular dynamic programming problems to help me out. Are there any similar problems out there that I should look into?

回答1:

I define the following function: F(d, i, j) = the minimum number of repetitions possible from a string made of a prefix of arr1 of length i and prefix of arr2 of length j, followed by a ith (d=0) or jth (d=1) symbol from arr[d]. Thus F(d, i, j) corresponds to a string of length i+j+1.

If you are familiar with the way Levenshtein distance is calculated, think of it as instead of assigning scores to the vertices of the grid we assign scores to the edges, where d signifies whether it's the horizontal or the vertical edge. This gives us a one-symbol 'memory', so we can detect repetitions.

The following C++ code calculates the minimum number of repetitions and prints a corresponding string in quadratic time:

#include <string.h>
#include <stdio.h>
#include <algorithm>
#include <limits.h>

char A[32], B[32], C[64];
int score[2][32][32];

void print_result(int d, int i, int j)
{
    char c = d ? B[j] : A[i];
    int s0 = i > 0 ? score[0][i-1][j] + (A[i-1] == c) : INT_MAX;
    int s1 = j > 0 ? score[1][i][j-1] + (B[j-1] == c) : INT_MAX;

    if(s0 <= s1 && i > 0)
        print_result(0, i-1, j);
    else if(j > 0)
        print_result(1, i, j-1);

    printf("%c", c);
}

void print_result(int i, int j)
{
    if(score[0][i-1][j] < score[1][i][j-1])
        print_result(0, i-1, j);
    else
        print_result(1, i, j-1);
}

int main()
{
    fgets(A, sizeof(A), stdin);
    fgets(B, sizeof(B), stdin);

    int m = strlen(A) - 1; // -1 to remove LF
    int n = strlen(B) - 1;

    for(int j = 0; j <= n; ++j)
    {
        for(int i = 0; i <= m; ++i)
        {
            score[0][i][j] = !i && !j ? 0 : std::min(
                i > 0 ? score[0][i-1][j] + (A[i-1] == A[i]) : INT_MAX,
                j > 0 ? score[1][i][j-1] + (B[j-1] == A[i]) : INT_MAX
            );
            score[1][i][j] = !i && !j ? 0 : std::min(
                i > 0 ? score[0][i-1][j] + (A[i-1] == B[j]) : INT_MAX,
                j > 0 ? score[1][i][j-1] + (B[j-1] == B[j]) : INT_MAX
            );
        }
    }

    printf("repetitions: %d\n", std::min(score[0][m-1][n], score[1][m][n-1]));

    print_result(m, n);
    printf("\n");

    return 0;
}