How to detect if a repeating pattern exists

2019-06-21 06:10发布

问题:

My question isn't language specific... I would probably implement this in C# or Python unless there is a specific feature of a language that helps me get what I am looking for.

Is there some sort of algorithm that anyone knows of that can help me determine if a list of numbers contains a repeating pattern?

Let's say I have a several lists of numbers...

[12, 4, 5, 7, 1, 2]
[1, 2, 3, 1, 2, 3, 1, 2, 3]
[1, 1, 1, 1, 1, 1]
[ 1, 2, 4, 12, 13, 1, 2, 4, 12, 13]

I need to detect if there is a repeating pattern in each list... For example, list 1 returns false, but and lists 2, 3, and 4 return true.

I was thinking maybe taking a count of each value that appears in the list and if val 1 == val 2 == val n... then that would do it. Any better ideas?

回答1:

You want to look at the autocorrelation of the signal. Autocorrelation basically does a convolution of the signal with itself. When a you iteratively slide one signal across another, and there is a repeating pattern, the output will resonate strongly.



回答2:

The second and fourth strings are periodic; I'm going to assume you're looking for an algorithm for detecting periodic strings. Most fast string matching algorithms need to find periods of strings in order to compute their shifting rules.

Knuth-Morris-Pratt's preprocessing, for instance, computes, for every prefix P[0..k] of the pattern P, the length SP[k] of the longest proper suffix P[s..k] of P[0..k] that exactly matches the prefix P[0..(k-s)]. If SP[k] < k/2, then P[0..k] is aperiodic; otherwise, it is a prefix of a string with period k - SP[k].



回答3:

One option would be to look at compression algorithms, some of those rely on finding repeating patterns and replacing them with another symbol. In your case you simply need the part that identifies the pattern. You may find that it is similar to the method that you've described already though



回答4:

Assuming that your "repeating pattern" is always repeated in full, like your sample data suggests, you could just think of your array as a bunch of repeating arrays of equal length. Meaning:

[1, 2, 3, 1, 2, 3, 1, 2, 3] is the same as [1, 2, 3] repeated three times.

This means that you could just check to see if every x value in the array is equal to each other. So: array[0] == array[3] == array[6]
array[1] == array[4] == array[7]
array[2] == array[5] == array[8]

Since you don't know the length of the repeated pattern, you'd just have to try all possible lengths until you found a pattern or ran out of possible shorter arrays. I'm sure there are optimizations that can be added to the following, but it works (assuming I understand the question correctly, of course).

static void Main(string[] args)
{
    int[] array1 = {12, 4, 5, 7, 1, 2};
    int[] array2 = {1, 2, 3, 1, 2, 3, 1, 2, 3};
    int[] array3 = {1, 1, 1, 1, 1, 1 };
    int[] array4 = {1, 2, 4, 12, 13, 1, 2, 4, 12, 13 };

    Console.WriteLine(splitMethod(array1));
    Console.WriteLine(splitMethod(array2));
    Console.WriteLine(splitMethod(array3));
    Console.WriteLine(splitMethod(array4));

    Console.ReadLine();
}

static bool splitMethod(int[] array)
{
    for(int patternLength = 1; patternLength <= array.Length/2; patternLength++)
    {
        // if the pattern length doesn't divide the length of the array evenly,
        // then we can't have a pattern of that length.
        if(array.Length % patternLength != 0)
        {
            continue;
        }

        // To check if every x value is equal, we need to give a start index
        // To begin our comparisons at.
        // We'll start at index 0 and check it against 0+x, 0+x+x, 0+x+x+x, etc.
        // Then we'll use index 1 and check it against 1+x, 1+x+x, 1+x+x+x, etc.
        // Then... etc.
        // If we find that every x value starting at a given start index aren't
        // equal, then we'll continue to the next pattern length.
        // We'll assume our patternLength will produce a pattern and let
        // our test determines if we don't have a pattern.
        bool foundPattern = true;
        for (int startIndex = 0; startIndex < patternLength; startIndex++)
        {
            if (!everyXValueEqual(array, patternLength, startIndex))
            {
                foundPattern = false;
                break;
            }
        }

        if (foundPattern)
        {
            return true;
        }
    }
    return false;
}

static bool everyXValueEqual(int[] array, int x, int startIndex)
{
    // if the next index we want to compare against is outside the bounds of the array
    // we've done all the matching we can for a pattern of length x.
    if (startIndex+x > array.Length-1)
        return true;

    // if the value at starIndex equals the value at startIndex + x
    // we can go on to test values at startIndex + x and startIndex + x + x
    if (array[startIndex] == array[startIndex + x])
        return everyXValueEqual(array, x, startIndex + x);

    return false;
}


回答5:

Since you're looking for repeated patterns, you could force your array into a string and run a regular expression against it. This being my second answer, I'm just playing around here.

    static Regex regex = new Regex(@"^(?<main>(?<v>;\d+)+?)(\k<main>)+$", RegexOptions.Compiled);
    static bool regexMethod(int[] array)
    {
        string a = ";" + string.Join(";", array);
        return regex.IsMatch(a);
    }

The regular expression is

  1. (?<v>;\d+) - A group named "v" which matches a semicolon (the delimiter in this case) and 1 or more digits

  2. (?<main>(?<v>;\d+)+?) - a group named "main" which matches the "v" group 1 or more times, but the least number of times it can to satisfy the regex.

  3. (\k<main>)+ - matches the text that the "main" group matched 1 or more times

  4. ^ ... $ - these anchor the ends of the pattern to the ends of the string.



回答6:

Simple pattern recognition is the task of compression algorithms. Depending on the type of input and the type of patterns you're looking for the algorithm of choice may be very different - just consider that any file is an array of bytes and there are many types of compression for various types of data. Lossless compression finds exact patterns that repeat and lossy compression - approximate patterns where the approximation is limited by some "real-world" consideration.

In your case you can apply a pseudo zip compression where you start filling up a list of encountered sequences

here's a pseudo suggestion:

//C#-based pseudo code

int[] input = GetInputData();
var encounters = new Dictionary<ItemCount<int[],int>>();// the string and the number of times it's found

int from = 0;
for(int to=0; to<input.Length; i++){ 
  for (int j = from; j<=i; j++){ // for each substring between 'from' and 'i'
    if (encounters.ContainsKey(input.SubArray(j,i)){
      if (j==from) from++;                  // if the entire substring already exists - move the starting point
      encounters[input.SubArray(j,i)] += 1; // increase the count where the substring already exists
    } else {
     // consider: if (MeetsSomeMinimumRequirements(input.SubArray(j,i))
      encounters.Add(input.SubArray(j,i),1); //add a new pattern
    }
  }
}
Output(encounters.Where(itemValue => itemValue.Value>1); // show the patterns found more than once

I haven't debugged the sample above, so use it just as a starting point. The core idea is that you'd have an encounters list where various substrings are collected and counted, the most frequent will have highest Value in the end.

You can alter the algorithm above by storing some function of the substrings instead of the entire substring or add some minimum requirements such as minimum length etc. Too many options, complete discussion is not possible within a post.