byte[] array pattern search

2019-01-01 03:44发布

Anyone know a good and effective way to search/match for a byte pattern in an byte[] array and then return the positions.

For example

byte[] pattern = new byte[] {12,3,5,76,8,0,6,125};

byte[] toBeSearched = new byte[] {23,36,43,76,125,56,34,234,12,3,5,76,8,0,6,125,234,56,211,122,22,4,7,89,76,64,12,3,5,76,8,0,6,125}

26条回答
谁念西风独自凉
2楼-- · 2019-01-01 04:20

I would use a solution which does matching by converting to a string...

You should write a simple function implementing the Knuth-Morris-Pratt searching algorithm. This will be the fastest simple algorithm you can use to find the correct indexes.(You could use Boyer-Moore but it will require more setup.

After you have optimized the algorithm, you could try to look for other kind of optimizations. But you should start with the basics.

For example, the current "fastest" is the Locate solution by Jb Evian.

if you look at the core

    for (int i = 0; i < self.Length; i++) {
            if (!IsMatch (self, i, candidate))
                    continue;

            list.Add (i);
    }

After a match of the sub algorithm, it will start to find a match at i + 1, but you already know that the first possible match would be i + candidate.Length. So if you add,

i += candidate.Length -2; //  -2 instead of -1 because the i++ will add the last index

it will be a lot faster when you expect a lot of occurrences of the subset in the superset. (Bruno Conde already does this in his solution)

But this is just a half of the KNP algorithm, you should also add an extra parameter to the IsMatch method called numberOfValidMatches which would be an out parameter.

this would resolve to the following:

int validMatches = 0;
if (!IsMatch (self, i, candidate, out validMatches))
{
    i += validMatches - 1; // -1 because the i++ will do the last one
    continue;
}

and

static bool IsMatch (byte [] array, int position, byte [] candidate, out int numberOfValidMatches)
{
    numberOfValidMatches = 0;
    if (candidate.Length > (array.Length - position))
            return false;

    for (i = 0; i < candidate.Length; i++)
    {
            if (array [position + i] != candidate [i])
                    return false;
            numberOfValidMatches++; 
    }

    return true;
}

A little bit of refactoring and you could use the numberOfValidMatches as the loop variable, and rewrite the Locate loop using a while to avoid the -2 and -1. But I just wanted to make clear how you could add the KMP algorithm.

查看更多
笑指拈花
3楼-- · 2019-01-01 04:23

I was missing a LINQ method/answer :-)

/// <summary>
/// Searches in the haystack array for the given needle using the default equality operator and returns the index at which the needle starts.
/// </summary>
/// <typeparam name="T">Type of the arrays.</typeparam>
/// <param name="haystack">Sequence to operate on.</param>
/// <param name="needle">Sequence to search for.</param>
/// <returns>Index of the needle within the haystack or -1 if the needle isn't contained.</returns>
public static IEnumerable<int> IndexOf<T>(this T[] haystack, T[] needle)
{
    if ((needle != null) && (haystack.Length >= needle.Length))
    {
        for (int l = 0; l < haystack.Length - needle.Length + 1; l++)
        {
            if (!needle.Where((data, index) => !haystack[l + index].Equals(data)).Any())
            {
                yield return l;
            }
        }
    }
}
查看更多
登录 后发表回答