byte[] array pattern search

2019-01-01 03:44发布

Anyone know a good and effective way to search/match for a byte pattern in an byte[] array and then return the positions.

For example

byte[] pattern = new byte[] {12,3,5,76,8,0,6,125};

byte[] toBeSearched = new byte[] {23,36,43,76,125,56,34,234,12,3,5,76,8,0,6,125,234,56,211,122,22,4,7,89,76,64,12,3,5,76,8,0,6,125}

26条回答
荒废的爱情
2楼-- · 2019-01-01 04:07

Use the efficient Boyer-Moore algorithm.

It's designed to find strings withing strings but you need little imagination to project this to byte arrays.

In general the best answer is: use any string searching algorithm that you like :).

查看更多
公子世无双
3楼-- · 2019-01-01 04:08

These are the simplest and fastest methods you can use, and there wouldn't be anything faster than these. It is unsafe but thats what we use pointers for is speed. So here I offer you my extension methods that I use search for a single, and a list of indexes of the occurances. I would like to say this is the cleanest code here.

    public static unsafe long IndexOf(this byte[] Haystack, byte[] Needle)
    {
        fixed (byte* H = Haystack) fixed (byte* N = Needle)
        {
            long i = 0;
            for (byte* hNext = H, hEnd = H + Haystack.LongLength; hNext < hEnd; i++, hNext++)
            {
                bool Found = true;
                for (byte* hInc = hNext, nInc = N, nEnd = N + Needle.LongLength; Found && nInc < nEnd; Found = *nInc == *hInc, nInc++, hInc++) ;
                if (Found) return i;
            }
            return -1;
        }
    }
    public static unsafe List<long> IndexesOf(this byte[] Haystack, byte[] Needle)
    {
        List<long> Indexes = new List<long>();
        fixed (byte* H = Haystack) fixed (byte* N = Needle)
        {
            long i = 0;
            for (byte* hNext = H, hEnd = H + Haystack.LongLength; hNext < hEnd; i++, hNext++)
            {
                bool Found = true;
                for (byte* hInc = hNext, nInc = N, nEnd = N + Needle.LongLength; Found && nInc < nEnd; Found = *nInc == *hInc, nInc++, hInc++) ;
                if (Found) Indexes.Add(i);
            }
            return Indexes;
        }
    }

Benchmarked with Locate, it is 1.2-1.4x faster

查看更多
何处买醉
4楼-- · 2019-01-01 04:10

Just another answer that is easy to follow and pretty efficient for a O(n) type of operation without using unsafe code or copying parts of the source arrays.

Be sure to test. Some of the suggestions found on this topic are susceptible to gotta situations.

    static void Main(string[] args)
    {
        //                                                         1   1  1  1  1  1  1  1  1  1  2   2   2
        //                           0  1  2  3  4  5  6  7  8  9  0   1  2  3  4  5  6  7  8  9  0   1   2  3  4  5  6  7  8  9
        byte[] buffer = new byte[] { 1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 5, 5, 0, 5, 5, 1, 2 };
        byte[] beginPattern = new byte[] { 1, 0, 2 };
        byte[] middlePattern = new byte[] { 8, 9, 10 };
        byte[] endPattern = new byte[] { 9, 10, 11 };
        byte[] wholePattern = new byte[] { 1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
        byte[] noMatchPattern = new byte[] { 7, 7, 7 };

        int beginIndex = ByteArrayPatternIndex(buffer, beginPattern);
        int middleIndex = ByteArrayPatternIndex(buffer, middlePattern);
        int endIndex = ByteArrayPatternIndex(buffer, endPattern);
        int wholeIndex = ByteArrayPatternIndex(buffer, wholePattern);
        int noMatchIndex = ByteArrayPatternIndex(buffer, noMatchPattern);
    }

    /// <summary>
    /// Returns the index of the first occurrence of a byte array within another byte array
    /// </summary>
    /// <param name="buffer">The byte array to be searched</param>
    /// <param name="pattern">The byte array that contains the pattern to be found</param>
    /// <returns>If buffer contains pattern then the index of the first occurrence of pattern within buffer; otherwise, -1</returns>
    public static int ByteArrayPatternIndex(byte[] buffer, byte[] pattern)
    {
        if (buffer != null && pattern != null && pattern.Length <= buffer.Length)
        {
            int resumeIndex;
            for (int i = 0; i <= buffer.Length - pattern.Length; i++)
            {
                if (buffer[i] == pattern[0]) // Current byte equals first byte of pattern
                {
                    resumeIndex = 0;
                    for (int x = 1; x < pattern.Length; x++)
                    {
                        if (buffer[i + x] == pattern[x])
                        {
                            if (x == pattern.Length - 1)  // Matched the entire pattern
                                return i;
                            else if (resumeIndex == 0 && buffer[i + x] == pattern[0])  // The current byte equals the first byte of the pattern so start here on the next outer loop iteration
                                resumeIndex = i + x;
                        }
                        else
                        {
                            if (resumeIndex > 0)
                                i = resumeIndex - 1;  // The outer loop iterator will increment so subtract one
                            else if (x > 1)
                                i += (x - 1);  // Advance the outer loop variable since we already checked these bytes
                            break;
                        }
                    }
                }
            }
        }
        return -1;
    }

    /// <summary>
    /// Returns the indexes of each occurrence of a byte array within another byte array
    /// </summary>
    /// <param name="buffer">The byte array to be searched</param>
    /// <param name="pattern">The byte array that contains the pattern to be found</param>
    /// <returns>If buffer contains pattern then the indexes of the occurrences of pattern within buffer; otherwise, null</returns>
    /// <remarks>A single byte in the buffer array can only be part of one match.  For example, if searching for 1,2,1 in 1,2,1,2,1 only zero would be returned.</remarks>
    public static int[] ByteArrayPatternIndex(byte[] buffer, byte[] pattern)
    {
        if (buffer != null && pattern != null && pattern.Length <= buffer.Length)
        {
            List<int> indexes = new List<int>();
            int resumeIndex;
            for (int i = 0; i <= buffer.Length - pattern.Length; i++)
            {
                if (buffer[i] == pattern[0]) // Current byte equals first byte of pattern
                {
                    resumeIndex = 0;
                    for (int x = 1; x < pattern.Length; x++)
                    {
                        if (buffer[i + x] == pattern[x])
                        {
                            if (x == pattern.Length - 1)  // Matched the entire pattern
                                indexes.Add(i);
                            else if (resumeIndex == 0 && buffer[i + x] == pattern[0])  // The current byte equals the first byte of the pattern so start here on the next outer loop iteration
                                resumeIndex = i + x;
                        }
                        else
                        {
                            if (resumeIndex > 0)
                                i = resumeIndex - 1;  // The outer loop iterator will increment so subtract one
                            else if (x > 1)
                                i += (x - 1);  // Advance the outer loop variable since we already checked these bytes
                            break;
                        }
                    }
                }
            }
            if (indexes.Count > 0)
                return indexes.ToArray();
        }
        return null;
    }
查看更多
琉璃瓶的回忆
5楼-- · 2019-01-01 04:11

May I suggest something that doesn't involve creating strings, copying arrays or unsafe code:

using System;
using System.Collections.Generic;

static class ByteArrayRocks {

    static readonly int [] Empty = new int [0];

    public static int [] Locate (this byte [] self, byte [] candidate)
    {
        if (IsEmptyLocate (self, candidate))
            return Empty;

        var list = new List<int> ();

        for (int i = 0; i < self.Length; i++) {
            if (!IsMatch (self, i, candidate))
                continue;

            list.Add (i);
        }

        return list.Count == 0 ? Empty : list.ToArray ();
    }

    static bool IsMatch (byte [] array, int position, byte [] candidate)
    {
        if (candidate.Length > (array.Length - position))
            return false;

        for (int i = 0; i < candidate.Length; i++)
            if (array [position + i] != candidate [i])
                return false;

        return true;
    }

    static bool IsEmptyLocate (byte [] array, byte [] candidate)
    {
        return array == null
            || candidate == null
            || array.Length == 0
            || candidate.Length == 0
            || candidate.Length > array.Length;
    }

    static void Main ()
    {
        var data = new byte [] { 23, 36, 43, 76, 125, 56, 34, 234, 12, 3, 5, 76, 8, 0, 6, 125, 234, 56, 211, 122, 22, 4, 7, 89, 76, 64, 12, 3, 5, 76, 8, 0, 6, 125 };
        var pattern = new byte [] { 12, 3, 5, 76, 8, 0, 6, 125 };

        foreach (var position in data.Locate (pattern))
            Console.WriteLine (position);
    }
}

Edit (by IAbstract) - moving contents of post here since it is not an answer

Out of curiosity, I've created a small benchmark with the different answers.

Here are the results for a million iterations:

solution [Locate]:            00:00:00.7714027
solution [FindAll]:           00:00:03.5404399
solution [SearchBytePattern]: 00:00:01.1105190
solution [MatchBytePattern]:  00:00:03.0658212

查看更多
看风景的人
6楼-- · 2019-01-01 04:11

thanks for taking the time...

This is the code that i was using/testing with before I asked my question... The reason I ask this question was that I´m certain of that I´m not using the optimal code to do this... so thanks again for taking the time!

   private static int CountPatternMatches(byte[] pattern, byte[] bytes)
   {
        int counter = 0;

        for (int i = 0; i < bytes.Length; i++)
        {
            if (bytes[i] == pattern[0] && (i + pattern.Length) < bytes.Length)
            {
                for (int x = 1; x < pattern.Length; x++)
                {
                    if (pattern[x] != bytes[x+i])
                    {
                        break;
                    }

                    if (x == pattern.Length -1)
                    {
                        counter++;
                        i = i + pattern.Length;
                    }
                }
            }
        }

        return counter;
    }

Anyone that see any errors in my code? Is this considered as an hackish approach? I have tried almost every sample you guys have posted and I seems to get some variations in the match results. I have been running my tests with ~10Mb byte array as my toBeSearched array.

查看更多
美炸的是我
7楼-- · 2019-01-01 04:13

You can use ORegex:

var oregex = new ORegex<byte>("{0}{1}{2}", x=> x==12, x=> x==3, x=> x==5);
var toSearch = new byte[]{1,1,12,3,5,1,12,3,5,5,5,5};

var found = oregex.Matches(toSearch);

Will be found two matches:

i:2;l:3
i:6;l:3

Complexity: O(n*m) in worst case, in real life it is O(n) because of internal state machine. It is faster than .NET Regex in some cases. It is compact, fast and designed especialy for array pattern matching.

查看更多
登录 后发表回答