byte[] array pattern search

2019-01-01 03:44发布

Anyone know a good and effective way to search/match for a byte pattern in an byte[] array and then return the positions.

For example

byte[] pattern = new byte[] {12,3,5,76,8,0,6,125};

byte[] toBeSearched = new byte[] {23,36,43,76,125,56,34,234,12,3,5,76,8,0,6,125,234,56,211,122,22,4,7,89,76,64,12,3,5,76,8,0,6,125}

26条回答
其实,你不懂
2楼-- · 2019-01-01 04:04

Originally I posted some old code I used but was curious about Jb Evain's benchmarks. I found that my solution was stupid slow. It appears that bruno conde's SearchBytePattern is the fastest. I could not figure why especially since he uses an Array.Copy and an Extension method. But there is proof in Jb's tests, so kudos to bruno.

I simplified the bits even further, so hopefully this will be the clearest and simplest solution. (All the hard work done by bruno conde) The enhancements are:

  • Buffer.BlockCopy
  • Array.IndexOf<byte>
  • while loop instead of a for loop
  • start index parameter
  • converted to extension method

    public static List<int> IndexOfSequence(this byte[] buffer, byte[] pattern, int startIndex)    
    {
       List<int> positions = new List<int>();
       int i = Array.IndexOf<byte>(buffer, pattern[0], startIndex);  
       while (i >= 0 && i <= buffer.Length - pattern.Length)  
       {
          byte[] segment = new byte[pattern.Length];
          Buffer.BlockCopy(buffer, i, segment, 0, pattern.Length);    
          if (segment.SequenceEqual<byte>(pattern))
               positions.Add(i);
          i = Array.IndexOf<byte>(buffer, pattern[0], i + pattern.Length);
       }
       return positions;    
    }
    
查看更多
梦寄多情
3楼-- · 2019-01-01 04:04

Here is a solution I came up with. I included notes I found along the way with the implementation. It can match forward, backward and with different (in/dec)remement amounts e.g. direction; starting at any offset in the haystack.

Any input would be awesome!

    /// <summary>
    /// Matches a byte array to another byte array
    /// forwards or reverse
    /// </summary>
    /// <param name="a">byte array</param>
    /// <param name="offset">start offset</param>
    /// <param name="len">max length</param>
    /// <param name="b">byte array</param>
    /// <param name="direction">to move each iteration</param>
    /// <returns>true if all bytes match, otherwise false</returns>
    internal static bool Matches(ref byte[] a, int offset, int len, ref byte[] b, int direction = 1)
    {
        #region Only Matched from offset Within a and b, could not differ, e.g. if you wanted to mach in reverse for only part of a in some of b that would not work
        //if (direction == 0) throw new ArgumentException("direction");
        //for (; offset < len; offset += direction) if (a[offset] != b[offset]) return false;
        //return true;
        #endregion
        //Will match if b contains len of a and return a a index of positive value
        return IndexOfBytes(ref a, ref offset, len, ref b, len) != -1;
    }

///Here is the Implementation code

    /// <summary>
    /// Swaps two integers without using a temporary variable
    /// </summary>
    /// <param name="a"></param>
    /// <param name="b"></param>
    internal static void Swap(ref int a, ref int b)
    {
        a ^= b;
        b ^= a;
        a ^= b;
    }

    /// <summary>
    /// Swaps two bytes without using a temporary variable
    /// </summary>
    /// <param name="a"></param>
    /// <param name="b"></param>
    internal static void Swap(ref byte a, ref byte b)
    {
        a ^= b;
        b ^= a;
        a ^= b;
    }

    /// <summary>
    /// Can be used to find if a array starts, ends spot Matches or compltely contains a sub byte array
    /// Set checkLength to the amount of bytes from the needle you want to match, start at 0 for forward searches start at hayStack.Lenght -1 for reverse matches
    /// </summary>
    /// <param name="a">Needle</param>
    /// <param name="offset">Start in Haystack</param>
    /// <param name="len">Length of required match</param>
    /// <param name="b">Haystack</param>
    /// <param name="direction">Which way to move the iterator</param>
    /// <returns>Index if found, otherwise -1</returns>
    internal static int IndexOfBytes(ref byte[] needle, ref int offset, int checkLength, ref byte[] haystack, int direction = 1)
    {
        //If the direction is == 0 we would spin forever making no progress
        if (direction == 0) throw new ArgumentException("direction");
        //Cache the length of the needle and the haystack, setup the endIndex for a reverse search
        int needleLength = needle.Length, haystackLength = haystack.Length, endIndex = 0, workingOffset = offset;
        //Allocate a value for the endIndex and workingOffset
        //If we are going forward then the bound is the haystackLength
        if (direction >= 1) endIndex = haystackLength;
        #region [Optomization - Not Required]
        //{

            //I though this was required for partial matching but it seems it is not needed in this form
            //workingOffset = needleLength - checkLength;
        //}
        #endregion
        else Swap(ref workingOffset, ref endIndex);                
        #region [Optomization - Not Required]
        //{ 
            //Otherwise we are going in reverse and the endIndex is the needleLength - checkLength                   
            //I though the length had to be adjusted but it seems it is not needed in this form
            //endIndex = needleLength - checkLength;
        //}
        #endregion
        #region [Optomized to above]
        //Allocate a value for the endIndex
        //endIndex = direction >= 1 ? haystackLength : needleLength - checkLength,
        //Determine the workingOffset
        //workingOffset = offset > needleLength ? offset : needleLength;            
        //If we are doing in reverse swap the two
        //if (workingOffset > endIndex) Swap(ref workingOffset, ref endIndex);
        //Else we are going in forward direction do the offset is adjusted by the length of the check
        //else workingOffset -= checkLength;
        //Start at the checkIndex (workingOffset) every search attempt
        #endregion
        //Save the checkIndex (used after the for loop is done with it to determine if the match was checkLength long)
        int checkIndex = workingOffset;
        #region [For Loop Version]
        ///Optomized with while (single op)
        ///for (int checkIndex = workingOffset; checkIndex < endIndex; offset += direction, checkIndex = workingOffset)
            ///{
                ///Start at the checkIndex
                /// While the checkIndex < checkLength move forward
                /// If NOT (the needle at the checkIndex matched the haystack at the offset + checkIndex) BREAK ELSE we have a match continue the search                
                /// for (; checkIndex < checkLength; ++checkIndex) if (needle[checkIndex] != haystack[offset + checkIndex]) break; else continue;
                /// If the match was the length of the check
                /// if (checkIndex == checkLength) return offset; //We are done matching
            ///}
        #endregion
        //While the checkIndex < endIndex
        while (checkIndex < endIndex)
        {
            for (; checkIndex < checkLength; ++checkIndex) if (needle[checkIndex] != haystack[offset + checkIndex]) break; else continue;
            //If the match was the length of the check
            if (checkIndex == checkLength) return offset; //We are done matching
            //Move the offset by the direction, reset the checkIndex to the workingOffset
            offset += direction; checkIndex = workingOffset;                
        }
        //We did not have a match with the given options
        return -1;
    }
查看更多
不流泪的眼
4楼-- · 2019-01-01 04:05

This is my own approach on the topic. I used pointers to ensure that it is faster on larger arrays. This function will return the first occurence of the sequence (which is what I needed in my own case).

I am sure you can modify it a little bit in order to return a list with all occurences.

What I do is fairly simple. I loop through the source array (haystack) until I find the first byte of the pattern (needle). When the first byte is found, I continue checking separately if the next bytes match the next bytes of the pattern. If not, I continue searching as normal, from the index (in the haystack) I was previously on, before trying to match the needle.

So here's the code:

    public unsafe int IndexOfPattern(byte[] src, byte[] pattern)
    {
        fixed(byte *srcPtr = &src[0])
        fixed (byte* patternPtr = &pattern[0])
        {
            for (int x = 0; x < src.Length; x++)
            {
                byte currentValue = *(srcPtr + x);

                if (currentValue != *patternPtr) continue;

                bool match = false;

                for (int y = 0; y < pattern.Length; y++)
                {
                    byte tempValue = *(srcPtr + x + y);
                    if (tempValue != *(patternPtr + y))
                    {
                        match = false;
                        break;
                    }

                    match = true;
                }

                if (match)
                    return x;
            }
        }
        return -1;
    }

Safe code below:

    public int IndexOfPatternSafe(byte[] src, byte[] pattern)
    {
        for (int x = 0; x < src.Length; x++)
        {
            byte currentValue = src[x];
            if (currentValue != pattern[0]) continue;

            bool match = false;

            for (int y = 0; y < pattern.Length; y++)
            {
                byte tempValue = src[x + y];
                if (tempValue != pattern[y])
                {
                    match = false;
                    break;
                }

                match = true;
            }

            if (match)
                return x;
        }

        return -1;
    }
查看更多
君临天下
5楼-- · 2019-01-01 04:05

toBeSearched.Except(pattern) will return you differences toBeSearched.Intersect(pattern) will produce set of intersections Generally, you should look into extended methods within Linq extensions

查看更多
其实,你不懂
6楼-- · 2019-01-01 04:06

Use LINQ Methods.

public static IEnumerable<int> PatternAt(byte[] source, byte[] pattern)
{
    for (int i = 0; i < source.Length; i++)
    {
        if (source.Skip(i).Take(pattern.Length).SequenceEqual(pattern))
        {
            yield return i;
        }
    }
}

Very simple!

查看更多
与君花间醉酒
7楼-- · 2019-01-01 04:06

I tried to understand Sanchez's proposal and make faster search.Below code's performance nearly equal.But code is more understandable.

public int Search3(byte[] src, byte[] pattern)
    {
        int index = -1;

        for (int i = 0; i < src.Length; i++)
        {
            if (src[i] != pattern[0])
            {
                continue;
            }
            else
            {
                bool isContinoue = true;
                for (int j = 1; j < pattern.Length; j++)
                {
                    if (src[++i] != pattern[j])
                    {
                        isContinoue = true;
                        break;
                    }
                    if(j == pattern.Length - 1)
                    {
                        isContinoue = false;
                    }
                }
                if ( ! isContinoue)
                {
                    index = i-( pattern.Length-1) ;
                    break;
                }
            }
        }
        return index;
    }
查看更多
登录 后发表回答