Anyone know a good and effective way to search/match for a byte pattern in an byte[] array and then return the positions.
For example
byte[] pattern = new byte[] {12,3,5,76,8,0,6,125};
byte[] toBeSearched = new byte[] {23,36,43,76,125,56,34,234,12,3,5,76,8,0,6,125,234,56,211,122,22,4,7,89,76,64,12,3,5,76,8,0,6,125}
You can put the byte array into String and run match by IndexOf. Or you can at least reuse existing algorithms on string matching.
My version of Foubar's answer above, which avoids searching past the end of the haystack, and allows specifying a starting offset. Assumes needle is not empty or longer than the haystack.
I've created a new function using the tips from my answer and the answer by Alnitak.
the only part I'm not so happy about is the
part... I would have like to use a if else to avoid the -1, but this results in a better branch prediction (although I'm not that sure it will matter that much)..
Here's my (not the most performant) solution. It relies on the fact that bytes/latin-1 conversion is lossless, which is not true for bytes/ASCII or bytes/UTF8 conversions.
It's advantages are that It Works (tm) for any byte values (some other solutions work incorrectly with bytes 0x80-0xff) and can be extended to perform more advanced regex matching.
I'm a little late to the party How about using Boyer Moore algorithm but search for bytes instead of strings. c# code below.
EyeCode Inc.
Speed isn't everything. Did you check them for consistency?
I didn't test all the code listed here. I tested my own code (which wasn't totally consistent, I admit) and IndexOfSequence. I found that for many tests IndexOfSequence was quite a bit faster than my code but with repeated testing I found that it was less consistent. In particular it seems to have the most trouble with finding patterns at the end of the array but it will miss them in the middle of the array too sometimes.
My test code isn't designed for efficiency, I just wanted to have a bunch of random data with some known strings inside. That test pattern is roughly like a boundary marker in an http form upload stream. That's what I was looking for when I ran across this code so I figured I'd test it with the kind of data I'll be searching for. It appears that the longer the pattern is the more often IndexOfSequence will miss a value.
(obviously I converted IndexOfSequence from an extension back into a normal method for this testing)
Here's a sample run of my output:
I don't mean to pick on IndexOfSequence, it just happened to be the one I started working with today. I noticed at the end of the day that it seemed to be missing patterns in the data so I wrote my own pattern matcher tonight. It's not as fast though. I'm going to tweak it a bit more to see if I can get it 100% consistent before I post it.
I just wanted to remind everyone that they should test things like this to make sure they give good, repeatable results before you trust them in production code.