I have a &[u8]
slice over a binary buffer. I need to parse it, but a lot of the methods that I would like to use (such as str::find
) don't seem to be available on slices.
I've seen that I can covert both by buffer slice and my pattern to str
by using from_utf8_unchecked()
but that seems a little dangerous (and also really hacky).
How can I find a subsequence in this slice? I actually need the index of the pattern, not just a slice view of the parts, so I don't think split
will work.
Here's a simple implementation based on the windows
iterator.
fn find_subsequence(haystack: &[u8], needle: &[u8]) -> Option<usize> {
haystack.windows(needle.len()).position(|window| window == needle)
}
fn main() {
assert_eq!(find_subsequence(b"qwertyuiop", b"tyu"), Some(4));
assert_eq!(find_subsequence(b"qwertyuiop", b"asd"), None);
}
The find_subsequence
function can also be made generic:
fn find_subsequence<T>(haystack: &[T], needle: &[T]) -> Option<usize>
where for<'a> &'a [T]: PartialEq
{
haystack.windows(needle.len()).position(|window| window == needle)
}
I don't think the standard library contains a function for this. Some libcs have memmem
, but at the moment the libc crate does not wrap this. You can use the twoway
crate however. rust-bio
implements some pattern matching algorithms, too. All of those should be faster than using haystack.windows(..).position(..)
How about Regex on bytes? That looks very powerful. See this rust playground demo.
use regex::bytes::Regex;
// This shows how to find all null-terminated strings in a slice of bytes
let re = Regex::new(r"(?-u)(?P<cstr>[^\x00]+)\x00").unwrap();
let text = b"foo\x00bar\x00baz\x00";
// Extract all of the strings without the null terminator from each match.
// The unwrap is OK here since a match requires the `cstr` capture to match.
let cstrs: Vec<&[u8]> =
re.captures_iter(text)
.map(|c| c.name("cstr").unwrap().as_bytes())
.collect();
assert_eq!(vec![&b"foo"[..], &b"bar"[..], &b"baz"[..]], cstrs);
I found the memmem
crate useful for this task:
use memmem::{Searcher, TwoWaySearcher};
let search = TwoWaySearcher::new("dog".as_bytes());
assert_eq!(
search.search_in("The quick brown fox jumped over the lazy dog.".as_bytes()),
Some(41)
);