I am looking for the best way to go from String to Windows<T> using the windows function provided for slices.
I understand how to use windows this way:
fn main() {
    let tst = ['a', 'b', 'c', 'd', 'e', 'f', 'g'];
    let mut windows = tst.windows(3);
    // prints ['a', 'b', 'c']
    println!("{:?}", windows.next().unwrap());
    // prints ['b', 'c', 'd']
    println!("{:?}", windows.next().unwrap());
    // etc...
}
But I am a bit lost when working this problem:
fn main() {
    let tst = String::from("abcdefg");
    let inter = ? // somehow create a slice of chars from tst
    let mut windows = inter.windows(3);
    // prints ['a', 'b', 'c']
    println!("{:?}", windows.next().unwrap());
    // prints ['b', 'c', 'd']
    println!("{:?}", windows.next().unwrap());
    // etc...
}
Essentially, I am looking for how to convert a string into a char slice that I can use the windows method with.
You can use itertools (its tuple_windows adaptor) to walk over windows of any iterator, up to a width of 4.
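For comparison, the same tuple-window idea can be sketched with only the standard library by zipping offset copies of chars(). This is not the itertools API; the helper name char_triples is made up for this illustration, and the width is hard-coded to 3.

```rust
// Hypothetical helper (not part of std or itertools): yields every run of
// three consecutive chars as a tuple, without allocating.
fn char_triples<'a>(s: &'a str) -> impl Iterator<Item = (char, char, char)> + 'a {
    s.chars()
        .zip(s.chars().skip(1)) // pair each char with its successor
        .zip(s.chars().skip(2)) // and with the char after that
        .map(|((a, b), c)| (a, b, c))
}

fn main() {
    let tst = String::from("abcdefg");
    for (a, b, c) in char_triples(&tst) {
        println!("{}{}{}", a, b, c);
    }
}
```

Each of the three chars() iterators decodes the string independently, so this trades a little repeated decoding for having no dependency and no allocation.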
The problem that you are facing is that String is really represented as something like a Vec<u8> under the hood, with some APIs to let you access chars. In UTF-8 the representation of a code point can be anything from 1 to 4 bytes, and they are all compacted together for space-efficiency.
The only slice you could get directly of an entire String, without copying everything, would be a &[u8], but you wouldn't know if the bytes corresponded to whole or just parts of code points.
The char type corresponds exactly to a code point, and therefore has a size of 4 bytes, so that it can accommodate any possible value. So, if you build a slice of char by copying from a String, the result could be up to 4 times larger.
To avoid making a potentially large, temporary memory allocation, you should consider a lazier approach: iterate through the String, making slices at exactly the char boundaries. This gives you an iterator where the items are &str, each with 3 chars.
The nice thing about this approach is that it doesn't do any copying at all: each &str produced by the iterator is still a slice into the original source String.
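A minimal sketch of the lazy approach described above, assuming a free function is acceptable. The name char_windows and its size parameter are illustrative, not a standard library API.

```rust
// Hypothetical helper: lazily yields every window of `size` consecutive
// chars in `src` as a &str that borrows from `src`.
fn char_windows<'a>(src: &'a str, size: usize) -> impl Iterator<Item = &'a str> + 'a {
    src.char_indices().flat_map(move |(from, _)| {
        // A window of size 0 is treated as "no windows" instead of panicking.
        let last = size.checked_sub(1)?;
        // Find the byte offset (relative to `from`) of the window's last char.
        src[from..]
            .char_indices()
            .nth(last)
            // Slice up to and including that last char; both ends are char boundaries.
            .map(|(to, c)| &src[from..from + to + c.len_utf8()])
    })
}

fn main() {
    let tst = String::from("abcdefg");
    let mut windows = char_windows(&tst, 3);
    // prints "abc"
    println!("{:?}", windows.next().unwrap());
    // prints "bcd"
    println!("{:?}", windows.next().unwrap());
}
```

Starting positions that have fewer than size chars left produce None from nth, so flat_map drops them and the iterator simply ends. Each window re-scans up to size chars, which is cheap for small widths.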
All of that complexity is because Rust strings are always UTF-8 encoded. If you absolutely know that your input string doesn't contain any multi-byte characters, you can treat it as ASCII bytes, and taking slices becomes easy:
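A short sketch of the byte-based shortcut, assuming the input really is pure ASCII. The wrapper ascii_windows is a made-up name; the calls it relies on are str::as_bytes and the slice windows method from the question.

```rust
// Hypothetical wrapper: byte windows over a string that is assumed to be ASCII.
// Panics if the assumption does not hold, or if `size` is 0 (slice::windows
// itself panics on a zero width).
fn ascii_windows(s: &str, size: usize) -> std::slice::Windows<'_, u8> {
    assert!(s.is_ascii(), "input must be ASCII for bytes to equal characters");
    s.as_bytes().windows(size)
}

fn main() {
    let tst = String::from("abcdefg");
    let mut windows = ascii_windows(&tst, 3);
    // prints [97, 98, 99]
    println!("{:?}", windows.next().unwrap());
    // prints [98, 99, 100]
    println!("{:?}", windows.next().unwrap());
}
```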
However, you now have slices of bytes, and you'll need to turn them back into strings to do anything with them:
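One way to sketch that conversion is std::str::from_utf8, which borrows the bytes instead of copying them. The helper name ascii_str_windows is hypothetical; it returns a Result per window because a window over non-ASCII input may cut a code point in half.

```rust
use std::str::{self, Utf8Error};

// Hypothetical helper: byte windows converted back to &str without copying.
// For pure-ASCII input every item is Ok; otherwise some windows may be Err.
fn ascii_str_windows<'a>(
    s: &'a str,
    size: usize,
) -> impl Iterator<Item = Result<&'a str, Utf8Error>> + 'a {
    s.as_bytes().windows(size).map(str::from_utf8)
}

fn main() {
    let tst = String::from("abcdefg");
    for window in ascii_str_windows(&tst, 3) {
        // The input is ASCII, so unwrapping is safe here.
        println!("{}", window.unwrap());
    }
}
```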
This solution will work for your purpose. String can iterate over its chars, but that iterator is not a slice, so you have to collect it into a Vec<char>, which then coerces into a slice.
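A minimal sketch of that approach, filling in the placeholder from the question. The helper to_chars is a made-up name used only to keep the example testable; it allocates a new Vec holding 4 bytes per character.

```rust
// Hypothetical helper: copies the chars of a string into an owned Vec<char>.
fn to_chars(s: &str) -> Vec<char> {
    s.chars().collect()
}

fn main() {
    let tst = String::from("abcdefg");
    let inter = to_chars(&tst);
    // Vec<char> dereferences to &[char], so the slice method is available.
    let mut windows = inter.windows(3);
    // prints ['a', 'b', 'c']
    println!("{:?}", windows.next().unwrap());
    // prints ['b', 'c', 'd']
    println!("{:?}", windows.next().unwrap());
}
```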