Now that the Read::chars iterator has been officially deprecated, what is the proper way to obtain an iterator over the chars coming from a Reader like stdin without reading the entire stream into memory?
Question:
Answer 1:
The issue that tracks the deprecation nicely sums up the problems with Read::chars and offers suggestions:
Code that does not care about processing data incrementally can use Read::read_to_string instead. Code that does care presumably also wants to control its buffering strategy and work with &[u8] and &str slices that are as large as possible, rather than one char at a time. It should be based on the str::from_utf8 function as well as the valid_up_to and error_len methods of the Utf8Error type. One tricky aspect is dealing with cases where a single char is represented in UTF-8 by multiple bytes where those bytes happen to be split across separate read calls / buffer chunks. (Utf8Error::error_len returning None indicates that this may be the case.) The utf-8 crate solves this, but in order to be flexible provides an API that probably has too much surface to be included in the standard library.
Of course the above is for data that is always UTF-8. If other character encoding need to be supported, consider using the encoding_rs or encoding crate.
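The approach described in the quote can be sketched roughly as follows. This is only a minimal illustration, not the utf-8 crate's API: the function name for_each_char, the chunk_size parameter and the callback shape are made up for this example. It decodes as much of each chunk as str::from_utf8 accepts and carries an incomplete trailing char (error_len returning None) over to the next read.

```rust
use std::io::{self, Read};
use std::str;

/// Hypothetical helper: reads `rdr` in chunks of `chunk_size` bytes and
/// calls `on_char` for every decoded char, without buffering the whole input.
fn for_each_char<R: Read>(
    mut rdr: R,
    chunk_size: usize,
    mut on_char: impl FnMut(char),
) -> io::Result<()> {
    // A UTF-8 char is at most 4 bytes, so a carried-over partial char
    // (at most 3 bytes) always leaves room for at least one new byte.
    assert!(chunk_size >= 4);
    let mut buf = vec![0u8; chunk_size];
    let mut filled = 0; // number of bytes in `buf` not yet decoded

    loop {
        let n = rdr.read(&mut buf[filled..])?;
        filled += n;

        // Find the longest valid UTF-8 prefix of what we have so far.
        let (valid, err) = match str::from_utf8(&buf[..filled]) {
            Ok(s) => (s.len(), None),
            Err(e) => (e.valid_up_to(), Some(e)),
        };
        let text = str::from_utf8(&buf[..valid]).expect("prefix was just validated");
        text.chars().for_each(&mut on_char);

        if let Some(e) = err {
            // error_len() == None means the data ends in the middle of a char.
            // That is expected between chunks, but an error at end of input.
            if e.error_len().is_some() || n == 0 {
                return Err(io::Error::new(io::ErrorKind::InvalidData, e));
            }
        }
        if n == 0 {
            return Ok(()); // end of input, nothing left over
        }

        // Move the undecoded tail to the front for the next read.
        buf.copy_within(valid..filled, 0);
        filled -= valid;
    }
}
```

Calling it with a deliberately tiny chunk_size (for example 4) on input that contains emoji exercises the split-char path. A real program would pick a much larger chunk size and might prefer to return an iterator instead of taking a callback.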
Your own iterator
The most efficient solution in terms of number of I/O calls is to read everything into one giant String buffer and iterate over that:
use std::io::{self, Read};

fn main() {
    let stdin = io::stdin();
    let mut s = String::new();
    stdin.lock().read_to_string(&mut s).expect("Couldn't read");
    for c in s.chars() {
        println!(">{}<", c);
    }
}
You can combine this with an answer from "Is there an owned version of String::chars?":
use std::io::{self, Read};

fn reader_chars<R: Read>(mut rdr: R) -> io::Result<impl Iterator<Item = char>> {
    let mut s = String::new();
    rdr.read_to_string(&mut s)?;
    // into_chars comes from https://stackoverflow.com/q/47193584/155423
    // and must be defined in scope for this to compile
    Ok(s.into_chars())
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    for c in reader_chars(stdin.lock())? {
        println!(">{}<", c);
    }
    Ok(())
}
We now have a function that returns an iterator of chars for any type that implements Read.
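The into_chars call above relies on a helper from the linked answer, which is not reproduced here. As a rough stand-in, an owned char iterator could look like the sketch below. The names OwnedChars and owned_chars are hypothetical and chosen only for this example; the linked answer may implement it differently.

```rust
/// Hypothetical stand-in for an owned version of `str::chars`:
/// the iterator owns the String and tracks how far it has read.
struct OwnedChars {
    s: String,
    pos: usize, // byte offset of the next char to yield
}

/// Hypothetical constructor; consumes the String.
fn owned_chars(s: String) -> OwnedChars {
    OwnedChars { s, pos: 0 }
}

impl Iterator for OwnedChars {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        // `pos` only ever advances by whole chars, so it always sits on a
        // char boundary and this slice cannot panic.
        let c = self.s[self.pos..].chars().next()?;
        self.pos += c.len_utf8();
        Some(c)
    }
}
```

With this in place, a function like reader_chars could return owned_chars(s) instead of s.into_chars(). Because the iterator owns its String, it can outlive the function that read the data.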
Once you have this pattern, it's just a matter of deciding where to make the tradeoff of memory allocation vs I/O requests. Here's a similar idea that uses line-sized buffers:
use std::io::{BufRead, BufReader, Read};

fn reader_chars<R: Read>(rdr: R) -> impl Iterator<Item = char> {
    // We use 6 bytes here to force emoji to be segmented for demo purposes
    // Pick more appropriate size for your case
    let reader = BufReader::with_capacity(6, rdr);
    reader
        .lines()
        .flat_map(|l| l) // Ignoring any errors
        .flat_map(|s| s.into_chars()) // from https://stackoverflow.com/q/47193584/155423
}

fn main() {
    // emoji are 4 bytes each
    let data = "