Is there an idiomatic way to process a file one character at a time in Rust?
This seems to be roughly what I'm after:
let mut f = io::BufReader::new(try!(fs::File::open("input.txt")));
for c in f.chars() {
println!("Character: {}", c.unwrap());
}
But Read::chars
is still unstable as of Rust v1.6.0.
I considered using Read::read_to_string
, but the file may be large and I don't want to read it all into memory.
Let's compare 4 approaches.
1. Read::chars
You could copy Read::chars
implementation, but it is marked unstable with
the semantics of a partial read/write of where errors happen is currently unclear and may change
so some care must be taken. Anyway, this seems to be the best approach.
2. flat_map
The flat_map
alternative does not compile:
use std::io::{BufRead, BufReader};
use std::fs::File;
pub fn main() {
let mut f = BufReader::new(File::open("input.txt").expect("open failed"));
for c in f.lines().flat_map(|l| l.expect("lines failed").chars()) {
println!("Character: {}", c);
}
}
The problems is that chars
borrows from the string, but l.expect("lines failed")
lives only inside the closure, so compiler gives the error borrowed value does not live long enough
.
3. Nested for
This code
use std::io::{BufRead, BufReader};
use std::fs::File;
pub fn main() {
let mut f = BufReader::new(File::open("input.txt").expect("open failed"));
for line in f.lines() {
for c in line.expect("lines failed").chars() {
println!("Character: {}", c);
}
}
}
works, but it keeps allocation a string for each line. Besides, if there is no line break on the input file, the whole file would be load to the memory.
4. BufRead::read_until
A memory efficient alternative to approach 3 is to use Read::read_until
, and use a single string to read each line:
use std::io::{BufRead, BufReader};
use std::fs::File;
pub fn main() {
let mut f = BufReader::new(File::open("input.txt").expect("open failed"));
let mut buf = Vec::<u8>::new();
while f.read_until(b'\n', &mut buf).expect("read_until failed") != 0 {
// this moves the ownership of the read data to s
// there is no allocation
let s = String::from_utf8(buf).expect("from_utf8 failed");
for c in s.chars() {
println!("Character: {}", c);
}
// this returns the ownership of the read data to buf
// there is no allocation
buf = s.into_bytes();
buf.clear();
}
}
There are two solutions that make sense here.
First, you could copy the implementation of Read::chars()
and use it; that would make it completely trivial to move your code over to the standard library implementation if/when it stabilizes.
On the other hand, you could simply iterate line by line (using f.lines()
) and then use line.chars()
on each line to get the chars. This is a little more hacky, but it will definitely work.
If you only wanted one loop, you could use flat_map()
with a lambda like |line| line.chars()
.