The following code reads space-delimited records from stdin, and writes comma-delimited records to stdout. Even with optimized builds it's rather slow (about twice as slow as using, say, awk).
    use std::io::BufRead;
    fn main() {
        let stdin = std::io::stdin();
        for line in stdin.lock().lines().map(|x| x.unwrap()) {
            let fields: Vec<_> = line.split(' ').collect();
            println!("{}", fields.join(","));
        }
    }
One obvious improvement would be to use `itertools` to join without allocating a vector (the `collect` call causes an allocation). However, I tried a different approach:
    use std::io::BufRead;

    fn main() {
        let stdin = std::io::stdin();
        let mut cache = Vec::<&str>::new();
        for line in stdin.lock().lines().map(|x| x.unwrap()) {
            cache.extend(line.split(' '));
            println!("{}", cache.join(","));
            cache.clear();
        }
    }
This version tries to reuse the same vector over and over. Unfortunately, the compiler complains:
    error: `line` does not live long enough
     --> src/main.rs:7:22
      |
    7 |         cache.extend(line.split(' '));
      |                      ^^^^
      |
    note: reference must be valid for the block suffix following statement 1 at 5:39...
     --> src/main.rs:5:40
      |
    5 |     let mut cache = Vec::<&str>::new();
      |                                        ^
    note: ...but borrowed value is only valid for the for at 6:4
     --> src/main.rs:6:5
      |
    6 |     for line in stdin.lock().lines().map(|x| x.unwrap()) {
      |     ^
    error: aborting due to previous error
Which of course makes sense: the `line` variable is only alive in the body of the `for` loop, whereas `cache` keeps a pointer into it across iterations. But that error still looks spurious to me: since the cache is `clear`ed after each iteration, no reference to `line` can be kept, right?
How can I tell the borrow checker about this?
The only way to do this is to use `transmute` to change the `Vec<&'a str>` into a `Vec<&'b str>`.

`transmute` is unsafe, and Rust will not raise an error if you forget the call to `clear` here. You might want to extend the `unsafe` block up to after the call to `clear` to make it clear (no pun intended) where the code returns to "safe land".

In this case Rust doesn't know what you're trying to do. Unfortunately,
`.clear()` does not affect how `.extend()` is checked.

The `cache` is a "vector of strings that live as long as the main function", but in the `extend()` calls you're appending "strings that live only as long as one loop iteration", so that's a type mismatch. The call to `.clear()` doesn't change the types.

Usually such limited-time uses are expressed by making a long-lived opaque object that enables access to its memory by borrowing a temporary object with the right lifetime, like
`RefCell::borrow()` gives a temporary `Ref` object. Implementing that would be a bit involved and would require unsafe methods for recycling `Vec`'s internal memory.

In this case an alternative solution could be to avoid any allocations at all (`.join()` allocates too) and stream the printing thanks to the `Peekable` iterator wrapper.

BTW: Francis' answer with
`transmute` is good too. You can use `unsafe` to say you know what you're doing and override the lifetime check.

Itertools has
`.format()` for the purpose of lazy formatting, which skips allocating a string too.

(A digression: something like this is a “safe abstraction”, in the smallest sense, of the solution in another answer here:
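A minimal sketch of what such a wrapper could look like. The helper name `repurpose` and its signature are assumptions made for illustration, not something taken from a library. The idea is that the function takes the vector by value and clears it before changing the lifetime, so the `transmute` can never be reached with stale references still inside:

```rust
use std::io::BufRead;

/// Hypothetical helper (name and signature are assumptions): consumes a
/// vector of string slices, empties it, and returns it typed with a fresh,
/// unrelated lifetime so that its allocation can be reused.
fn repurpose<'a>(mut v: Vec<&str>) -> Vec<&'a str> {
    v.clear();
    // SAFETY: the vector is empty, so no reference with the old lifetime can
    // be observed through the result. The source and target types differ only
    // in a lifetime parameter, which has no effect on layout.
    unsafe { std::mem::transmute::<Vec<&str>, Vec<&'a str>>(v) }
}

fn main() {
    let stdin = std::io::stdin();
    let mut cache: Vec<&str> = Vec::new();
    for line in stdin.lock().lines().map(|x| x.unwrap()) {
        // Take the allocation for this iteration only.
        let mut fields = repurpose(cache);
        fields.extend(line.split(' '));
        println!("{}", fields.join(","));
        // Hand the emptied allocation back before `line` is dropped.
        cache = repurpose(fields);
    }
}
```

Because `cache` is moved out and reassigned in every iteration, the borrow checker never sees a long-lived vector holding a short-lived `&str`, and the only `unsafe` line sits behind a function that enforces the `clear` itself.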
)
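Returning to the allocation-free alternative from the earlier answer (streaming the output through a `Peekable` wrapper rather than calling `.join()`), a minimal sketch might look like the following. The helper name `write_joined` and the use of `BufWriter` around a locked stdout are assumptions for illustration and are not part of the original answer:

```rust
use std::io::{self, BufRead, BufWriter, Write};

/// Writes the space-separated fields of `line` to `out`, separated by commas
/// and followed by a newline, without building an intermediate Vec or String.
/// (`write_joined` is a hypothetical helper name.)
fn write_joined<W: Write>(out: &mut W, line: &str) -> io::Result<()> {
    let mut fields = line.split(' ').peekable();
    while let Some(field) = fields.next() {
        out.write_all(field.as_bytes())?;
        // Only emit a separator when another field follows.
        if fields.peek().is_some() {
            out.write_all(b",")?;
        }
    }
    out.write_all(b"\n")
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let stdout = io::stdout();
    // Buffer the output so each field does not trigger its own write call.
    let mut out = BufWriter::new(stdout.lock());
    for line in stdin.lock().lines() {
        write_joined(&mut out, &line?)?;
    }
    out.flush()
}
```

This sidesteps the lifetime question entirely, since nothing borrowed from `line` outlives the iteration. The `String` that `lines()` allocates for each line is still there.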