Read an arbitrary number of bytes from type implem

Posted 2019-02-24 17:38

Question:

I have something that is Read; currently it's a File. I want to read a number of bytes from it that is only known at runtime (length prefix in a binary data structure).

So I tried this:

let mut vec = Vec::with_capacity(length);
let count = file.read(vec.as_mut_slice()).unwrap();

but count is zero because vec.as_mut_slice().len() is zero as well.

[0u8;length] of course doesn't work because the size must be known at compile time.

I wanted to do

let mut vec = Vec::with_capacity(length);
let count = file.take(length).read_to_end(vec).unwrap();

but take's receiver parameter is a T and I only have &mut T (and I'm not really sure why it's needed anyway).

I guess I can replace File with BufReader and dance around with fill_buf and consume which sounds complicated enough but I still wonder: Have I overlooked something?

Answer 1:

1. Fill-this-vector version

Your first attempt is close to working. You identified the problem but did not try to solve it! The problem is that whatever the capacity of the vector, it is still empty (vec.len() == 0), so the slice you pass to read() has length zero. Instead, you can give the vector its full length up front by filling it with zeros:

let mut vec = vec![0u8; length];

The following full code works (as of 2015-07-19, as_mut_slice() needed #![feature(convert)]; it has since been stabilized, so the attribute is omitted here):

use std::fs::File;
use std::io::Read;

fn main() {
    let mut file = File::open("/usr/share/dict/words").unwrap();
    let length: usize = 100;
    let mut vec = vec![0u8; length];
    let count = file.read(vec.as_mut_slice()).unwrap();
    println!("read {} bytes.", count);
    println!("vec = {:?}", vec);
}

Of course, you still have to check whether count == length, and read more data into the buffer if that's not the case.
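The "read more data into the buffer" step can be sketched as a loop over short reads. This is only a minimal sketch: read_full is a hypothetical helper name, not a standard library function, and treating an early end of input as an UnexpectedEof error is an assumption about what the caller wants.

```rust
use std::io::{self, ErrorKind, Read};

/// Reads exactly `length` bytes from `reader`, retrying after short reads.
/// `read_full` is a hypothetical helper name used for illustration.
fn read_full<R: Read>(reader: &mut R, length: usize) -> io::Result<Vec<u8>> {
    // Zero-filled so that the slice handed to `read` has the full length.
    let mut buf = vec![0u8; length];
    let mut filled = 0;
    while filled < length {
        match reader.read(&mut buf[filled..]) {
            // Zero bytes read means the input ended before the buffer was full.
            Ok(0) => {
                return Err(io::Error::new(ErrorKind::UnexpectedEof, "input ended early"));
            }
            // A short read is normal; remember how far we got and go again.
            Ok(n) => filled += n,
            // An interrupted read is transient, so retry it.
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(buf)
}
```

With the file from the example above, this would be called as read_full(&mut file, length).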


2. Iterator version

Your second solution is better because you won't have to check how many bytes have been read, and you won't have to re-read in case count != length. You need to use the bytes() function on the Read trait (implemented by File). This transforms the file into a stream of bytes (i.e. an iterator). Because errors can still happen, you don't get an Iterator<Item=u8> but an Iterator<Item=Result<u8, std::io::Error>>. Hence you need to deal with failures explicitly within the iterator. Note that if the file holds fewer than length bytes, the iterator simply ends early and the vector is shorter. We're going to use unwrap() here for simplicity:

use std::fs::File;
use std::io::Read;

fn main() {
    let file = File::open("/usr/share/dict/words").unwrap();
    let length: usize = 100;
    let vec: Vec<u8> = file
        .bytes()
        .take(length)
        .map(|r: Result<u8, _>| r.unwrap()) // or deal explicitly with failure!
        .collect();
    println!("vec = {:?}", vec);
}
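If you would rather propagate errors than unwrap them, collect() can gather an iterator of Result values into a single Result. A minimal sketch, where read_up_to is a hypothetical helper name:

```rust
use std::io::{self, Read};

/// Collects at most `length` bytes from `reader`.
/// `read_up_to` is a hypothetical helper name used for illustration.
fn read_up_to<R: Read>(reader: R, length: usize) -> io::Result<Vec<u8>> {
    // Collecting into `io::Result<Vec<u8>>` stops at the first `Err`
    // and returns it; otherwise it yields the bytes gathered so far.
    reader.bytes().take(length).collect()
}
```

The result holds fewer than length bytes when the input ends early, so check its len() if that matters. Since bytes() issues one read() call per byte, wrapping a File in std::io::BufReader first is usually worthwhile.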


Answer 2:

Like the Iterator adaptors, the IO adaptors take self by value to be as efficient as possible. Also like the Iterator adaptors, a mutable reference to a Read is also a Read.

To solve your problem, you just need Read::by_ref:

use std::io::Read;
use std::fs::File;

fn main() {
    let mut file = File::open("/etc/hosts").unwrap();
    let length = 5;

    let mut vec = Vec::with_capacity(length);
    file.by_ref().take(length as u64).read_to_end(&mut vec).unwrap();

    let mut the_rest = Vec::new();
    file.read_to_end(&mut the_rest).unwrap();
}
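Applied to the length-prefixed case from the question, the same by_ref().take() pattern might look like the sketch below. The record layout (a 4-byte big-endian length followed by the payload) and the name read_record are assumptions made for illustration; the question does not say how the prefix is encoded.

```rust
use std::io::{self, ErrorKind, Read};

/// Reads one record: a 4-byte big-endian length, then that many payload bytes.
/// Both the layout and the name `read_record` are assumptions for illustration.
fn read_record<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 4];
    reader.read_exact(&mut prefix)?;
    let length = u64::from(u32::from_be_bytes(prefix));

    let mut payload = Vec::new();
    // `by_ref()` hands `take` a borrow, so the reader stays usable afterwards.
    let count = reader.by_ref().take(length).read_to_end(&mut payload)?;
    // `read_to_end` stops quietly at end of input, so check the count.
    if (count as u64) < length {
        return Err(io::Error::new(ErrorKind::UnexpectedEof, "payload truncated"));
    }
    Ok(payload)
}
```

Because the reader is only borrowed, calling read_record repeatedly walks through consecutive records.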


Answer 3:

You can use a bit of unsafe to create a vector of uninitialized memory and skip the zeroing:

let mut vec: Vec<u8> = Vec::with_capacity(length);
unsafe { vec.set_len(length); }
let count = file.read(vec.as_mut_slice()).unwrap();

This way, vec.len() will be set to length, and all bytes in it will be uninitialized (likely zeros, but possibly some garbage), so you avoid the cost of zeroing the memory. Be careful, though: this is not as safe as it looks, even for primitive types. The documentation for Read::read states that calling read with an uninitialized buffer is not safe and can lead to undefined behavior, so vec![0u8; length] is the sound choice.

Note that the read() method on Read is not guaranteed to fill the whole slice. It may return after reading fewer bytes than the slice length. When this answer was written there were several RFCs on adding methods to fill this gap; Read::read_exact has since been added to the standard library for this purpose.
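A minimal sketch of filling the whole buffer with Read::read_exact, using a zero-filled vector rather than uninitialized memory. The helper name read_exact_vec is hypothetical:

```rust
use std::io::{self, Read};

/// Reads exactly `length` bytes, or fails if the input ends first.
/// `read_exact_vec` is a hypothetical helper name used for illustration.
fn read_exact_vec<R: Read>(reader: &mut R, length: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; length];
    // `read_exact` keeps reading until `buf` is full; if the input ends
    // first it returns an error of kind `UnexpectedEof`.
    reader.read_exact(&mut buf)?;
    Ok(buf)
}
```

On an UnexpectedEof error the contents of the buffer are unspecified, so the sketch discards it rather than returning a partial result.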



Tags: io rust