How do I return an Iterator that's generated b

2019-01-28 03:02发布

问题:

Update: The title of the post has been updated, and the answer has been moved out of the question. The short answer is you can't. Please see my answer to this question.

I'm following an Error Handling blog post here (github for it is here), and I tried to make some modifications to the code so that the search function returns an Iterator instead of a Vec. This has been insanely difficult, and I'm stuck.

I've gotten up to this point:

fn search<'a, P: AsRef<Path>>(file_path: &Option<P>, city: &str)
    -> Result<FilterMap<csv::reader::DecodedRecords<'a, Box<Read>, Row>,
                        FnMut(Result<Row, csv::Error>)
                            -> Option<Result<PopulationCount, csv::Error>>>,
              CliError>  {
    let mut found = vec![];
    let input: Box<io::Read> = match *file_path {
        None => Box::new(io::stdin()),
        Some(ref file_path) => Box::new(try!(fs::File::open(file_path))),
    };

    let mut rdr = csv::Reader::from_reader(input);
    let closure = |row: Result<Row, csv::Error>| -> Option<Result<PopulationCount, csv::Error>> {
        let row = match row {
            Ok(row) => row,
            Err(err) => return Some(Err(From::from(err))),
        };
        match row.population {
            None => None,
            Some(count) => if row.city == city {
                Some(Ok(PopulationCount {
                    city: row.city,
                    country: row.country,
                    count: count,
                }))
            } else {
                None
            }
        }
    };
    let found = rdr.decode::<Row>().filter_map(closure);

    if !found.all(|row| match row {
        Ok(_) => true,
        _ => false,
    }) {
        Err(CliError::NotFound)
    } else {
        Ok(found)
    }
}

with the following error from the compiler:

src/main.rs:97:1: 133:2 error: the trait `core::marker::Sized` is not implemented for the type `core::ops::FnMut(core::result::Result<Row, csv::Error>) -> core::option::Option<core::result::Result<PopulationCount, csv::Error>>` [E0277]
src/main.rs:97 fn search<'a, P: AsRef<Path>>(file_path: &Option<P>, city: &str) -> Result<FilterMap<csv::reader::DecodedRecords<'a, Box<Read>, Row>, FnMut(Result<Row, csv::Error>) -> Option<Result<PopulationCount, csv::Error>>>, CliError>  {
src/main.rs:98     let mut found = vec![];
src/main.rs:99     let input: Box<io::Read> = match *file_path {
src/main.rs:100         None => Box::new(io::stdin()),
src/main.rs:101         Some(ref file_path) => Box::new(try!(fs::File::open(file_path))),
src/main.rs:102     };
                ...
src/main.rs:97:1: 133:2 note: `core::ops::FnMut(core::result::Result<Row, csv::Error>) -> core::option::Option<core::result::Result<PopulationCount, csv::Error>>` does not have a constant size known at compile-time
src/main.rs:97 fn search<'a, P: AsRef<Path>>(file_path: &Option<P>, city: &str) -> Result<FilterMap<csv::reader::DecodedRecords<'a, Box<Read>, Row>, FnMut(Result<Row, csv::Error>) -> Option<Result<PopulationCount, csv::Error>>>, CliError>  {
src/main.rs:98     let mut found = vec![];
src/main.rs:99     let input: Box<io::Read> = match *file_path {
src/main.rs:100         None => Box::new(io::stdin()),
src/main.rs:101         Some(ref file_path) => Box::new(try!(fs::File::open(file_path))),
src/main.rs:102     };
                ...
error: aborting due to previous error

I've also tried this function definition:

fn search<'a, P: AsRef<Path>, F>(file_path: &Option<P>, city: &str)
    -> Result<FilterMap<csv::reader::DecodedRecords<'a, Box<Read>, Row>, F>,
              CliError>
    where F:  FnMut(Result<Row, csv::Error>)
                  -> Option<Result<PopulationCount, csv::Error>> {

with these errors from the compiler:

src/main.rs:131:12: 131:17 error: mismatched types:
 expected `core::iter::FilterMap<csv::reader::DecodedRecords<'_, Box<std::io::Read>, Row>, F>`,
 found    `core::iter::FilterMap<csv::reader::DecodedRecords<'_, Box<std::io::Read>, Row>, [closure src/main.rs:105:19: 122:6]>`
(expected type parameter,
found closure) [E0308]
src/main.rs:131         Ok(found)

I can't Box the closure because then it won't be accepted by filter_map.

I then tried this out:

fn search<'a, P: AsRef<Path>>(file_path: &Option<P>, city: &'a str)
    -> Result<(Box<Iterator<Item=Result<PopulationCount, csv::Error>> + 'a>, csv::Reader<Box<io::Read>>), CliError> {
    let input: Box<io::Read> = match *file_path {
        None => box io::stdin(),
        Some(ref file_path) => box try!(fs::File::open(file_path)),
    };

    let mut rdr = csv::Reader::from_reader(input);
    let mut found = rdr.decode::<Row>().filter_map(move |row| {
        let row = match row {
            Ok(row) => row,
            Err(err) => return Some(Err(err)),
        };
        match row.population {
            None => None,
            Some(count) if row.city == city => {
                Some(Ok(PopulationCount {
                    city: row.city,
                    country: row.country,
                    count: count,
                }))
            },
            _ => None,
        }
    });

    if found.size_hint().0 == 0 {
        Err(CliError::NotFound)
    } else {
        Ok((box found, rdr))
    }
}

fn main() {
    let args: Args = Docopt::new(USAGE)
                            .and_then(|d| d.decode())
                            .unwrap_or_else(|err| err.exit());


    match search(&args.arg_data_path, &args.arg_city) {
        Err(CliError::NotFound) if args.flag_quiet => process::exit(1),
        Err(err) => fatal!("{}", err),
        Ok((pops, rdr)) => for pop in pops {
            match pop {
                Err(err) => panic!(err),
                Ok(pop) => println!("{}, {}: {} - {:?}", pop.city, pop.country, pop.count, rdr.byte_offset()),
            }
        }
    }
}

Which gives me this error:

src/main.rs:107:21: 107:24 error: `rdr` does not live long enough
src/main.rs:107     let mut found = rdr.decode::<Row>().filter_map(move |row| {
                                    ^~~
src/main.rs:100:117: 130:2 note: reference must be valid for the lifetime 'a as defined on the block at 100:116...
src/main.rs:100     -> Result<(Box<Iterator<Item=Result<PopulationCount, csv::Error>> + 'a>, csv::Reader<Box<io::Read>>), CliError> {
src/main.rs:101     let input: Box<io::Read> = match *file_path {
src/main.rs:102         None => box io::stdin(),
src/main.rs:103         Some(ref file_path) => box try!(fs::File::open(file_path)),
src/main.rs:104     };
src/main.rs:105     
                ...
src/main.rs:106:51: 130:2 note: ...but borrowed value is only valid for the block suffix following statement 1 at 106:50
src/main.rs:106     let mut rdr = csv::Reader::from_reader(input);
src/main.rs:107     let mut found = rdr.decode::<Row>().filter_map(move |row| {
src/main.rs:108         let row = match row {
src/main.rs:109             Ok(row) => row,
src/main.rs:110             Err(err) => return Some(Err(err)),
src/main.rs:111         };
                ...
error: aborting due to previous error

Have I designed something wrong, or am I taking the wrong approach? Am I missing something really simple and stupid? I'm not sure where to go from here.

回答1:

Returning iterators is possible, but it comes with some restrictions.

To demonstrate it's possible, two examples, (A) with explicit iterator type and (B) using boxing (playpen link).

use std::iter::FilterMap;

fn is_even(elt: i32) -> Option<i32> {
    if elt % 2 == 0 {
        Some(elt)
    } else { None }
}

/// (A)
pub fn evens<I: IntoIterator<Item=i32>>(iter: I)
    -> FilterMap<I::IntoIter, fn(I::Item) -> Option<I::Item>>
{
    iter.into_iter().filter_map(is_even)
}

/// (B)
pub fn cumulative_sums<'a, I>(iter: I) -> Box<Iterator<Item=i32> + 'a>
    where I: IntoIterator<Item=i32>,
          I::IntoIter: 'a,
{
    Box::new(iter.into_iter().scan(0, |acc, x| {
        *acc += x;
        Some(*acc)
    }))
}

fn main() {
    // The output is:
    //  0 is even, 10 is even, 
    //  1, 3, 6, 10, 
    for even in evens(vec![0, 3, 7, 10]) {
        print!("{} is even, ", even);
    }
    println!("");

    for cs in cumulative_sums(1..5) {
        print!("{}, ", cs);
    }
    println!("");
}

You experienced a problem with (A) -- explicit type! Unboxed closures, that we get from regular lambda expressions with |a, b, c| .. syntax, have unique anonymous types. Functions require explicit return types, so that doesn't work here.

Some solutions for returning closures:

  • Use a function pointer fn() as in example (A). Often you don't need a closure environment anyway.
  • Box the closure. This is reasonable, even if the iterators don't support calling it at the moment. Not your fault.
  • Box the iterator
  • Return a custom iterator struct. Requires some boilerplate.

You can see that in example (B) we have to be quite careful with lifetimes. It says that the return value is Box<Iterator<Item=i32> + 'a>, what is this 'a? This is the least lifetime required of anything inside the box! We also put the 'a bound on I::IntoIter -- this ensures we can put that inside the box.

If you just say Box<Iterator<Item=i32>> it will assume 'static.

We have to explicitly declare the lifetimes of the contents of our box. Just to be safe.

This is actually the fundamental problem with your function. You have this: DecodedRecords<'a, Box<Read>, Row>, F>

See that, an 'a! This type borrows something. The problem is it doesn't borrow it from the inputs. There are no 'a on the inputs.

You'll realize that it borrows from a value you create during the function, and that value's lifespan ends when the function returns. We cannot return DecodedRecords<'a> from the function, because it wants to borrow a local variable.

Where to go from here? My easiest answer would be to perform the same split that csv does. One part (Struct or value) that owns the reader, and one part (struct or value) that is the iterator and borrows from the reader.

Maybe the csv crate has an owning decoder that takes ownership of the reader it is processing. In that case you can use that to dispel the borrowing trouble.



回答2:

This answer is based on @bluss's answer + help from #rust on irc.mozilla.org

One issue that's not obvious from the code, and which was causing the final error displayed just above, has to do with the definition of csv::Reader::decode (see its source). It takes &'a mut self, the explanation of this problem is covered in this answer. This essentially causes the lifetime of the reader to be bounded to the block it's called in. The way to fix this is to split the function in half (since I can't control the function definition, as recommended in the previous answer link). I needed a lifetime on the reader that was valid within the main function, so the reader could then be passed down into the search function. See the code below (It could definitely be cleaned up more):

fn population_count<'a, I>(iter: I, city: &'a str)
    -> Box<Iterator<Item=Result<PopulationCount,csv::Error>> + 'a>
    where I: IntoIterator<Item=Result<Row,csv::Error>>,
          I::IntoIter: 'a,
{
    Box::new(iter.into_iter().filter_map(move |row| {
        let row = match row {
            Ok(row) => row,
            Err(err) => return Some(Err(err)),
        };

        match row.population {
            None => None,
            Some(count) if row.city == city => {
                Some(Ok(PopulationCount {
                    city: row.city,
                    country: row.country,
                    count: count,
                }))
            },
            _ => None,
        }
    }))
}

fn get_reader<P: AsRef<Path>>(file_path: &Option<P>)
    -> Result<csv::Reader<Box<io::Read>>, CliError>
{
    let input: Box<io::Read> = match *file_path {
        None => Box::new(io::stdin()),
        Some(ref file_path) => Box::new(try!(fs::File::open(file_path))),
    };

    Ok(csv::Reader::from_reader(input))
}

fn search<'a>(reader: &'a mut csv::Reader<Box<io::Read>>, city: &'a str)
    -> Box<Iterator<Item=Result<PopulationCount, csv::Error>> + 'a>
{
    population_count(reader.decode::<Row>(), city)
}

fn main() {
    let args: Args = Docopt::new(USAGE)
        .and_then(|d| d.decode())
        .unwrap_or_else(|err| err.exit());

    let reader = get_reader(&args.arg_data_path);
    let mut reader = match reader {
        Err(err) => fatal!("{}", err),
        Ok(reader) => reader,
    };

    let populations = search(&mut reader, &args.arg_city);
    let mut found = false;
    for pop in populations {
        found = true;
        match pop {
            Err(err) => fatal!("fatal !! {}", err),
            Ok(pop) => println!("{}, {}: {}", pop.city, pop.country, pop.count),
        }
    }

    if !(found || args.flag_quiet) {
        fatal!("{}", CliError::NotFound);
    }
}

I've learned a lot trying to get this to work, and have much more appreciation for the compiler errors. It's now clear that had this been C, the last error above could actually have caused segfaults, which would have been much harder to debug. I've also realized that converting from a pre-computed vec to an iterator requires more involved thinking about when the memory comes in and out of scope; I can't just change a few function calls and return types and call it a day.



标签: rust