Efficient trimming of a String

Posted 2019-02-16 23:11

Question:

I have a CSV file, loaded by means of a barcode reader, whose first column holds an item code that optionally ends with "UNIUNI" or a mixed-case variant of those characters. I need to trim away the trailing "UNI"s.

In Rust, I wrote a function essentially like this, with only partial success:

fn main() {
    // Ok: from "9846UNIUNI" to "9846"
    println!("{}", read_csv_rilev("9846UNIUNI".to_string()));

    // Wrong: "9846uniuni" should become "9846", but it is returned unchanged
    println!("{}", read_csv_rilev("9846uniuni".to_string()));
}

fn read_csv_rilev(code: String) -> String {
    code
        //.to_uppercase() /*Unstable feature in Rust 1.1*/
        .trim_right_matches("UNI")
        .to_string()
}

The ideal function signature looks like:

fn read_csv_rilev(mut s: &String) {/**/}

but probably an in-place action on a String is not a good idea. In fact, there isn't anything in the Rust standard library to do this, apart from String::pop().

Is there a way to apply the trimming on a String without allocating another one?

Answer 1:

a way to apply the trimming on a String without allocating another one?

Yes, using truncate:

const TRAILER: &'static str = "UNI";

fn read_csv_rilev(s: &mut String) {
    while s.ends_with(TRAILER) {
        let len = s.len();
        let new_len = len.saturating_sub(TRAILER.len());
        s.truncate(new_len);
    }
}

fn main() {
    let mut code = "Hello WorldUNIUNIUNI".into();

    read_csv_rilev(&mut code);

    println!("{}", code);
}
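
The question also mentions mixed-case trailers such as "uniuni", which ends_with does not match. A minimal sketch of the same truncate loop with an ASCII case-insensitive comparison could look like the following. The function name trim_trailer_ignore_case is hypothetical, and the sketch assumes the trailer itself is pure ASCII:

```rust
const TRAILER: &str = "UNI";

/// Removes every trailing repetition of TRAILER from `s` in place,
/// ignoring ASCII case. No new String is allocated.
fn trim_trailer_ignore_case(s: &mut String) {
    // `checked_sub` yields None once `s` is shorter than the trailer.
    while let Some(cut) = s.len().checked_sub(TRAILER.len()) {
        // `get` returns None when `cut` is not on a char boundary,
        // so non-ASCII input cannot cause a slicing panic.
        let has_trailer = s
            .get(cut..)
            .map_or(false, |tail| tail.eq_ignore_ascii_case(TRAILER));
        if !has_trailer {
            break;
        }
        s.truncate(cut);
    }
}
```

Called on a String holding "9846uniuni" or "9846UniUNI", this leaves "9846" in the same buffer.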

Of course, you don't need to touch the allocated string at all. You can use the same logic and take successive subslices of the string instead. This is basically how trim_right_matches (renamed trim_end_matches in newer Rust versions) works, but a bit less generic:

const TRAILER: &'static str = "UNI";

fn read_csv_rilev(mut s: &str) -> &str {
    while s.ends_with(TRAILER) {
        let len = s.len();
        let new_len = len.saturating_sub(TRAILER.len());
        s = &s[..new_len];
    }
    s
}

fn main() {
    let code = "Hello WorldUNIUNIUNI";

    let truncated = read_csv_rilev(code);

    println!("{}", truncated);
}

In general, I'd probably go with the second solution.
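
Since the subslice loop mirrors what the standard library already does, it can also be written as a single call. A minimal sketch, assuming a recent toolchain where trim_end_matches is available as the newer name for trim_right_matches:

```rust
/// Returns the part of `s` that precedes any trailing "UNI" repetitions.
/// The result borrows from `s`; nothing is allocated or copied.
fn read_csv_rilev(s: &str) -> &str {
    s.trim_end_matches("UNI")
}
```

The match is case-sensitive, so "9846uniuni" comes back unchanged, just as in the question.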



Answer 2:

but probably an in-place action on a String is not a good idea.

The binding is mutable in mut s: &String, not the string itself. You would take s: &mut String if you wanted to mutate the string itself.
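
The difference between the two signatures can be sketched as follows. Both function names are hypothetical and exist only to illustrate the point:

```rust
// `mut s: &String`: only the local reference `s` can be reassigned.
fn pick_longer<'a>(mut s: &'a String, other: &'a String) -> &'a String {
    if other.len() > s.len() {
        s = other; // re-points the local binding; neither String is modified
    }
    s
}

// `s: &mut String`: the caller's String itself can be modified.
fn drop_last_char(s: &mut String) {
    s.pop(); // removes the last char in place, if there is one
}
```

Only the second form lets the function shorten the caller's string.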

That said, I don't think there's anything in the standard library to do this.



Answer 3:

Another solution is to use the owning_ref crate, which lets you return both a &str and its backing String at the same time:

extern crate owning_ref;
use owning_ref::StringRef;

fn read_csv_rilev(code: String) -> StringRef {
    StringRef::new(code).map(|s| s.trim_right_matches("UNI"))
}
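
If pulling in a crate is not desirable, a similar effect seems achievable with the standard library alone by taking the String by value, which keeps the signature from the question. This is a sketch of an alternative, not a description of what owning_ref does internally. It assumes trim_end_matches is available (the newer name for trim_right_matches):

```rust
/// Takes ownership of `code`, shortens it in place, and hands the same
/// heap buffer back to the caller.
fn read_csv_rilev(mut code: String) -> String {
    // Length of the string once every trailing "UNI" is stripped.
    let new_len = code.trim_end_matches("UNI").len();
    // `truncate` only lowers the length; the buffer is not reallocated.
    code.truncate(new_len);
    code
}
```

The caller gets back an owned String, so there are no lifetimes to track.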