Parsing camel case strings with nom

Posted 2019-07-22 02:38

Question:

I want to parse a string like "ParseThis" or "parseThis" into a vector of strings like ["Parse", "This"] or ["parse", "this"] using the nom crate.

None of my attempts return the expected result. It's possible that I don't yet understand how to use all the functions in nom.

I tried:

named!(camel_case<(&str)>, 
       map_res!(
           take_till!(is_not_uppercase),
           std::str::from_utf8));

named!(p_camel_case<&[u8], Vec<&str>>,
       many0!(camel_case));

But p_camel_case returns an Error(Many0) when the input starts with an uppercase letter. When the input starts with a lowercase letter, it returns Done, but with an empty string as the result.

How can I tell nom that I want to parse the string, separated by uppercase letters (given there can be a first uppercase or lowercase letter)?

Answer 1:

You are looking for things that start with any character, followed by any number of non-uppercase characters. As a regex, that would look akin to .[^A-Z]*. Translated directly to nom, that's something like:

#[macro_use]
extern crate nom;

use nom::anychar;

// Treats a single byte as a char, so this is only reliable for ASCII input.
fn is_uppercase(a: u8) -> bool { (a as char).is_uppercase() }

// Any one character, then everything up to (not including) the next uppercase letter.
named!(char_and_more_char<()>, do_parse!(
    anychar >>
    take_till!(is_uppercase) >>
    ()
));

named!(camel_case<(&str)>, map_res!(recognize!(char_and_more_char), std::str::from_utf8));

named!(p_camel_case<&[u8], Vec<&str>>, many0!(camel_case));

fn main() {
    println!("{:?}", p_camel_case(b"helloWorld"));
    // Done([], ["hello", "World"])

    println!("{:?}", p_camel_case(b"HelloWorld"));
    // Done([], ["Hello", "World"])
}

Of course, this works on individual bytes, so you need to be careful with non-ASCII input, where a single character spans several bytes, but you should be able to extend this in a straightforward manner.
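
To make the non-ASCII concern concrete, here is a minimal sketch of the same "any character, then non-uppercase characters" rule written against &str with only the standard library, not nom. The function name split_camel_case is hypothetical and chosen for illustration. It walks the input by char rather than by byte, so multi-byte characters are classified and sliced correctly.

```rust
/// Splits `input` into pieces that each begin at an uppercase character
/// (or at the start of the string). Hypothetical helper, std only, no nom.
fn split_camel_case(input: &str) -> Vec<&str> {
    let mut pieces = Vec::new();
    let mut start = 0;
    // char_indices yields byte offsets that always fall on char boundaries,
    // so slicing at them is safe even for multi-byte characters.
    for (idx, ch) in input.char_indices() {
        if ch.is_uppercase() && idx > start {
            pieces.push(&input[start..idx]);
            start = idx;
        }
    }
    // Push whatever remains after the last uppercase boundary.
    if start < input.len() {
        pieces.push(&input[start..]);
    }
    pieces
}

fn main() {
    println!("{:?}", split_camel_case("helloWorld"));
    println!("{:?}", split_camel_case("HelloWorld"));
    println!("{:?}", split_camel_case("élanÉcole"));
}
```

As with the regex-style rule, a run of capitals such as "HTTPServer" is split into one piece per capital letter. Whether that is desirable depends on the naming convention being parsed.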



Tags: parsing rust nom