Regex using Vala and GLib

Posted 2019-03-04 01:48

Question:

Is there a function similar to PHP's preg_match_all (http://php.net/manual/en/function.preg-match-all.php)?

Using GLib (http://references.valadoc.org/#!api=glib-2.0/GLib.MatchInfo), all I've found is:

public bool match_all_full (string str, ssize_t string_len = -1, int start_position = 0,  RegexMatchFlags match_options = 0, out MatchInfo match_info = null) throws RegexError
Using the standard algorithm for regular expression matching only the longest match in the string is retrieved, it is not possible to obtain all the available matches. 

The documentation says it is not possible to obtain all the available matches.

I wasn't able to find any working code sample. Thanks for your help.

Note:

The objective is to parse a plist file (I only need the CFBundleIdentifier and CFBundleName values):

<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>CFBundleIdentifier</key>
    <string>nodejs</string>
    <key>CFBundleName</key>
    <string>Node.js</string>
    <key>DocSetPlatformFamily</key>
    <string>nodejs</string>
    <key>isDashDocset</key><true/><key>dashIndexFilePath</key><string>nodejs    /api/documentation.html</string></dict>
</plist>

I have these dependencies available (from the Ubuntu synapse package):

Build-Depends: debhelper (>= 9),
dh-autoreconf,
gnome-common,
valac (>= 0.16.0),
libzeitgeist-2.0-dev (>= 0.9.14),
libdbus-glib-1-dev,
libgtk-3-dev (>= 3.0.0),
libglib2.0-dev (>= 2.28.0),
libgee-0.8-dev (>= 0.5.2),
libjson-glib-dev (>= 0.10.0),
libkeybinder-3.0-dev,
libnotify-dev,
librest-dev,
libappindicator3-dev (>= 0.0.7)

As a result, it gives me:

** Message: main.vala:28: CFBundleIdentifier: cakephp
** Message: main.vala:28: CFBundleName: CakePHP
** Message: main.vala:28: DocSetPlatformFamily: cakephp

As for why I am not using xmllib: the project has few dependencies. On a GNU system (though I'm a newbie), programs are packaged assuming only certain dependencies. If I want my plugin to be used, I think I have to use only the available dependencies, or I might break something and block updates for the end user.

Answer 1:

First, let's take a look at some of the context around the quote you cited:

Using the standard algorithm for regular expression matching only the longest match in the string is retrieved, it is not possible to obtain all the available matches. For instance matching "<a> <b> <c>" against the pattern "<.*>" you get "<a> <b> <c>".

This function uses a different algorithm (called DFA, i.e. deterministic finite automaton), so it can retrieve all the possible matches, all starting at the same point in the string. For instance matching "<a> <b> <c>" against the pattern "<.*>" you would obtain three matches: "<a> <b> <c>", "<a> <b>" and "<a>".
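
To see what the quoted documentation means by "all the possible matches", here is a minimal sketch that runs match_all on the same example string. The helper name all_matches_from_start is made up for illustration, and the sketch assumes that, after match_all, get_match_count () counts the alternative matches and fetch (i) returns each one:

```vala
// Sketch: list every match that match_all reports for one starting point.
// all_matches_from_start is a hypothetical helper, not part of GLib.
string[] all_matches_from_start (string pattern, string subject) throws GLib.RegexError {
    var exp = new GLib.Regex (pattern);
    string[] found = {};

    GLib.MatchInfo mi;
    if (exp.match_all (subject, 0, out mi)) {
        // After match_all, each index is an alternative match, not a capture group.
        for (int i = 0; i < mi.get_match_count (); i++) {
            found += mi.fetch (i);
        }
    }
    return found;
}

int main (string[] args) {
    try {
        // Same pattern and subject as in the quoted documentation.
        foreach (string m in all_matches_from_start ("<.*>", "<a> <b> <c>")) {
            GLib.message ("%s", m);
        }
    } catch (GLib.RegexError e) {
        GLib.error ("Regex failed: %s", e.message);
    }
    return 0;
}
```

All of the reported matches begin at the same "<", which is why match_all does not help with walking through successive <key>/<string> pairs.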

That said, that isn't the "all" you are looking for; your case is much simpler. All you need to do is iterate through the matches from the standard algorithm:

private static int main (string[] args) {
    string contents;
    GLib.Regex exp = /\<key\>([a-zA-Z0-9]+)\<\/key\>[\n\t ]*\<string\>([a-zA-Z0-9\.]+)\<\/string\>/;

    assert (args.length > 1);
    try {
        GLib.FileUtils.get_contents (args[1], out contents, null);
    } catch (GLib.Error e) {
        GLib.error ("Unable to read file: %s", e.message);
    }

    try {
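        GLib.MatchInfo mi;
        // match () finds the first match; next () advances to each following one.
        for (exp.match (contents, 0, out mi) ; mi.matches () ; mi.next ()) {
            // Group 1 is the <key> text, group 2 is the <string> value.
            GLib.message ("%s: %s", mi.fetch (1), mi.fetch (2));
        }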
    } catch (GLib.Error e) {
        GLib.error ("Regex failed: %s", e.message);
    }

    return 0;
}
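
Since the question only needs CFBundleIdentifier and CFBundleName, one possible follow-up is to collect the pairs into a GLib.HashTable and look up just those two keys. This is only a sketch: extract_plist_pairs is a hypothetical helper name, the pattern is the one from the code above rewritten as a string, and the plist text is inlined instead of read from a file:

```vala
// Sketch: collect <key>/<string> pairs from plist text into a hash table.
// extract_plist_pairs is a hypothetical helper, not part of GLib.
GLib.HashTable<string, string> extract_plist_pairs (string contents) throws GLib.RegexError {
    var pairs = new GLib.HashTable<string, string> (str_hash, str_equal);
    var exp = new GLib.Regex ("<key>([a-zA-Z0-9]+)</key>[\\n\\t ]*<string>([a-zA-Z0-9.]+)</string>");

    GLib.MatchInfo mi;
    exp.match (contents, 0, out mi);
    // Walk the successive matches, as in the loop above.
    while (mi.matches ()) {
        pairs.insert (mi.fetch (1), mi.fetch (2));
        mi.next ();
    }
    return pairs;
}

int main (string[] args) {
    // Inlined sample taken from the plist in the question.
    string plist = """<dict>
    <key>CFBundleIdentifier</key>
    <string>nodejs</string>
    <key>CFBundleName</key>
    <string>Node.js</string>
</dict>""";

    try {
        var pairs = extract_plist_pairs (plist);
        GLib.message ("CFBundleIdentifier: %s", pairs.lookup ("CFBundleIdentifier"));
        GLib.message ("CFBundleName: %s", pairs.lookup ("CFBundleName"));
    } catch (GLib.RegexError e) {
        GLib.error ("Regex failed: %s", e.message);
    }
    return 0;
}
```

This uses only GLib, so it stays within the dependencies listed in the question.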