Parsing command line statements as a list of tokens

Posted 2019-08-12 15:26

#include <stdio.h>
#include <string.h> /* needed for strtok */
#include <unistd.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    char text[10000];
    fgets(text, sizeof(text), stdin);
    char *t;
    int i;

    t = strtok(text, "\"\'| ");
    for (i=0; t != NULL; i++) {
        printf("token %d is \"%s\"\n", i, t);
        t = strtok(NULL, "\"\'| ");
    }
}

This is part of the code I'm writing. It is supposed to split the input into tokens.

Let's say the input is 'abc' "de f'g" hij| k "lm | no"

The output should be

token 1: "abc"
token 2: "de f'g"
token 3: "hij"
token 4: "|"
token 5: "k"
token 6: "lm | no"

I get something different but close. Is there any way I can change the output to this format?

2 Answers
手持菜刀,她持情操
Reply #2 · 2019-08-12 15:53
#include <stdio.h>
#include <string.h>

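/* Return the next token from *sp, or NULL when the input is exhausted.
   The buffer is modified in place and *sp is advanced past the token. */
char *getToken(char **sp){
    static const char *sep = " \t\n";
    static char vb[] = "|";   /* the bar token, returned on its own */
    static char vbf;          /* set when a bar ended the previous word */
    char *p, *s;
    int quoted = 0;
    if(vbf){                  /* emit the pending bar first */
        vbf = 0;
        return vb;
    }
    if (sp == NULL || *sp == NULL) return(NULL);
    s = *sp + strspn(*sp, sep);   /* skip leading whitespace */
    if (*s == '\0') { *sp = s; return(NULL); }
    if(*s == '"' || *s == '\''){
        char q = *s++;            /* remember which quote opened the token */
        quoted = 1;
        p = strchr(s, q);
        if(p == NULL)             /* unterminated quote: take the rest of the line */
            p = s + strlen(s);
    }
    else
        p = s + strcspn(s, "| \t\n");
    if(*p != '\0'){
        if(!quoted && *p == '|')
            vbf = 1;              /* the bar must be returned by the next call */
        *p++ = '\0';
        p += strspn(p, sep);
    }
    *sp = p;
    if(!quoted && !*s){       /* empty word: the token is the bar itself */
        vbf = 0;
        return vb;
    }
    return s;
}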

int main(void) {
    char text[10000];
    char *t, *p = text;
    int i;

    /* Stop early if no line could be read, so text is never used uninitialized. */
    if (fgets(text, sizeof(text), stdin) == NULL)
        return 1;

    t = getToken(&p);
    for (i=1; t != NULL; i++) {
        printf("token %d is \"%s\"\n", i, t);
        t = getToken(&p);
    }
    return 0;
}
[Account banned]
Reply #3 · 2019-08-12 15:58

What you're trying to write is essentially a parser. strtok isn't a very good tool for this, and you may have better luck writing your own. strtok works on the presumption that whatever delimits your tokens is unimportant and so can be overwritten with '\0'. But you DO care what the delimiter is: the | has to survive as a token of its own, and a quote character decides where the current token ends.
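To make the limitation concrete, here is a minimal sketch of what strtok does with the delimiter set from the question. The helper names split_with_strtok and print_tokens are made up for this illustration, and "\n" is added to the delimiter set (an assumption, so that a trailing newline from fgets does not end up in a token).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split text in place with strtok and store up to
   max token pointers in out. Returns the number of tokens stored. */
int split_with_strtok(char *text, char **out, int max) {
    const char *delims = "\"'| \n";  /* every one of these is discarded */
    int n = 0;
    for (char *t = strtok(text, delims); t != NULL && n < max;
         t = strtok(NULL, delims)) {
        out[n++] = t;
    }
    return n;
}

/* Hypothetical helper: print the tokens in the question's format. */
void print_tokens(char **tokens, int n) {
    for (int i = 0; i < n; i++)
        printf("token %d is \"%s\"\n", i + 1, tokens[i]);
}
```

Run on a writable copy of the question's input, this yields eight tokens: abc, de, f, g, hij, k, lm, no. The quotes no longer group anything, and the | never shows up, because strtok has overwritten every delimiter with '\0'.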

The only problem you'll have is that | syntax. The fact that you want to use it as a token delimiter and a token is likely to make your code more complicated (but not too much). Here, you have the issue that hij is followed immediately by |. If you terminate hij to get the token, you will have to overwrite the |. You either have to store the overwritten character and restore it, or copy the string out somewhere else.
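Both options can be sketched in a few lines. The helper names print_word_in_place and copy_word are hypothetical, and the word delimiters (bar, space, tab, newline) are an assumption based on the question's input.

```c
#include <stdio.h>
#include <string.h>

/* Option 1 (hypothetical helper): terminate the word temporarily, use it,
   then put the overwritten character back. */
void print_word_in_place(char *start) {
    char *end = start + strcspn(start, "| \t\n");
    char saved = *end;      /* may be '|', whitespace, or '\0' */
    *end = '\0';
    printf("word: \"%s\"\n", start);
    *end = saved;           /* restore, so the bar is still in the buffer */
}

/* Option 2 (hypothetical helper): copy the word into dst and leave src
   untouched. Returns the number of characters the word occupies in src.
   The copy is truncated if it does not fit in dstsize bytes. */
size_t copy_word(const char *src, char *dst, size_t dstsize) {
    size_t len = strcspn(src, "| \t\n");
    size_t n;
    if (dstsize == 0)
        return len;         /* nowhere to copy to */
    n = len < dstsize ? len : dstsize - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return len;
}
```

With the buffer "hij| k", either helper sees the word hij and leaves the | in place, so the next step of the tokenizer can still find it. Copying costs an extra buffer but keeps the input read-only, which also allows tokenizing string literals.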

You basically have three cases:

  • The bar | is a special delimiter that is also a token;
  • Quoted delimiters " and ' match everything until the next quote of the same kind;
  • Otherwise, tokens are delimited by whitespace.
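The three cases above can be sketched as a small hand-written tokenizer that copies each token out instead of overwriting the input. This is one possible arrangement, not the only one: next_token and print_all_tokens are hypothetical names, the 256-byte token buffer is arbitrary, and treating an unterminated quote as running to the end of the input is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the next token from *cursor into out
   (at most outsize - 1 characters) and advance *cursor past it.
   Returns 1 if a token was produced, 0 at end of input. */
int next_token(const char **cursor, char *out, size_t outsize) {
    const char *s = *cursor + strspn(*cursor, " \t\n"); /* skip whitespace */
    const char *next;
    size_t len;

    if (*s == '\0' || outsize == 0) {
        *cursor = s;
        return 0;
    }
    if (*s == '|') {                       /* case 1: the bar is its own token */
        len = 1;
        next = s + 1;
    } else if (*s == '"' || *s == '\'') {  /* case 2: quoted token */
        const char *close = strchr(s + 1, *s);
        s++;                               /* drop the opening quote */
        if (close != NULL) {
            len = (size_t)(close - s);
            next = close + 1;              /* step over the closing quote */
        } else {                           /* assumption: runs to end of input */
            len = strlen(s);
            next = s + len;
        }
    } else {                               /* case 3: ends at whitespace or a bar */
        len = strcspn(s, "| \t\n");
        next = s + len;
    }
    if (len >= outsize)
        len = outsize - 1;                 /* truncate tokens that do not fit */
    memcpy(out, s, len);
    out[len] = '\0';
    *cursor = next;
    return 1;
}

/* Print every token in line using the numbering from the question. */
void print_all_tokens(const char *line) {
    char tok[256];
    int i = 1;
    while (next_token(&line, tok, sizeof tok))
        printf("token %d: \"%s\"\n", i++, tok);
}
```

Calling print_all_tokens on the question's input line prints the six tokens from the desired output: abc, de f'g, hij, |, k, and lm | no. Because the word case stops at a bar without consuming it, the next call finds the | at the cursor and returns it through the first case, so no pending-token state is needed.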