How does strtok() split the string into tokens in

2018-12-31 21:29发布

Please explain me the working of strtok() function.The manual says it breaks the string into tokens. I am unable to understand from the manual what actually it does.

I added watches on str and *pch to check its working, when the first while loop occurred, the contents of str were only "this". How did the output shown below printed on the screen?

/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="- This, a sample string.";
  char * pch;
  printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str," ,.-");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " ,.-");
  }
  return 0;
}

Output:

Splitting string "- This, a sample string." into tokens:
This
a
sample
string

13条回答
墨雨无痕
2楼-- · 2018-12-31 22:21

For those who are still having hard time understanding this strtok() function, take a look at this pythontutor example, it is a great tool to visualize your C (or C++, Python ...) code.

In case the link got broken, paste in:

#include <stdio.h>
#include <string.h>

int main()
{
    char s[] = "Hello, my name is? Matthew! Hey.";
    char* p;
    for (char *p = strtok(s," ,?!."); p != NULL; p = strtok(NULL, " ,?!.")) {
      puts(p);
    }
    return 0;
}

Credits go to Anders K.

查看更多
流年柔荑漫光年
3楼-- · 2018-12-31 22:22

strtok modifies its input string. It places null characters ('\0') in it so that it will return bits of the original string as tokens. In fact strtok does not allocate memory. You may understand it better if you draw the string as a sequence of boxes.

查看更多
深知你不懂我心
4楼-- · 2018-12-31 22:23

strtok doesn't change the parameter itself (str). It stores that pointer (in a local static variable). It can then change what that parameter points to in subsequent calls without having the parameter passed back. (And it can advance that pointer it has kept however it needs to perform its operations.)

From the POSIX strtok page:

This function uses static storage to keep track of the current string position between calls.

There is a thread-safe variant (strtok_r) that doesn't do this type of magic.

查看更多
美炸的是我
5楼-- · 2018-12-31 22:26

To understand how strtok() works, one first need to know what a static variable is. This link explains it quite well....

The key to the operation of strtok() is preserving the location of the last seperator between seccessive calls (that's why strtok() continues to parse the very original string that is passed to it when it is invoked with a null pointer in successive calls)..

Have a look at my own strtok() implementation, called zStrtok(), which has a sligtly different functionality than the one provided by strtok()

char *zStrtok(char *str, const char *delim) {
    static char *static_str=0;      /* var to store last address */
    int index=0, strlength=0;           /* integers for indexes */
    int found = 0;                  /* check if delim is found */

    /* delimiter cannot be NULL
    * if no more char left, return NULL as well
    */
    if (delim==0 || (str == 0 && static_str == 0))
        return 0;

    if (str == 0)
        str = static_str;

    /* get length of string */
    while(str[strlength])
        strlength++;

    /* find the first occurance of delim */
    for (index=0;index<strlength;index++)
        if (str[index]==delim[0]) {
            found=1;
            break;
        }

    /* if delim is not contained in str, return str */
    if (!found) {
        static_str = 0;
        return str;
    }

    /* check for consecutive delimiters
    *if first char is delim, return delim
    */
    if (str[0]==delim[0]) {
        static_str = (str + 1);
        return (char *)delim;
    }

    /* terminate the string
    * this assignmetn requires char[], so str has to
    * be char[] rather than *char
    */
    str[index] = '\0';

    /* save the rest of the string */
    if ((str + index + 1)!=0)
        static_str = (str + index + 1);
    else
        static_str = 0;

        return str;
}

And here is an example usage

  Example Usage
      char str[] = "A,B,,,C";
      printf("1 %s\n",zStrtok(s,","));
      printf("2 %s\n",zStrtok(NULL,","));
      printf("3 %s\n",zStrtok(NULL,","));
      printf("4 %s\n",zStrtok(NULL,","));
      printf("5 %s\n",zStrtok(NULL,","));
      printf("6 %s\n",zStrtok(NULL,","));

  Example Output
      1 A
      2 B
      3 ,
      4 ,
      5 C
      6 (null)

The code is from a string processing library I maintain on Github, called zString. Have a look at the code, or even contribute :) https://github.com/fnoyanisi/zString

查看更多
萌妹纸的霸气范
6楼-- · 2018-12-31 22:27

the strtok runtime function works like this

the first time you call strtok you provide a string that you want to tokenize

char s[] = "this is a string";

in the above string space seems to be a good delimiter between words so lets use that:

char* p = strtok(s, " ");

what happens now is that 's' is searched until the space character is found, the first token is returned ('this') and p points to that token (string)

in order to get next token and to continue with the same string NULL is passed as first argument since strtok maintains a static pointer to your previous passed string:

p = strtok(NULL," ");

p now points to 'is'

and so on until no more spaces can be found, then the last string is returned as the last token 'string'.

more conveniently you could write it like this instead to print out all tokens:

for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
  puts(p);
}

EDIT:

If you want to store the returned values from strtok you need to copy the token to another buffer e.g. strdup(p); since the original string (pointed to by the static pointer inside strtok) is modified between iterations in order to return the token.

查看更多
与风俱净
7楼-- · 2018-12-31 22:27

strtok maintains a static, internal reference pointing to the next available token in the string; if you pass it a NULL pointer, it will work from that internal reference.

This is the reason strtok isn't re-entrant; as soon as you pass it a new pointer, that old internal reference gets clobbered.

查看更多
登录 后发表回答