Parsing text in C

2020-07-12 10:40发布

I have a file like this:

...
words 13
more words 21
even more words 4
...

(General format is a string of non-digits, then a space, then any number of digits and a newline)

and I'd like to parse every line, putting the words into one field of the structure, and the number into the other. Right now I am using an ugly hack of reading the line while the chars are not numbers, then reading the rest. I believe there's a clearer way.

标签: c parsing
7条回答
女痞
2楼-- · 2020-07-12 11:18

You could try using strtok() to tokenize each line, and then check whether each token is a number or a word (a fairly trivial check once you have the token string - just look at the first character of the token).

查看更多
来,给爷笑一个
3楼-- · 2020-07-12 11:19
fscanf(file, "%s %d", word, &value);

This gets the values directly into a string and an integer, and copes with variations in whitespace and numerical formats, etc.

Edit

Ooops, I forgot that you had spaces between the words. In that case, I'd do the following. (Note that it truncates the original text in 'line')

// Scan to find the last space in the line
char *p = line;
char *lastSpace = null;
while(*p != '\0')
{
    if (*p == ' ')
        lastSpace = p;
    p++;
}


if (lastSpace == null)
    return("parse error");

// Replace the last space in the line with a NUL
*lastSpace = '\0';

// Advance past the NUL to the first character of the number field
lastSpace++;

char *word = text;
int number = atoi(lastSpace);

You can solve this using stdlib functions, but the above is likely to be more efficient as you're only searching for the characters you are interested in.

查看更多
We Are One
4楼-- · 2020-07-12 11:24

Edit: You can use pNum-buf to get the length of the alphabetical part of the string, and use strncpy() to copy that into another buffer. Be sure to add a '\0' to the end of the destination buffer. I would insert this code before the pNum++.

int len = pNum-buf;
strncpy(newBuf, buf, len-1);
newBuf[len] = '\0';

You could read the entire line into a buffer and then use:

char *pNum;
if (pNum = strrchr(buf, ' ')) {
  pNum++;
}

to get a pointer to the number field.

查看更多
The star\"
5楼-- · 2020-07-12 11:27

Given the description, here's what I'd do: read each line as a single string using fgets() (making sure the target buffer is large enough), then split the line using strtok(). To determine if each token is a word or a number, I'd use strtol() to attempt the conversion and check the error condition. Example:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * Read the next line from the file, splitting the tokens into 
 * multiple strings and a single integer. Assumes input lines
 * never exceed MAX_LINE_LENGTH and each individual string never
 * exceeds MAX_STR_SIZE.  Otherwise things get a little more
 * interesting.  Also assumes that the integer is the last 
 * thing on each line.  
 */
int getNextLine(FILE *in, char (*strs)[MAX_STR_SIZE], int *numStrings, int *value)
{
  char buffer[MAX_LINE_LENGTH];
  int rval = 1;
  if (fgets(buffer, buffer, sizeof buffer))
  {
    char *token = strtok(buffer, " ");
    *numStrings = 0;
    while (token) 
    {
      char *chk;
      *value = (int) strtol(token, &chk, 10);
      if (*chk != 0 && *chk != '\n')
      {
        strcpy(strs[(*numStrings)++], token);
      }
      token = strtok(NULL, " ");
    }
  }
  else
  {
    /** 
     * fgets() hit either EOF or error; either way return 0
     */
    rval = 0;
  }
  return rval;
}
/**
 * sample main
 */
int main(void)
{
  FILE *input;
  char strings[MAX_NUM_STRINGS][MAX_STRING_LENGTH];
  int numStrings;
  int value;

  input = fopen("datafile.txt", "r");
  if (input)
  {
    while (getNextLine(input, &strings, &numStrings, &value))
    {
      /**
       * Do something with strings and value here
       */
    }
    fclose(input);
  }
  return 0;
}
查看更多
在下西门庆
6楼-- · 2020-07-12 11:28

Assuming that the number is immediately followed by '\n'. you can read each line to chars buffer, use sscanf("%d") on the entire line to get the number, and then calculate the number of chars that this number takes at the end of the text string.

查看更多
欢心
7楼-- · 2020-07-12 11:28

Given the description, I think I'd use a variant of this (now tested) C99 code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

struct word_number
{
    char word[128];
    long number;
};

int read_word_number(FILE *fp, struct word_number *wnp)
{
    char buffer[140];
    if (fgets(buffer, sizeof(buffer), fp) == 0)
        return EOF;
    size_t len = strlen(buffer);
    if (buffer[len-1] != '\n')  // Error if line too long to fit
        return EOF;
    buffer[--len] = '\0';
    char *num = &buffer[len-1];
    while (num > buffer && !isspace(*num))
        num--;
    if (num == buffer)         // No space in input data
        return EOF;
    char *end;
    wnp->number = strtol(num+1, &end, 0);
    if (*end != '\0')  // Invalid number as last word on line
        return EOF;
    *num = '\0';
    if (num - buffer >= sizeof(wnp->word))  // Non-number part too long
        return EOF;
    memcpy(wnp->word, buffer, num - buffer);
    return(0);
}

int main(void)
{
    struct word_number wn;
    while (read_word_number(stdin, &wn) != EOF)
        printf("Word <<%s>> Number %ld\n", wn.word, wn.number);
    return(0);
}

You could improve the error reporting by returning different values for different problems. You could make it work with dynamically allocated memory for the word portion of the lines. You could make it work with longer lines than I allow. You could scan backwards over digits instead of non-spaces - but this allows the user to write "abc 0x123" and the hex value is handled correctly. You might prefer to ensure there are no digits in the word part; this code does not care.

查看更多
登录 后发表回答