Remove extra white space from inside a C string?

Posted 2019-02-14 09:46

Question:

I have read a few lines of text into an array of C-strings. The lines have an arbitrary number of tab or space-delimited columns, and I am trying to figure out how to remove all the extra whitespace between them. The end goal is to use strtok to break up the columns. This is a good example of the columns:

Cartwright   Wendy    93
Williamson   Mark     81
Thompson     Mark     100
Anderson     John     76
Turner       Dennis   56

How can I eliminate all but one of the spaces or tabs between the columns so the output looks like this?

Cartwright Wendy 93

Alternatively, can I just replace all of the whitespace between the columns with a different character in order to use strtok? Something like this?

Cartwright#Wendy#93

edit: Multiple great answers, but had to pick one. Thanks for the help all.

Answer 1:

If I may voice the "you're doing it wrong" opinion, why not just eliminate the whitespace while reading? Use fscanf(file, "%s", string); to read a "word" (a run of non-whitespace characters), then read the whitespace that follows it. If it's spaces or tabs, keep reading into the same "line" of data. If it's a newline, start a new entry. In C it's usually easiest to get the data into a format you can work with as soon as possible, rather than trying to do heavy-duty text manipulation afterwards.
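A minimal sketch of that read-as-you-go idea, under some assumptions of mine: the helper name read_record, the WORD_LEN limit of 64, and the fixed-size word array are all illustrative and not from the question. It reads one word at a time with fscanf, then looks at the whitespace after it to decide whether the record has ended.

```c
#include <assert.h>
#include <stdio.h>

#define WORD_LEN 64  /* assumed maximum word length, including the terminator */

/* Hypothetical helper: reads one line's worth of words from fp.
   Returns the number of words stored, or 0 at end of input. */
size_t read_record(FILE *fp, char words[][WORD_LEN], size_t max)
{
    size_t n = 0;
    int c;

    assert(fp != NULL && words != NULL);

    /* %63s skips leading whitespace and reads at most 63 characters. */
    while (n < max && fscanf(fp, "%63s", words[n]) == 1) {
        n++;
        /* Consume the spaces and tabs that follow the word. */
        while ((c = getc(fp)) == ' ' || c == '\t')
            ;
        if (c == '\n' || c == EOF)
            break;          /* end of this record */
        ungetc(c, fp);      /* first character of the next word */
    }
    return n;
}
```

Call it in a loop until it returns 0. Note that in this sketch a line with more than max words spills over into the next record, and blank lines are silently skipped, so real input may need extra checks.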



Answer 2:

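Why not use strtok() directly? There is no need to modify the input first.

strtok() treats a run of consecutive delimiter characters as a single separator, so all you need to do is call it repeatedly with " \t" as the delimiters until you have your 3 tokens, and then you are done!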
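Roughly like this, as a sketch only: the helper name split_columns and the delimiter set " \t\r\n" are my own choices, not something from the question. Because strtok() skips whole runs of delimiters, extra spaces or tabs never produce empty tokens.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: splits line in place on spaces, tabs and line endings.
   Stores up to max token pointers in out and returns how many were stored. */
size_t split_columns(char *line, char *out[], size_t max)
{
    size_t n = 0;
    char *tok;

    assert(line != NULL && out != NULL);

    /* strtok skips any run of delimiter characters before each token. */
    for (tok = strtok(line, " \t\r\n"); tok != NULL && n < max;
         tok = strtok(NULL, " \t\r\n"))
    {
        out[n++] = tok;
    }
    return n;
}
```

Keep in mind that strtok() writes '\0' bytes into the buffer it is given, so the line must be writable (not a string literal), and that it keeps internal state between calls, so it cannot be used on two strings at once.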



Answer 3:

Edit: I originally had a malloced workspace, which I thought might be clearer. However, doing it without extra memory is almost as simple, and I'm being pushed that way in comments and personal IMs, so here it comes... :-)

void squeezespaces(char* row, char separator) {
  char *current = row;
  int spacing = 0;
  int i;

  for(i=0; row[i]; ++i) {
    if(row[i]==' ' || row[i]=='\t') {
      if (!spacing) {
        /* start of a run of spaces or tabs -> separator */
        *current++ = separator;
        spacing = 1;
      }
    } else {
      *current++ = row[i];
      spacing = 0;
    }
  }
  *current = 0;
}


Answer 4:

The following code modifies the string in place; if you don't want to destroy your original input, you can pass a second buffer to receive the modified string. Should be fairly self-explanatory:

#include <stdio.h>
#include <string.h>
#include <ctype.h>  /* isspace, iscntrl */

char *squeeze(char *str)
{
  int r; /* next character to be read */
  int w; /* next character to be written */

  r=w=0;
  while (str[r])
  {
    if (isspace((unsigned char)str[r]) || iscntrl((unsigned char)str[r]))
    {
      if (w > 0 && !isspace((unsigned char)str[w-1]))
        str[w++] = ' ';
    }
    else
      str[w++] = str[r];
    r++;
  }
  str[w] = 0;
  return str;
}

int main(void)
{
  char test[] = "\t\nThis\nis\ta\b     test.";
  printf("test = %s\n", test);
  printf("squeeze(test) = %s\n", squeeze(test));
  return 0;
}


Answer 5:

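#include <string.h>

/* Removes every space character from str_base, in place. */
char* trimwhitespace(char *str_base) {
    char* buffer = str_base;
    while((buffer = strchr(buffer, ' '))) {
        /* source and destination overlap, so use memmove, not strcpy */
        memmove(buffer, buffer+1, strlen(buffer+1)+1);
    }

    return str_base;
}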


Answer 6:

You could read a line then scan it to find the start of each column. Then use the column data however you'd like.

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAX_COL 3
#define MAX_REC 512

int main (void)
{
    FILE *input;
    char record[MAX_REC + 1];
    char *scan;
    const char *recEnd;
    char *columns[MAX_COL] = { 0 };
    int colCnt;

    input = fopen("input.txt", "r");
    if (input == NULL)
    {
        perror("input.txt");  // report why the file could not be opened
        return 1;
    }

    while (fgets(record, sizeof(record), input) != NULL)
    {
        memset(columns, 0, sizeof(columns));  // reset column start pointers

        scan = record;
        recEnd = record + strlen(record);

        for (colCnt = 0; colCnt < MAX_COL; colCnt++ )
        {
          while (scan < recEnd && isspace((unsigned char)*scan)) { scan++; }  // bypass whitespace
          if (scan == recEnd) { break; }
          columns[colCnt] = scan;  // save column start
          while (scan < recEnd && !isspace((unsigned char)*scan)) { scan++; }  // bypass column word
          if (scan < recEnd) { *scan++ = '\0'; }  // terminate the column without stepping past the end
        }

        if (colCnt > 0)
        {
            printf("%s", columns[0]);
            for (int i = 1; i < colCnt; i++)
            {
             printf("#%s", columns[i]);
            }
            printf("\n");
        }
    }

    fclose(input);
}

Note that the code could still be made more robust: check for file errors with ferror, confirm that end of file was reached with feof, and make sure the entire record (all column data) was processed. It could also be made more flexible by using a linked list instead of a fixed array, and it could be modified so that it no longer assumes each column contains a single word (as long as the columns are delimited by a specific character).



Answer 7:

Here's an alternative function that squeezes out repeated space characters, as defined by isspace() in <ctype.h>. It returns the length of the 'squidged' string.

#include <ctype.h>

size_t squidge(char *str)
{
    char *dst = str;
    char *src = str;
    char  c;
    while ((c = *src++) != '\0')
    {
        if (isspace((unsigned char)c))
        {
            *dst++ = ' ';
            while ((c = *src++) != '\0' && isspace((unsigned char)c))
                ;
            if (c == '\0')
                break;
        }
        *dst++ = c;
    }
    *dst = '\0';
    return(dst - str);
}

#include <stdio.h>
#include <string.h>

int main(void)
{
    char buffer[256];
    while (fgets(buffer, sizeof(buffer), stdin) != 0)
    {
        size_t len = strlen(buffer);
        if (len > 0 && buffer[len-1] == '\n')
            buffer[--len] = '\0';   /* strip the newline only if one is there */
        printf("Before: %zu <<%s>>\n", len, buffer);
        len = squidge(buffer);
        printf("After:  %zu <<%s>>\n", len, buffer);
    }
    return(0);
}


Answer 8:

I made a small improvement to John Bode's answer so that it removes trailing whitespace as well:

#include <ctype.h>

char *squeeze(char *str)
{
  char* r; /* next character to be read */
  char* w; /* next character to be written */
  char c;
  int sp, sp_old = 0;

  r=w=str;

  do {
    c=*r;
    sp = isspace((unsigned char)c);
    if (!sp) {
      if (sp_old && c) {
        // don't add a space at end of string
        *w++ = ' ';
      }
      *w++ = c;
    }
    if (str < w) {
      // don't add space at start of line
      sp_old = sp;
    }
    r++;
  }
  while (c);

  return str;
}

#include <stdio.h>

int main(void)
{
  char test[] = "\t\nThis\nis\ta\f     test.\n\t\n";
  //printf("test = %s\n", test);
  printf("squeeze(test) = '%s'\n", squeeze(test));
  return 0;
}




Answer 9:

The following code reads the input one character at a time. A space that is the second or later one in a run of consecutive spaces is skipped; every other character is printed. You can use the same logic for tabs. I hope it helps solve your problem. If there is anything wrong with this code, please let me know.

#include <stdio.h>

int main(void)
{
    int c, count = 0;
    printf ("Please enter your sentence\n");
    while ( ( c = getchar() ) != EOF )  {
        if ( c != ' ' )  {
            putchar ( c );
            count = 0;
        }
        else  {
            count ++;
            if ( count == 1 )    /* print only the first space of a run */
                putchar ( c );
        }
    }
    return 0;
}