How can I tweak this algorithm to deal with multip

Posted 2019-09-17 10:57

Question:

I would like to search for all occurrences of a string (first parameter) and have another string (second parameter) be added before all occurrences of the first string.

Ideally, I would like every occurrence of dime to be replaced with limedime. I have managed to do this, however, for only the first occurrence of the word. Any matching string which is not the first is not detected, and nothing is done. Also, multiple lines containing dime get modified based on modifications done on previous lines, which is not what I would like.

Here is some sample output that I get:

something dime something dime something something

will become

something limedime something dime something something

and if I have this

dime
notimportant!
dime
dime

I will get

limedime
notimportant!
limelimedime
limelimelimedime

EDIT: I've modified the code so you can easily test it with stdin, and have also included replace_str():

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

char *replace_str(char *str, char *orig, char *rep)
{
  static char buffer[4096];
  char *p;

  if(!(p = strstr(str, orig)))  
    return str;

  strncpy(buffer, str, p-str); 
  buffer[p-str] = '\0';

  sprintf(buffer+(p-str), "%s%s", rep, p+strlen(orig));

  return buffer;
}

void replace(char* patternoo, char* replacearoo){

    char buff[BUFSIZ];      // the input line
    char newbuff[BUFSIZ];   // the results of any editing

    char pattern[200];
    strcpy(pattern, patternoo);

    char replace[200];
    strcpy(replace, replacearoo);

    while ( fgets( buff, BUFSIZ, stdin ) != NULL ) {
        if ( strstr( buff, pattern ) != NULL ) {
             //THIS IS WHERE WE DO pattern replacing    
                strcpy(newbuff, replace_str(buff, pattern, strcat(replace,pattern)));             

        } else {
              strcpy( newbuff, buff );
        }
          printf("%s", newbuff);
    }
}
int main(){

    replace("dime", "lime");
}

Now, I'm thinking that maybe this approach isn't too great, since I'm only looking at one line at a time. I'm not sure what else I can do. Read every character one by one? That seems a bit much to me, but I'm not sure. Is there a quick and dirty way to fix my current algorithm? Or would I have to start over and take a whole new approach?

Answer 1:

Given that you insert lime before each occurrence of dime, you need to read a line, find each occurrence of dime in the input buffer, and when found, copy the unprocessed part of the input buffer to the output buffer, then add lime, then add dime, and then resume your search after the dime.

That translates to:

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

static void replace(char *pattern, char *replace)
{
    char buff[BUFSIZ];      // the input line
    char newbuff[BUFSIZ];   // the results of any editing
    size_t replen = strlen(replace);
    size_t patlen = strlen(pattern);

    while (fgets(buff, BUFSIZ, stdin) != NULL)
    {
        newbuff[0] = '\0';
        char *dst = newbuff;
        char *data = buff;
        char *patt;
        while ((patt = strstr(data, pattern)) != NULL)
        {
            memmove(dst, data, (patt - data));
            dst += (patt - data);
            memmove(dst, replace, replen);
            dst += replen;
            memmove(dst, pattern, patlen);
            dst += patlen;
            data = patt + patlen;
        }
        *dst = '\0';
        printf("%s%s", newbuff, data);
    }
}

int main(void)
{
    replace("dime", "lime");
    return 0;
}

The code blithely ignores the possibility that an expanded line is longer than the output buffer; you need to do some work to make sure that it does not overflow newbuff. Since you insert 4 characters (lime) for each dime (4 characters), at worst you need twice as much space in the output as there is in the input. So declaring the output buffer as newbuff[2 * BUFSIZ] would deal with those overflow problems, at least for the specific strings you're prefixing. Overlong input lines could cause misses too: fgets() reads at most BUFSIZ - 1 characters per call, so if a dime were split across the boundary between two buffer-fulls, it would be missed.
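For strings other than lime and dime, a fixed 2 * BUFSIZ may not be enough, so one option is to check the remaining space before every copy. The following is only a sketch of that idea, not part of the program above: the helper name prefix_all and its out/outsize parameters are made up for illustration, and it returns -1 rather than truncating when the result does not fit.

```c
#include <string.h>

/* Hypothetical helper (name and signature are assumptions, not from the
 * answer): copies `line` into `out`, inserting `prefix` before every
 * occurrence of `pattern`. Returns 0 on success, or -1 if the pattern is
 * empty or the result (including the terminating NUL) would not fit in
 * `outsize` bytes. */
static int prefix_all(const char *line, const char *pattern,
                      const char *prefix, char *out, size_t outsize)
{
    size_t patlen = strlen(pattern);
    size_t prelen = strlen(prefix);
    size_t used = 0;              /* bytes written to out so far */
    const char *data = line;      /* start of the unprocessed input */
    const char *hit;

    if (patlen == 0 || outsize == 0)
        return -1;                /* an empty pattern would match forever */

    while ((hit = strstr(data, pattern)) != NULL)
    {
        size_t before = (size_t)(hit - data);
        /* Room needed for this step; one byte is kept free for the NUL. */
        if (before + prelen + patlen >= outsize - used)
            return -1;
        memcpy(out + used, data, before);
        used += before;
        memcpy(out + used, prefix, prelen);
        used += prelen;
        memcpy(out + used, pattern, patlen);
        used += patlen;
        data = hit + patlen;      /* resume the search after the match */
    }

    size_t taillen = strlen(data);
    if (taillen >= outsize - used)
        return -1;
    memcpy(out + used, data, taillen + 1);   /* copies the NUL as well */
    return 0;
}
```

Inside the fgets() loop you would call something like prefix_all(buff, pattern, replace, newbuff, sizeof newbuff) and report an error when it returns -1. This does nothing about a dime split across two reads; that would still need longer or dynamically sized input buffers.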

Given a file called data (concocted from your question):

something dime something dime something something

    should become

something limedime something limedime something something

    and if I have this

dime
not important!
dime
dime dime
dime dime dime

    I will get limes and dimes galore:

limedime
not important!
limedime
limedime limedime
limedime limedime limedime

The output from running the program (repstr, I called it) is:

$ ./repstr < data
something limedime something limedime something something

    should become

something limelimedime something limelimedime something something

    and if I have this

limedime
not important!
limedime
limedime limedime
limedime limedime limedime

    I will get limes and limedimes galore:

limelimedime
not important!
limelimedime
limelimedime limelimedime
limelimedime limelimedime limelimedime
$


Answer 2:

One approach may be:

Keep a temp string that accumulates the output.

Until whole sentence read
   Read complete word of that sentence.
   if that word == dime
      append limedime to temp string
   else append the same word to temp string.

Dry run:

input: something dime something lime dime

iteration1: word read: something. Compare it with dime; they aren't equal, so append something to the temp string.
temp: something

iteration2: word read: dime
temp: something limedime

iteration3: word read: something
temp: something limedime something

and so on.

Hope this approach helps :)
I haven't touched C for ages and have forgotten its syntax, so I can't help with the actual code, but the pseudocode should suffice.
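For what it's worth, the word-by-word idea might look roughly like this in C. This is only a sketch under a few assumptions: words are taken to be runs of non-whitespace characters, the helper name prefix_words and its parameters are invented for illustration, and whitespace is copied through unchanged so the line layout survives. Note that, unlike a strstr() search, this only matches whole words, so "dimes" or "dime," would be left alone.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (name and signature are assumptions): copies `line`
 * into `out`, inserting `prefix` before every whitespace-delimited word
 * that is exactly equal to `word`. Returns 0 on success, or -1 if the
 * result (including the terminating NUL) would not fit in `outsize`. */
static int prefix_words(const char *line, const char *word,
                        const char *prefix, char *out, size_t outsize)
{
    size_t wordlen = strlen(word);
    size_t prelen = strlen(prefix);
    size_t used = 0;              /* bytes written to out so far */
    const char *p = line;

    if (outsize == 0)
        return -1;

    while (*p != '\0')
    {
        /* Take the next run: either all whitespace or all non-whitespace. */
        const char *start = p;
        int is_space = isspace((unsigned char)*p) != 0;
        while (*p != '\0' && (isspace((unsigned char)*p) != 0) == is_space)
            p++;
        size_t len = (size_t)(p - start);

        /* Only a non-whitespace run of the same length can equal the word. */
        int match = !is_space && len == wordlen
                    && memcmp(start, word, len) == 0;
        size_t need = len + (match ? prelen : 0);
        if (need >= outsize - used)
            return -1;            /* keep one byte free for the NUL */

        if (match)
        {
            memcpy(out + used, prefix, prelen);
            used += prelen;
        }
        memcpy(out + used, start, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

Calling it on the dry-run input with word "dime" and prefix "lime" gives "something limedime something lime limedime".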