Finding the position of a substring in a larger st

2019-01-20 12:29发布

问题:

I have created a function that should find the numerical position of the first character of a substring in a larger string. I am having some problems with the output and I am not too sure why. These problems include -1 being returned every single time instead of the integer position of the substring. I have debugged and cannot trace where the function goes wrong.

This is how the function should perform: If my string is "The dog was fast" and I am searching for the substring "dog", the function should return 4. Thanks to chqrlie for help with the loop.

Here is the function:

int findSubString(char original[], char toFind[]) {

    size_t i, j;
    int originalLength = 0;
    int toFindLength = 0;

    originalLength = strlen(original) + 1;
    toFindLength = strlen(toFind) + 1;

    for (i = 0; i < toFindLength + 1; i++) {
        for (j = 0; j < originalLength + 1; j++) {
            if (toFind[j] == '\0') {
                return i;
            }
            if (original[i + j] != toFind[j]) {
                break;
            }
        }
        if (original[i] == '\0') {
            return -1;
        }
    }
}

The function parameters cannot be modified, this is a requirement. Any help appreciated!

回答1:

These statements inside the loops

       if (toFind[j] == '\0') {
            return i;
        }

results in undefined behavior because the string toFind can be shorter than the string original.

The same is valid for this loop

        if (original[i + j] != toFind[j]) {
            break;
        }

because i + j can be greater than the length of the string original.

And there is no need to scan all characters of the string original if you are going to find a substring inside it.

Also you should check whether the length of the string original is not less than the length of the string toFind.

If you want to find only the first character of the string toFind in the string original it is enough to use standard C function strchr. If you want to find the whole string toFind in the string original then you could use another C standard function strstr.

If you want to write the function yourself to find a string in other string then it can look for example the following way

I declared the function like

long long int findSubString( const char original[], const char toFind[] );

however you can write its declaration as you like for example like

int findSubString( char original[], char toFind[] );

But in this case you should declare function local variable success like

int success = -1;

and output the result using format specifier "%d" instead of "%lld".

Here you are.

#include <stdio.h>
#include <string.h>
#include <stddef.h>

long long int findSubString( const char original[], const char toFind[] )
{
    size_t n = strlen( original );
    size_t m = strlen( toFind );

    long long int success = -1;

    if ( !( n < m ) )
    {
        n = n - m + 1;

        for ( size_t i = 0; success == -1 && i < n; i++ )
        {
            size_t j = 0;
            while ( j < m && original[i+j] == toFind[j] ) j++;

            if ( j == m ) success = i;
        }
    }

    return success;
}

int main(void) 
{
    printf( "%lld\n", findSubString( "The dog was fast", "dog" ) );

    return 0;
}

Its output is

4


回答2:

Your loops are reversed. The outer loop should walk positions from zero to originalLength, inclusive; the nested loop should walk positions from zero to toFindLength, inclusive.

Both originalLength and toFindLength should be set to values returned by strlen, not strlen plus one, because null terminator position is not a good start.

Finally, you are returning -1 from inside the outer loop. This is too early - you should be returning -1 only after you are done with the outer loop as well.



回答3:

Your loop counter tests are incorrect: wrong upper limit and the limits are off by one. Note that the tests are actually not necessary as you exit both loops when hitting the '\0' terminators.

Here is a simpler version:

int findSubString(const char *original, const char *toFind) {
    for (size_t i = 0;; i++) {
        for (size_t j = 0;; j++) {
            if (toFind[j] == '\0') {
                return i;
            }
            if (original[i + j] != toFind[j]) {
                break;
            }
        }
        if (original[i] == '\0') {
            return -1;
        }
    }
}

There is a small advantage at computing the string lengths to reduce the number of comparisons in pathological cases such as findSubString("aaaaaaaaaaa", "aaaaaaaaaaaa");

int findSubString(const char *original, const char *toFind) {
    size_t originalLength = strlen(original);
    size_t toFindLength = strlen(toFind);

    if (toFindLength <= originalLength) {
        for (size_t i = 0; i <= originalLength - toFindLength; i++) {
            for (size_t j = 0;; j++) {
                if (toFind[j] == '\0') {
                    return i;
                }
                if (original[i + j] != toFind[j]) {
                    break;
                }
            }
        }
    }
    return -1;
}