I was assigned by my teacher to write my own strcmp() function in C. I created my own version of the function, and I was hoping to get some feedback.
int CompareTwoStrings ( char *StringOne, char *StringTwo ) {
    // Evaluates if both strings have the same length.
    if ( strlen ( StringOne ) != strlen ( StringTwo ) ) {
        // Given that the strings have an unequal length, it compares between both
        // lengths.
        if ( strlen ( StringOne ) < strlen ( StringTwo ) ) {
            return ( StringOneIsLesser );
        }
        if ( strlen ( StringOne ) > strlen ( StringTwo ) ) {
            return ( StringOneIsGreater );
        }
    }
    int i;
    // Since both strings are equal in length...
    for ( i = 0; i < strlen ( StringOne ); i++ ) {
        // It goes comparing letter per letter.
        if ( StringOne [ i ] != StringTwo [ i ] ) {
            if ( StringOne [ i ] < StringTwo [ i ] ) {
                return ( StringOneIsLesser );
            }
            if ( StringOne [ i ] > StringTwo [ i ] ) {
                return ( StringOneIsGreater );
            }
        }
    }
    // If it ever reaches this part, it means they are equal.
    return ( StringsAreEqual );
}
StringOneIsLesser, StringOneIsGreater, and StringsAreEqual are defined as const int with the respective values -1, +1, and 0.
The thing is, I'm not sure whether StringOne being shorter than StringTwo automatically means StringTwo is greater, because I don't know how strcmp() is actually implemented. I would appreciate your feedback on that.
That is a lot of code for such a simple task. I believe something as simple as this would do:
int my_strcmp(char *a, char *b)
{
    while (*a && *b && *a == *b) { ++a; ++b; }
    return (unsigned char)(*a) - (unsigned char)(*b);
}
strcmp compares alphabetically, character by character: so "aaa" < "b" even though "b" is shorter.
Because of this, you can skip the length check and just do the letter-by-letter comparison. If you reach the null character ('\0') in one string while both strings have been equal so far, then the shorter one is the lesser one.
Also: make StringsAreEqual == 0, not 1, for compatibility with standard sorting functions.
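To see this ordering in practice, here is a small sketch built on the standard strcmp() and qsort(). The helper names sign_of, compare_strings and sort_strings are made up for this illustration and are not part of any library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: collapse any strcmp() result to -1, 0 or +1. */
static int sign_of(int result)
{
    return (result > 0) - (result < 0);
}

/* Hypothetical qsort() comparator for an array of `const char *`.
 * It relies on strcmp() returning 0 for equal strings. */
static int compare_strings(const void *a, const void *b)
{
    const char *const *left = a;
    const char *const *right = b;
    return strcmp(*left, *right);
}

/* Sort an array of string pointers in place, in strcmp() order. */
static void sort_strings(const char **items, size_t count)
{
    qsort(items, count, sizeof *items, compare_strings);
}
```

With these helpers, sign_of(strcmp("aaa", "b")) is -1 even though "aaa" is longer, and sign_of(strcmp("ab", "abc")) is also -1 because the shorter string is a prefix of the longer one. A comparator that returned 1 for equal strings would not be usable with qsort().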
Try this as well, for a better understanding:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char string1[20], string2[20];
    size_t i = 0;
    int diff;

    puts("enter the string one to compare");
    if (fgets(string1, sizeof(string1), stdin) == NULL)
        return 1;
    /* Strip the trailing newline, if fgets() stored one. */
    string1[strcspn(string1, "\n")] = '\0';

    puts("enter the string two to compare");
    if (fgets(string2, sizeof(string2), stdin) == NULL)
        return 1;
    string2[strcspn(string2, "\n")] = '\0';

    /* Advance while the characters match and string1 has not ended.
       No length check is needed: a shorter string ends with '\0',
       which differs from any remaining character in the longer one. */
    while (string1[i] != '\0' && string1[i] == string2[i])
        i++;

    /* The first differing position decides the order; compare as unsigned char. */
    diff = (unsigned char)string1[i] - (unsigned char)string2[i];

    if (diff == 0)
        printf("strings are equal");
    else if (diff < 0)
        printf("string1 is less than string2");
    else
        printf("string2 is less than string1");
    return 0;
}
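int mystrncmp(const char * str1, const char * str2, unsigned int n)
{
    /* Compare at most n characters, stopping early at a difference or at '\0'. */
    while (n > 0 && *str1 == *str2) {
        if (*str1 == '\0')
            return 0;
        str1++;
        str2++;
        n--;
    }
    /* All n characters matched. */
    if (n == 0)
        return 0;
    /* Order by the first differing characters, compared as unsigned char. */
    return ((unsigned char)*str1 > (unsigned char)*str2)
         - ((unsigned char)*str1 < (unsigned char)*str2);
}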
strcmp() is fairly easy to code. The usual mis-coding issues include:
Parameter type
strcmp(s1,s2) uses const char * types, not char *. This allows the function to be called with pointers to const data. It conveys to the user that the function does not alter the data. It can help with optimization.
Sign-less compare
All str...() functions perform as if char was unsigned char, even if char is signed. This readily affects the result when strings differ and a character outside the range [1...CHAR_MAX] is found.
Range
On select implementations, the range of unsigned char minus unsigned char is outside the int range. Using 2 compares, (a>b) - (a<b), rather than a-b avoids any problem. Further: many compilers recognize that idiom and emit good code.
int my_strcmp(const char *s1, const char *s2) {
    // All compares done as if `char` was `unsigned char`
    const unsigned char *us1 = (const unsigned char *) s1;
    const unsigned char *us2 = (const unsigned char *) s2;
    // As long as the data is the same and '\0' not found, iterate
    while (*us1 == *us2 && *us1 != '\0') {
        us1++;
        us2++;
    }
    // Use compares to avoid any mathematical overflow
    // (possible when `unsigned char` and `unsigned` have the same range).
    return (*us1 > *us2) - (*us1 < *us2);
}
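One way to see why the sign-less compare matters is to compare a string that contains a byte above CHAR_MAX (on platforms where char is signed) against a plain ASCII string. The sketch below repeats the my_strcmp approach next to a hypothetical naive_strcmp, a name invented here, that compares plain char values instead.

```c
#include <string.h>

/* Same approach as my_strcmp above: compare as unsigned char. */
static int my_strcmp(const char *s1, const char *s2)
{
    const unsigned char *us1 = (const unsigned char *) s1;
    const unsigned char *us2 = (const unsigned char *) s2;
    while (*us1 == *us2 && *us1 != '\0') {
        us1++;
        us2++;
    }
    return (*us1 > *us2) - (*us1 < *us2);
}

/* Hypothetical naive version: compares plain char, so its result for
 * bytes above CHAR_MAX depends on whether char is signed. */
static int naive_strcmp(const char *s1, const char *s2)
{
    while (*s1 == *s2 && *s1 != '\0') {
        s1++;
        s2++;
    }
    return (*s1 > *s2) - (*s1 < *s2);
}
```

For "\xff" versus "a", my_strcmp returns a positive value, matching the standard strcmp(). Where char is signed, naive_strcmp sees '\xff' as a negative value and reports the opposite order, so the two versions agree only for characters in the range [1...CHAR_MAX].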