I need C code that sorts some strings case-sensitively: when the same letter appears in both upper and lower case, the lowercase one must come first. For example, the result of sorting the following strings:
eggs
bacon
cheese
Milk
spinach
potatoes
milk
spaghetti
should be:
bacon
cheese
eggs
milk
Milk
potatoes
spaghetti
spinach
I have written some code, but the result I am getting is:
Milk
bacon
cheese
eggs
milk
potatoes
spaghetti
spinach
I have no idea how to improve this and I have searched a lot. Could anyone help me with this?
#include <stdio.h>
#include <string.h>

int main(void)
{
    int c; /* int rather than char, so EOF can be told apart from real characters */
    char name[20][10], temp[10];
    int count_name = 0;
    int name_index = 0;
    int i, j;

    /* Read one name per line from standard input. */
    while ((c = getchar()) != EOF) {
        if (c == '\n') {
            name[count_name][name_index] = '\0';
            count_name++;
            name_index = 0;
        } else {
            name[count_name][name_index] = (char)c;
            name_index++;
        }
    }

    /* Sort the names; strcmp() orders them by character code. */
    for (i = 0; i < count_name - 1; i++) {
        for (j = i + 1; j < count_name; j++) {
            if (strcmp(name[i], name[j]) > 0) {
                strcpy(temp, name[i]);
                strcpy(name[i], name[j]);
                strcpy(name[j], temp);
            }
        }
    }

    for (i = 0; i < count_name; i++) {
        printf("%s\n", name[i]);
    }
    return 0;
}
Keep alike words together...
For lists of words, it is often more useful to group the "same" words together (even though they differ in case). For example:
If you want words arranged like the first column, I present three ways of doing so:
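- Using strcasecmp() combined with strcmp().
- A single pass over the strings using isalpha(), tolower(), and isupper().
- Using a collating table.
In the end I discuss two alternatives:
- Using the collating table with a simplified comparison function.
- Sorting based on your locale.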
Using available library functions
If it is possible to do so, avoid reinventing the wheel. In this case, we can do so by using the POSIX function strcasecmp() to see if they are equal with a case-insensitive comparison, and falling back on strcmp() when they are.
(On some systems, the case-insensitive comparison function is called stricmp() or _stricmp(). If one is not available to you, an implementation is provided below.)
Avoiding two passes over the strings
Sometimes, existing functions do not perform well enough, and you have to do something else to make things faster. The following function does the comparison in roughly the same way in a single pass, and without using either strcasecmp() or strcmp(). But, it treats all non-alphabetical characters as being less than letters.
Using this comparison for sorting will keep milk and Milk next to each other even if the list includes milk-duds.
Using a collating table
Here is a way to dynamically create a collating table from a "configuration". It serves to illustrate a contrastive technique to change how strings get compared.
You can map how the letters of the alphabet are compared with a kind of simple table that describes the relative order you want letters (or any character except the NUL byte) to have:
From this ordering, we can create a look-up table to see how two letters are supposed to compare to each other. The following function initializes the table if it has not already been done first, and otherwise performs the table look-up.
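As a rough sketch of the two pieces described above, the ordering string and its lazily built look-up table could look like the following. The contents of alphaBetical and the helper name collation_rank() are assumptions made for illustration, not the original code.

```c
#include <limits.h>

/* Assumed ordering: each lowercase letter directly before its uppercase twin. */
static const char *alphaBetical =
    "aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ";

/* Hypothetical helper: returns the rank of character c in alphaBetical.
   The table is filled in on the first call and only read afterwards.
   Characters that are not listed (including '\0') keep rank 0. */
int collation_rank(int c)
{
    static int rank[UCHAR_MAX + 1];
    static int initialized = 0;

    if (!initialized) {
        const char *p;
        int next = 1;

        for (p = alphaBetical; *p != '\0'; ++p)
            rank[(unsigned char)*p] = next++;
        initialized = 1;
    }
    /* The cast avoids a negative index when char is signed. */
    return rank[(unsigned char)c];
}
```

In this sketch every unlisted character shares rank 0, so it sorts before all listed letters; a comparison built on it needs its own tie-break if such characters must be told apart.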
With this look-up table, we can now simplify the loop body of the alphaBetize() comparison function:
Can we make things simpler?
Using the collating table, you can create many different orderings with a simplified comparison function, like:
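A minimal sketch of such a simplified comparison, assuming a lowercase-first ordering string and a hypothetical rank_of() helper (neither is the original code). It skips the common prefix and then compares the ranks of the first differing characters; unlisted characters all have rank 0 and are separated by their byte value.

```c
#include <limits.h>

/* Assumed ordering string; edit it to get a different collation. */
static const char *alphaBetical =
    "aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ";

/* Hypothetical helper: rank of c in alphaBetical, 0 if not listed. */
static int rank_of(int c)
{
    static int rank[UCHAR_MAX + 1];
    static int initialized = 0;

    if (!initialized) {
        const char *p;
        int next = 1;

        for (p = alphaBetical; *p != '\0'; ++p)
            rank[(unsigned char)*p] = next++;
        initialized = 1;
    }
    return rank[(unsigned char)c];
}

/* Returns <0, 0 or >0, like strcmp(), but ordered by the table. */
int simple_collating(const char *s1, const char *s2)
{
    int diff;

    /* Skip the part both strings have in common. */
    while (*s1 != '\0' && *s1 == *s2) {
        ++s1;
        ++s2;
    }
    diff = rank_of(*s1) - rank_of(*s2);
    if (diff != 0)
        return diff;
    /* Same rank: both unlisted (or both at the end), so use the byte value. */
    return (unsigned char)*s1 - (unsigned char)*s2;
}
```

With this assumed string, simple_collating("Bacon", "milk") is negative, while "Mars" lands after "milk" and before "Milk", because the decision is made on the first differing character only.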
Using this same function and through modifying the alphaBetical string, you can achieve nearly any ordering you want (alphabetical, reverse alphabetical, vowels before consonants, etc.). However, the arrangement of keeping alike words together requires interspersing capitalized words with words in lowercase, and this can only be done by doing a comparison that ignores case.
Note that with the simple_collating() function above and the alphaBetical string I provided, Bacon will come before milk, but Mars will go after milk and before Milk.
If you want to sort based on your locale.
If you want to use a collating sequence that is already defined for your locale, you can set the locale and call the collating comparison function:
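A small sketch of that idea using only standard C: setlocale() selects the collation rules and strcoll() compares two strings under them. The function name sort_with_locale() and its return convention are assumptions for illustration.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() hands over pointers to the array elements (char pointers here). */
static int compare_with_locale(const void *a, const void *b)
{
    const char *s1 = *(const char *const *)a;
    const char *s2 = *(const char *const *)b;

    return strcoll(s1, s2);
}

/* Hypothetical helper: sorts words[0..count-1] using the collation rules of
   locale_name. Returns 0 on success, -1 if the locale could not be set. */
int sort_with_locale(const char **words, size_t count, const char *locale_name)
{
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;
    qsort(words, count, sizeof words[0], compare_with_locale);
    return 0;
}
```

Passing "" as locale_name picks up the locale from the environment, and "C" gives plain byte order. The resulting order for mixed-case words depends entirely on the locale data installed on the system, so it should be checked rather than assumed.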
Now, by changing the locale, the sorting order will be based on a standardized collating sequence.
I'm late to this discussion, and have no particular expectation to swan in and take the fabulous prize, but not seeing a solution using the idioms I'd look to first, thought I'd chime in.
My first thought in reading the problem spec was some form of custom collating sequence, which I basically found in @jxh's Using a collating table notion. I don't see case insensitivity as a core concept, just the oddball ordering.
So, I offer the following code purely as an alternative implementation. It's specific to glibc - qsort_r(3) is used - but feels like a lighter-weight approach, and supports many collating sequences at run-time. But it's lightly tested, and I'm very likely missing various weaknesses. Among which: I've paid no particular attention to Unicode or the world of wide characters in general, and the casts to unsigned char to avoid negative array subscripts feel suspect.
The previous is close to code that could be put in a separate module or library, but lacks its own header file (or entry in a header file). My own test merely concatenates the code above and below into a file named custom_collate_sort.c, and uses
...to compile it.
You can write a custom comparison function for sort.
First, look at the default strcmp sort order:
strcmp sorts by ASCII character code; i.e., it sorts A-Z then a-z, so every word that starts with a capital letter comes before any word that starts with a lowercase letter:
We can write our own comparison function, called from the cmp used in qsort, that ignores case. That looks like this:
Be sure to also change cmp to:
The case ignoring version now prints:
This is the same output you would get with the POSIX function strcasecmp.
The function mycmp first compares lexicographically in normal order [a|A]-[z|Z]. This means you will get like-letter words together, but you may get bacon, Bacon as likely as Bacon, bacon. This is because qsort is not a stable sort and 'Bacon' compares equal to 'bacon'.
Now what we want is: if the comparison is 0 while ignoring case (i.e., the same word, like 'MILK' and 'milk'), compare again including case and reverse the order:
Final version prints:
Unfortunately, this approach becomes unwieldy for UNICODE. For complex sorts, consider using a mapping or a multistep sort with a stable sort.
For complex and location aware alphabetical collations, consider Unicode collations. As an example, in different locations, letters alphabetize differently:
The default values for these distinctions are captured in the Default Unicode Collation Element Table (DUCET), which provides a default mapping for Unicode collations and string comparisons. You can modify the defaults to capture the distinction between dictionary sorting and phonebook sorting, different locations, or different treatment for case. Individual location variations are actively tracked in the Unicode Common Locale Data Repository (CLDR).
The recommendation for multi-level sorting is tiered:
A widely used implementation of Unicode collations is in the ICU Library. The default DUCET collation for several examples would be:
You can explore the ICU library and change the locations and targets with the ICU Explorer
If you wanted to implement your own version of the DUCET for giggles, you can follow the general method used in this Python script. It is not overwhelming, but not trivial.
If I got it right, you want something I'd describe as follows:
A case-insensitive sort where, in case of a tie, the tiebreaker "lowercase comes first" is used.
So it's like:
1. earlier_letter_in_the_alphabet < later_letter_in_the_alphabet, ignoring the case
2. lowercase < uppercase
3. shorter_word < wider_word, taking '\0' as the lowest possible in comparisons
Step 2 is to be taken only if step 1 didn't distinguish anything. Step 3 is already covered by step 1. All of these are done letter by letter, meaning that you should switch to step 2 as soon as you get a tie between corresponding characters, not just when the whole strings are tied.
Assuming that this was right, all we need to do now is to write a function that makes this comparison for us for any given two strings.
A compare function, by convention, should return a negative value to favour the first parameter being in front, a positive value to favour the second parameter, and zero if it cannot distinguish them. This is just additional information, which you likely already know from the way you make use of strcmp.
And that's it! Replacing the strcmp in your code with my_string_compare, and putting the definitions we've made above it, should provide a correct result. Indeed, it gives the expected result for the example input in the question.
One could shorten the definitions, of course; I have made them long so that it is easier to understand what's going on. For example, I could boil it all down to the following:
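A compact letter-by-letter comparison along the lines of the three rules might look like the sketch below. The name my_string_compare is taken from the text; the body is an assumption about how the rules could be coded, not the original definition.

```c
#include <ctype.h>

/* Returns <0 if s1 sorts first, >0 if s2 sorts first, 0 if they are equal. */
int my_string_compare(const char *s1, const char *s2)
{
    for (;; ++s1, ++s2) {
        unsigned char c1 = (unsigned char)*s1;
        unsigned char c2 = (unsigned char)*s2;
        int diff = tolower(c1) - tolower(c2);

        /* Rule 1 (and rule 3, since '\0' is lower than any character). */
        if (diff != 0)
            return diff;
        /* Rule 2: same letter, different case, so lowercase goes first. */
        if (c1 != c2)
            return islower(c1) ? -1 : 1;
        /* Both strings ended at the same position. */
        if (c1 == '\0')
            return 0;
    }
}
```

Since the case check happens at the first position where the characters differ, "milk" sorts before "Mars" here, which is what the letter-by-letter reading of the rules implies.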
It does essentially the same as the longer one; use whichever you like or, even better, write your own.
Standard Header Files Required by Program:
The main Program starts here:
Custom Sorting Table as Required:
Quicksort algorithm (you can also use the one the standard library provides):
Two of the Most Important Functions are:
The key to the OP's code is its use of the function strcmp() to compare two strings.
So, I will start by replacing this standard function with another one, like the following:
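A possible shape for such a replacement is sketched here. The name my_strcmp and the pointers p1 and p2 appear in the text; everything else is an assumption for illustration and presumes that lowercase should win over uppercase for the same letter.

```c
#include <ctype.h>

/* Drop-in replacement for strcmp(): same letter in both cases puts the
   lowercase one first; otherwise letters compare without regard to case. */
int my_strcmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Advance to the first position where the strings differ. */
    while (*p1 != 0 && *p1 == *p2) {
        ++p1;
        ++p2;
    }

    if (*p1 == 0 && *p2 == 0)
        return 0;   /* identical strings */
    if (*p1 == 0)
        return -1;  /* s1 is a prefix of s2 */
    if (*p2 == 0)
        return 1;   /* s2 is a prefix of s1 */

    /* Different letters: order them ignoring case. */
    if (tolower(*p1) != tolower(*p2))
        return tolower(*p1) < tolower(*p2) ? -1 : 1;
    if (*p1 == *p2)
        return 0;
    /* Same letter in different case: lowercase first. */
    return islower(*p1) ? -1 : 1;
}
```

It has the same signature and return convention as strcmp(), so it can be swapped into the comparison inside the OP's nested loops without other changes.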
The last lines can be compacted in this way:
Now, by replacing strcmp() with my_strcmp() you will have the desired result.
In a sorting algorithm it's a good idea to think separately about its 3 main aspects:
- The comparison function.
- The sorting algorithm itself.
- The way the data is stored.
These aspects can be optimized independently.
Thus, for example, once you have the comparison function well settled, the next optimization step could be to replace the double-for sorting algorithm with a more efficient one, like quicksort.
In particular, the function qsort() of the standard library <stdlib.h> provides you with such an algorithm, so you don't need to care about programming it.
Finally, the strategy you use to store the array information could have consequences for performance.
It would be more efficient to store the strings as an "array of pointers to char" instead of an "array of arrays of char", since swapping pointers is faster than swapping two entire arrays of chars.
Arrays of pointers
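To make the storage point concrete, here is a small sketch that sorts an array of pointers to char with qsort(). The names compare_words() and sort_words() are hypothetical, and strcmp() is used only as a placeholder where a custom comparison would be plugged in.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() passes pointers to the elements; each element is a char pointer,
   so the arguments are really pointers to pointers. */
static int compare_words(const void *a, const void *b)
{
    const char *s1 = *(const char *const *)a;
    const char *s2 = *(const char *const *)b;

    return strcmp(s1, s2); /* placeholder: substitute your own comparison */
}

/* Sorts the pointers in place; the strings themselves are never copied. */
void sort_words(const char **words, size_t count)
{
    qsort(words, count, sizeof words[0], compare_words);
}
```

Only the pointers are moved during the sort, so the cost of a swap no longer depends on how long the strings are.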
ADDITIONAL NOTE: The first three if()'s are actually redundant, because the logic of the following statements implies the desired result in the case that *p1 or *p2 is 0. However, by keeping those if()'s, the code becomes more readable.