This question already has an answer here:
-
What is lexicographical order?
1 answer
I am currently reading about the std::next_permutation function and encountered the term "lexicographical order". At the particular time, I had no experience with this term whatsoever so did a google search for this and found only somewhat cryptic definitions of this type of order, including the wiki article (at least they are to me).
So could anybody try to help me understand this? What is a "good" definition of this term to you?
Regarding the wiki article - they claim that lexicographical order is also known as alphabetical order but as I continue reading, I get the understanding that they are not the same. Thus, the ongoing comparison confuses me a bit.
In normal English usage, when we sort words alphabetically, we employ two rules:
If two words have the same first letter, we compare the second. If the second letters are the same, we compare the third, etc. Finally, one word comes before the other if the first differing letter comes before the corresponding letter.
If two words are identical up to the length of the shorter word, the shorter word comes first.
So "Tom" comes before "Tooth". The first letters are identical ("T"), the second letters are identical "o", but the third letters diff and "m" comes before "o". Therefore "Tom" comes before "Tooth".
"Tom" comes before "Tomas" because the two words are identical through the first three letters "Tom" and "Tom" is shorter than "Tomas".
Lexicographic order is simply alphabetic ordering, generalized for non-letter values. Consider a sequence of values, not necessarily letters:
(1,5,10) comes before (1,6,3) because "5" comes before "6".
(1,5,10) comes before (1,5,10,15,20) because (1,5,10) is shorter than (1,5,10,15,20).
Lexicographic ordering is particularly useful if the elements of the sequence have some specific meaning, with the earlier values giving a higher precedence. For example, consider these times: 9:13 AM and 8:25 AM. If we represent these with the sequence (9,13) and (8,25), then (8,25) comes before (9,13) because 8 comes before 9. What if the hours are the same? For example, (9,13) comes before (9,45) because 13 comes before 45. As you can see, lexicographic ordering allows the hour field to have a higher precedence than the minute field.
Most of the out of the box string sorting algorithms are implemented as lexicographic sort. (More Details at the Bottom)
Example 1:
Random Elements:
['A','a','a','B','b','C','c','d','E']
Sorted with lexicographic ordering:
['A','B','C','E','a','a','b','c','d']
Example 2:
Random Elements with different length:
['a', 'b', 'aa', 'c', 'ddd', 'f']
Sorted with lexicographic ordering:
['a', 'aa', 'b', 'c', 'ddd', 'f']
Difference between lexicographic and natural sort
input = ["z1.txt", "z10.txt", "z3.txt", "z100.txt", "z101.txt"]
lexicogrpahic : ['z1.txt', 'z10.txt', 'z100.txt', 'z101.txt', 'z3.txt']
natural: ['z1.txt', 'z3.txt', 'z10.txt', 'z100.txt', 'z101.txt']
We could go into details here, but a lot of great people already created great explanations for that:
1) Does Python have a built in function for string natural sort?
2) https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/
In layman's terms, this means alphabetic order. In practice, you'd be sorting the strings character by character, according to their underlying numerical (usually ASCII) representation.