How to compare version string (“x.y.z”) in MySQL?

2020-02-10 03:42发布

问题:

I have firmware version strings into my table (like "4.2.2" or "4.2.16")

How can I compare, select or sort them ?

I cannot use standard strings comparison : "4.2.2" is a seen by SQL greater than "4.2.16"

As version strings, I would like 4.2.16 to be greater than 4.2.2

I would like to consider that firmware version can have chars in them : 4.24a1, 4.25b3 ... for this, usually, the subfield with chars has a fixed length.

how to proceed ?

回答1:

If all your version numbers look like any of these:

X
X.X
X.X.X
X.X.X.X

where X is an integer from 0 to 255 (inclusive), then you could use the INET_ATON() function to transform the strings into integers fit for comparison.

Before you apply the function, though, you'll need to make sure the function's argument is of the X.X.X.X form by appending the necessary quantity of '.0' to it. To do that, you will first need to find out how many .'s the string already contains, which can be done like this:

CHAR_LENGTH(ver) - CHAR_LENGTH(REPLACE(ver, '.', '')

That is, the number of periods in the string is the length of the string minus its length after removing the periods.

The obtained result should then be subtracted from 3 and, along with '.0', passed to the REPEAT() function:

REPEAT('.0', 3 - CHAR_LENGTH(ver) + CHAR_LENGTH(REPLACE(ver, '.', ''))

This will give us the substring that must be appended to the original ver value, to conform with the X.X.X.X format. So, it will, in its turn, be passed to the CONCAT() function along with ver. And the result of that CONCAT() can now be directly passed to INET_ATON(). So here's what we get eventually:

INET_ATON(
  CONCAT(
    ver,
    REPEAT(
      '.0',
      3 - CHAR_LENGTH(ver) + CHAR_LENGTH(REPLACE(ver, '.', ''))
    )
  )
)

And this is only for one value! :) A similar expression should be constructed for the other string, afterwards you can compare the results.

References:

  • INET_ATON()

  • CHAR_LENGTH()

  • CONCAT()

  • REPEAT()

  • REPLACE()



回答2:

Assuming that the number of groups is 3 or less, you can treat the version number as two decimal numbers and sort it accordingly. Here is how:

SELECT 
ver,
CAST(
    SUBSTRING_INDEX(ver, '.', 2)
    AS DECIMAL(6,3)
) AS ver1, -- ver1 = the string before 2nd dot
CAST(
    CASE
        WHEN LOCATE('.', ver) = 0 THEN NULL
        WHEN LOCATE('.', ver, LOCATE('.', ver)+1) = 0 THEN SUBSTRING_INDEX(ver, '.', -1)
        ELSE SUBSTRING_INDEX(ver, '.', -2)
    END
    AS DECIMAL(6,3)
) AS ver2  -- ver2 = if there is no dot then 0.0
           --        else if there is no 2nd dot then the string after 1st dot
           --        else the string after 1st dot
FROM
(
SELECT '1' AS ver UNION
SELECT '1.1' UNION
SELECT '1.01' UNION
SELECT '1.01.03' UNION
SELECT '1.01.04' UNION
SELECT '1.01.1' UNION
SELECT '1.11' UNION
SELECT '1.2' UNION
SELECT '1.2.0' UNION
SELECT '1.2.1' UNION
SELECT '1.2.11' UNION
SELECT '1.2.2' UNION
SELECT '2.0' UNION
SELECT '2.0.1' UNION
SELECT '11.1.1' 
) AS sample
ORDER BY ver1, ver2

Output:

ver     ver1    ver2
======= ======  ======
1        1.000  (NULL)
1.01     1.010   1.000
1.01.03  1.010   1.030
1.01.04  1.010   1.040
1.01.1   1.010   1.100
1.1      1.100   1.000
1.11     1.110  11.000
1.2.0    1.200   2.000
1.2      1.200   2.000
1.2.1    1.200   2.100
1.2.11   1.200   2.110
1.2.2    1.200   2.200
2.0      2.000   0.000
2.0.1    2.000   0.100
11.1.1  11.100   1.100

Notes:

  1. You can extend this example for max 4 groups or more but the string functions will get more and more complicated.
  2. The datatype conversion DECIMAL(6,3) is used for illustration. If you expect more than 3 digits in minor version numbers then modify accordingly.


回答3:

Finally, I found another way to sort version strings.

I just justify the string before storing into de database in a way it is sortable. As I am using the python Django framework, I just have created a VersionField that 'encode' the version string while storing and 'decode' it while reading, so that it is totally transparent for the application :

Here my code :

The justify function :

def vjust(str,level=5,delim='.',bitsize=6,fillchar=' '):
    """
    1.12 becomes : 1.    12
    1.1  becomes : 1.     1
    """
    nb = str.count(delim)
    if nb < level:
        str += (level-nb) * delim
    return delim.join([ v.rjust(bitsize,fillchar) for v in str.split(delim)[:level+1] ])

The django VersionField :

class VersionField(models.CharField) :

    description = 'Field to store version strings ("a.b.c.d") in a way it is sortable'

    __metaclass__ = models.SubfieldBase

    def get_prep_value(self, value):
        return vjust(value,fillchar=' ')

    def to_python(self, value):
        return re.sub('\.+$','',value.replace(' ',''))


回答4:

This is rather a complicated one, as SQL isn't designed to split out multiple values from a single field - this is a violation of First Normal Form. Assuming that you are not going to have more than three groups of numbers, each of which will not be more than three digits long, try:

cast(substring_index(concat(X,'.0.0.'), '.', 1) as float) * 1000000 +
cast(substring_index(substring_index(concat(X,'.0.0.'), '.', 2), '.', -1) as float) * 1000 +
cast(substring_index(substring_index(concat(X,'.0.0.'), '.', 3), '.', -1) as float)


回答5:

Python can compare lists element-by-element in exactly the way you want the versions to be compared, so you can simply split on the ".", call int(x) on each element (with a list comprehension) to convert the string to an int, and then compare

    >>> v1_3 = [ int(x) for x in "1.3".split(".") ]
    >>> v1_2 = [ int(x) for x in "1.2".split(".") ]
    >>> v1_12 = [ int(x) for x in "1.12".split(".") ]
    >>> v1_3_0 = [ int(x) for x in "1.3.0".split(".") ]
    >>> v1_3_1 = [ int(x) for x in "1.3.1".split(".") ]
    >>> v1_3
    [1, 3]
    >>> v1_2
    [1, 2]
    >>> v1_12
    [1, 12]
    >>> v1_3_0
    [1, 3, 0]
    >>> v1_3_1
    [1, 3, 1]
    >>> v1_2 < v1_3
    True
    >>> v1_12 > v1_3
    True
    >>> v1_12 > v1_3_0
    True
    >>> v1_12 > v1_3_1
    True
    >>> v1_3_1 < v1_3
    False
    >>> v1_3_1 < v1_3_0
    False
    >>> v1_3_1 > v1_3_0
    True
    >>> v1_3_1 > v1_12
    False
    >>> v1_3_1 < v1_12
    True
    >>> 


回答6:

Lots of good solutions here, but I wanted a stored function that would work with ORDER BY

CREATE FUNCTION standardize_version(version VARCHAR(255)) RETURNS varchar(255) CHARSET latin1 DETERMINISTIC NO SQL
BEGIN
  DECLARE tail VARCHAR(255) DEFAULT version;
  DECLARE head, ret VARCHAR(255) DEFAULT NULL;

  WHILE tail IS NOT NULL DO 
    SET head = SUBSTRING_INDEX(tail, '.', 1);
    SET tail = NULLIF(SUBSTRING(tail, LOCATE('.', tail) + 1), tail);
    SET ret = CONCAT_WS('.', ret, CONCAT(REPEAT('0', 3 - LENGTH(CAST(head AS UNSIGNED))), head));
  END WHILE;

  RETURN ret;
END|

to test:

SELECT standardize_version(version) FROM (SELECT '1.2.33.444.5b' AS version UNION SELECT '1' UNION SELECT NULL) AS t;

renders:

00001.00002.00033.00444.00005b
00001
(null)

And allows for comparisons of almost any set of versions, even ones with letters.



回答7:

/**
function version_compare(version1, version2)

parameters
version1 first version number.
version2 second version number.

return values
-1: if version1 is less than version2;
1: if version1 is greater than version2,
0: if version1 equal version2.

example:
select version_compare('4.2.2','4.2.16') from dual;
version_compare('4.2.2','4.2.16')  
-----------------------------------
    -1 

*/
drop function if exists version_compare;
delimiter @@

create function version_compare(version1 varchar(100), version2 varchar(100))
  returns tinyint
  begin
    declare v_result tinyint;
    declare version1_sub_string varchar(100);
    declare version2_sub_string varchar(100);
    declare version1_sub_int int;
    declare version2_sub_int int;

    declare version1_sub_end tinyint;
    declare version2_sub_end tinyint;


    if version1 = version2 then
      set v_result = 0;
    else

      set version1_sub_string = version1;
      set version2_sub_string = version2;

      lp1 : loop
        set version1_sub_end = locate('.', version1_sub_string);
        set version2_sub_end = locate('.', version2_sub_string);

        if version1_sub_end <> 0 then
          set version1_sub_int = cast(substring(version1_sub_string, 1, version1_sub_end - 1) as signed);
          set version1_sub_string = substring(version1_sub_string, version1_sub_end +1 );
        else
          set version1_sub_int = cast(version1_sub_string as signed);
        end if;

        if version2_sub_end <> 0 then
          set version2_sub_int = cast(substring(version2_sub_string, 1, version2_sub_end - 1) as signed);

          set version2_sub_string = substring(version2_sub_string, version2_sub_end + 1);
        else
          set version2_sub_int = cast(version2_sub_string as signed);
        end if;


        if version1_sub_int > version2_sub_int then
          set v_result = 1;
          leave lp1;

        elseif version1_sub_int < version2_sub_int then
            set v_result = -1;
            leave lp1;
        else
          if version1_sub_end = 0 and version2_sub_end = 0 then
            set v_result = 0;
            leave lp1;

          elseif version1_sub_end = 0 then
              set v_result = -1;
              leave lp1;

          elseif version2_sub_end = 0 then
              set v_result = 1;
              leave lp1;
          end if;      
        end if;

      end loop;
    end if;

    return v_result;

 end@@
delimiter ;


回答8:

I was searching for the same thing and instead ended up doing this -- but staying in mysql :

  • Installing this udf library into mysql because I wanted the power of PCRE.
  • using this statement

    case when version is null then null
    when '' then 0
    else
    preg_replace( '/[^.]*([^.]{10})[.]+/', '$1', 
        preg_replace('/([^".,\\/_ ()-]+)([".,\\/_ ()-]*)/','000000000$1.',
            preg_replace('/(?<=[0-9])([^".,\\/_ ()0-9-]+)/','.!$1',version
    ))) 
    end
    

I'll break down what that means:

  • preg_replace is a function that the UDF library created. Because it's a UDF you can just call it from any user or dbspace like that
  • ^".,\\/_ () right now i'm considering all of these characters as separators or traditional "dots" in a version
  • preg_replace('/(?<=[0-9])([^".,\\/_ ()0-9-]+)/','.!$1',version) means to replace all the non-"dots" and non-numbers that are preceded by a number to be preceded by a "dot" and an exclamation point.
  • preg_replace('/([^".,\\/_ ()-]+)([".,\\/_ ()-]*)/','000000000$1.', ...) means to additionally replace all the "dots" with actual dots and to pad all the numbers with 9 zero's. Also any adjacent dots would be reduced to 1.
  • preg_replace( '/0*([^.]{10})[.]+/', '$1', ... ) means to additionally strip all the number blocks down to only 10 digits long and to preserve as many blocks as needed. I wanted to force 6 blocks to keep it under 64-bytes but needing 7 blocks was surprisingly common and thus necessary for my accuracy. Also needed blocks of 10 so 7 blocks of 9 was not an option. But the variable length is working well for me. -- remember strings are compared left to right

So now I can handle versions like:

1.2 < 1.10
1.2b < 1.2.0
1.2a < 1.2b
1.2 = 1.2.0
1.020 = 1.20
11.1.1.3.0.100806.0408.000  < 11.1.1.3.0.100806.0408.001
5.03.2600.2180 (xpsp_sp2_rtm.040803-2158)
A.B.C.D = a.B.C.D
A.A  <  A.B

I chose exclamation point because it sorts in the collations sequences (that I'm using anyway) before 0. It's relative sort to 0 allows letters like b and a when used immediately adjacent to a number above to be treated like a new section and be sort before 0 -- which is the padding I am using.

I am using 0 as padding so that vendor's mistakes like moving from a fixed 3 digit block to a variable one don't bite me.

You can easily choose more padding if you want to handle silly versions like "2.11.0 Under development (unstable) (2010-03-09)" -- the string development is 11 bytes.

You can easily request more blocks in the final replace.

I could have done more but I was trying to do a as few paces as possible with a high-level of accuracy since I have several millions records to scan regularly. If anyone sees an optimization please repsond.

I chose to keep it as a string and not cast into a number because the cast has a cost and also letters are important as we saw. One thing i was thinking about is doing a test on the string and returning an option that isn't as many passes or less expensive function for tidier cases. like 11.1.1.3 is a very common format



回答9:

This is my solution. It not depends on number of subversion.

For example:

select SF_OS_VERSION_COMPARE('2016.10.1712.58','2016.9.1712.58');

returns 'HIGH'

select SF_OS_VERSION_COMPARE('2016.10.1712.58','2016.10.1712.58');

returns 'EQUAL'

delimiter //

DROP FUNCTION IF EXISTS SF_OS_VERSION_COMPARE //

CREATE FUNCTION SF_OS_VERSION_COMPARE(ver_1 VARCHAR(50), ver_2 VARCHAR(50)) RETURNS VARCHAR(5)
    DETERMINISTIC
    COMMENT 'Return "HIGH", "LOW" OR "EQUAL" comparing VER_1 with VER_2'
BEGIN
    DECLARE v_ver1 VARCHAR(50);
    DECLARE v_ver2 VARCHAR(50);
    DECLARE v_ver1_num INT;
    DECLARE v_ver2_num INT;

    SET v_ver1 = ver_1;
    SET v_ver2 = ver_2;

    WHILE ( v_ver1 <> v_ver2 AND ( v_ver1 IS NOT NULL OR v_ver2 IS NOT NULL )) DO

    SET v_ver1_num = CAST(SUBSTRING_INDEX(v_ver1, '.', 1) AS UNSIGNED INTEGER);
    SET v_ver2_num = CAST(SUBSTRING_INDEX(v_ver2, '.', 1) AS UNSIGNED INTEGER);

    IF ( v_ver1_num > v_ver2_num )
    THEN
        return 'HIGH';
    ELSEIF ( v_ver1_num < v_ver2_num )
    THEN
        RETURN 'LOW';
    ELSE
        SET v_ver1 = SUBSTRING(v_ver1,LOCATE('.', v_ver1)+1);
        SET v_ver2 = SUBSTRING(v_ver2,LOCATE('.', v_ver2)+1);
    END IF;

    END WHILE;

    RETURN 'EQUAL';

END //