return numbers from the middle of a string with ir

Posted 2019-07-26 01:09

Question:

I have a field in a MySQL database that contains item descriptions of purchased products. Some of these are descriptions in plain English, others are part numbers, and others still are part numbers followed by a description. I have removed all spaces and dashes from the strings with a replace().

data looks like this:

1938420985390asdfih
1234812934810dflkasd
asdfasldkjfaasdfjasd
asd;flkjaklsdf
adfsdf1234073927357sdapjfas
1/4sdikhsd 

and I would like to return:

1938420985390
1234812934810
(null)
(null)
1234073927357
(null)

What I really need is to write a SQL query that will return the 13-digit part numbers, but not the extra letters/characters. I would prefer that it return the actual number, too, rather than a 1 or 0 for match/no match.

I tried using a REGEXP function (someone suggested regexp('\d{13}') or regexp('\p{13}')), but these didn't work: they returned a 0 or 1, not the part of the string that matched. Any suggestions?

Thanks!

Answer 1:

This is a non-trivial task in MySQL: before MySQL 8.0 there is no built-in function for returning a regular expression match. But because you are looking for exactly 13 digits, you could do something like this (obviously, extend it to the number of positions you need to check):

-- setup test
CREATE TABLE t (foo VARCHAR(30));
INSERT INTO t VALUES 
('1938420985390asdfih')
,('1234812934810dflkasd')
,('asdfasldkjfaasdfjasd')
,('asd;flkjaklsdf')
,('adfsdf1234073927357sdapjfas')
,('1/4sdikhsd');


SELECT CASE
       WHEN SUBSTR(foo,1,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,1,13)
       WHEN SUBSTR(foo,2,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,2,13)
       WHEN SUBSTR(foo,3,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,3,13)
       WHEN SUBSTR(foo,4,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,4,13)
       WHEN SUBSTR(foo,5,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,5,13)
       WHEN SUBSTR(foo,6,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,6,13)
       WHEN SUBSTR(foo,7,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,7,13)
       WHEN SUBSTR(foo,8,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,8,13)
       WHEN SUBSTR(foo,9,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,9,13)
       END AS digits
  FROM t;

-------------------
1938420985390
1234812934810
(NULL)
(NULL)
1234073927357
(NULL) 

No, it's not pretty. But you should be able to extend this to effectively "scan" a string of reasonable length.

NOTE: The regular expression checks that the 13-character substring consists entirely of decimal digits (0 through 9).
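
The repeated CASE branches can also be generated by joining against a table of start positions, which scales to longer columns without copying the expression. A minimal sketch, assuming a helper table named positions (hypothetical, not part of the original answer) that holds the integers 1 through 18, since a VARCHAR(30) leaves room for a 13-character window at offsets 1 to 18:

```sql
-- Hypothetical helper table: one row per candidate start offset.
CREATE TABLE positions (pos INT PRIMARY KEY);
INSERT INTO positions VALUES
 (1),(2),(3),(4),(5),(6),(7),(8),(9),
 (10),(11),(12),(13),(14),(15),(16),(17),(18);

-- For each string, find the smallest offset whose 13-character window
-- is all digits, then cut that window out. When no offset matches,
-- MIN() yields NULL and SUBSTR() with a NULL position returns NULL.
SELECT t.foo,
       SUBSTR(t.foo,
              MIN(CASE WHEN SUBSTR(t.foo, p.pos, 13) REGEXP '^[0-9]{13}$'
                       THEN p.pos END),
              13) AS digits
  FROM t
 CROSS JOIN positions p
 GROUP BY t.foo;
```

Note that GROUP BY t.foo collapses duplicate strings into one row; on a real table you would probably group by its primary key instead.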



Answer 2:

Returning the matched value may not be supported in MySQL yet, as described in "MySQL Regular Expressions with The REGEXP Operator". However, as that page mentions, there are third-party libraries you could use, such as the UDF Repository for MySQL, which lets you capture matches using PREG_CAPTURE.

For some more information, this StackOverflow link seems to have dealt with this issue.
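
As a rough illustration of the PREG_CAPTURE approach: the sketch below assumes the lib_mysqludf_preg UDFs are installed on the server and that PREG_CAPTURE takes a delimited PCRE pattern followed by the subject string. Treat both the call shape and the no-match behavior as assumptions to verify against the library's own documentation; the table t and column foo are borrowed from the first answer.

```sql
-- Assumption: lib_mysqludf_preg is installed and provides PREG_CAPTURE.
-- The pattern uses PCRE delimiters (/.../), unlike MySQL's REGEXP operator.
SELECT foo,
       PREG_CAPTURE('/[0-9]{13}/', foo) AS digits
  FROM t;
```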



Answer 3:

If each entry contains only one numeric code, then this works:

SELECT CASE WHEN LENGTH(firstNumber(foo)) > 3 THEN firstNumber(foo) ELSE '' END AS result
FROM t;

Two points worth mentioning:

  1. The run of digits must exceed some minimum length (more than 3 digits in the query above), so that we avoid results like 1 from row 6, '1/4sdikhsd';
  2. The function firstNumber is modified to return text, but it is otherwise virtually the same:

    DELIMITER //
    CREATE FUNCTION firstNumber(s TEXT)
        RETURNS TEXT
        COMMENT 'Returns the first integer found in a string'
    DETERMINISTIC
    BEGIN
    
    DECLARE token TEXT DEFAULT '';
    DECLARE len INTEGER DEFAULT 0;
    DECLARE ind INTEGER DEFAULT 0;
    DECLARE thisChar CHAR(1) DEFAULT ' ';
    
    SET len = CHAR_LENGTH(s);
    SET ind = 1;
    WHILE ind <= len DO
        SET thisChar = SUBSTRING(s, ind, 1);
        IF (ORD(thisChar) >= 48 AND ORD(thisChar) <= 57) THEN
            SET token = CONCAT(token, thisChar);
        ELSEIF token <> '' THEN
            SET ind = len + 1;
        END IF;
        SET ind = ind + 1;
    END WHILE;
    
    IF token = '' THEN
        RETURN '';
    END IF;
    
    RETURN token;
    
    END //
    DELIMITER ;
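
To get the output the question asks for (the 13-digit number, or NULL otherwise), the length check can be tightened. A small sketch, assuming the firstNumber function above has been created and reusing the table t from the first answer:

```sql
-- Keep the first digit run only when it is exactly 13 digits long.
-- A CASE without ELSE yields NULL for every other row.
SELECT foo,
       CASE WHEN CHAR_LENGTH(firstNumber(foo)) = 13
            THEN firstNumber(foo)
       END AS digits
  FROM t;
```

Because firstNumber stops at the first run of digits, a string with a shorter number ahead of the part number (for example a leading quantity) would come back as NULL with this check.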
    


Answer 4:

The function you are looking for is REGEXP_SUBSTR():

SELECT REGEXP_SUBSTR(`dirty_value`,'[0-9]+') AS `clean_value` FROM `the_table`;

Note: I tested it and it works. I'm using MySQL Server v8.0 (not sure whether it works in earlier versions).
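
The pattern '[0-9]+' returns any run of digits, so '1/4sdikhsd' would yield 1. For the 13-digit case in the question, the quantifier can be pinned down. A sketch against the table t and column foo from the first answer, assuming MySQL 8.0 or later, where REGEXP_SUBSTR is available:

```sql
-- First run of 13 consecutive digits anywhere in the string; NULL if none.
SELECT foo,
       REGEXP_SUBSTR(foo, '[0-9]{13}') AS digits
  FROM t;

-- Stricter variant: reject runs longer than 13 digits by requiring
-- that no digit sits immediately before or after the match.
SELECT foo,
       REGEXP_SUBSTR(foo, '(?<![0-9])[0-9]{13}(?![0-9])') AS digits
  FROM t;
```

The second query relies on lookbehind and lookahead support in the ICU regex engine that MySQL 8.0 uses.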

Good luck!



Answer 5:

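Instead of doing this in MySQL, you can do it easily with the grep command on Linux:

grep -oE '[0-9]{13}' foo.txt

Here -o prints only the matching part of each line and -E enables extended regular expressions. Lines without a match produce no output. Then create a table and load the result into MySQL.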
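
The loading step might look like the sketch below. It assumes the extracted values were saved to a file with one 13-digit number per line. The table name part_numbers and the path /tmp/part_numbers.txt are hypothetical.

```sql
-- Hypothetical target table and file path; adjust both to your setup.
CREATE TABLE part_numbers (part_no CHAR(13) NOT NULL);

-- LOCAL reads the file from the client machine and requires
-- local_infile to be enabled on both the client and the server.
LOAD DATA LOCAL INFILE '/tmp/part_numbers.txt'
  INTO TABLE part_numbers
  LINES TERMINATED BY '\n'
  (part_no);
```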