I have a field in a MySQL database that contains item descriptions of purchased products. Some of these are descriptions in plain English, others are part numbers, and others still are part numbers followed by a description. I have removed all spaces and dashes from the strings with a replace().
The data looks like this:
1938420985390asdfih
1234812934810dflkasd
asdfasldkjfaasdfjasd
asd;flkjaklsdf
adfsdf1234073927357sdapjfas
1/4sdikhsd
and I would like to return:
1938420985390
1234812934810
(null)
(null)
1234073927357
(null)
What I really need is a SQL query that returns the 13-digit part numbers, but not the extra letters or characters. I would also prefer that it return the actual number, rather than a 1 or 0 for match/no match.
I tried using REGEXP (someone suggested regexp ('\d{13}') or regexp ('\p{13}')), but these didn't work: they returned a 0 or 1, not the part of the string that matched. Any suggestions?
Thanks!
This is a non-trivial task in MySQL: before version 8.0 there's no built-in function for returning a regular expression match. But because you are looking for exactly 13 digits, you could do something like this (obviously, extend it to the number of positions you need to check):
-- setup test
CREATE TABLE t (foo VARCHAR(30));
INSERT INTO t VALUES
('1938420985390asdfih')
,('1234812934810dflkasd')
,('asdfasldkjfaasdfjasd')
,('asd;flkjaklsdf')
,('adfsdf1234073927357sdapjfas')
,('1/4sdikhsd');
SELECT CASE
WHEN SUBSTR(foo,1,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,1,13)
WHEN SUBSTR(foo,2,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,2,13)
WHEN SUBSTR(foo,3,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,3,13)
WHEN SUBSTR(foo,4,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,4,13)
WHEN SUBSTR(foo,5,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,5,13)
WHEN SUBSTR(foo,6,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,6,13)
WHEN SUBSTR(foo,7,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,7,13)
WHEN SUBSTR(foo,8,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,8,13)
WHEN SUBSTR(foo,9,13) REGEXP '^[0-9]{13}$' THEN SUBSTR(foo,9,13)
END AS digits
FROM t;
-------------------
1938420985390
1234812934810
(NULL)
(NULL)
1234073927357
(NULL)
No, it's not pretty. But you should be able to extend this to effectively "scan" a string of reasonable length.
NOTE: The regular expression checks that the substring is exactly 13 characters long and that every one of those characters is a decimal digit (0 through 9).
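If one WHEN branch per position becomes unwieldy, the same scan can probably be driven by a small generated list of start positions instead. The following is only a sketch under a few assumptions: the aliases ones, tens and n and the column pos are made up for illustration, positions 1 through 20 are enough for the VARCHAR(30) column in the test table, and the foo values are unique (the GROUP BY would collapse duplicates).

```sql
-- Sketch: scan start positions 1..20 using a generated list of numbers.
-- The aliases ones/tens/n and the column pos are illustrative only.
SELECT t.foo,
       SUBSTR(t.foo, MIN(n.pos), 13) AS digits   -- NULL when no position matched
FROM t
LEFT JOIN (
    SELECT ones.d + 10 * tens.d + 1 AS pos       -- yields 1..20
    FROM (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
          UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
          UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS ones
    CROSS JOIN (SELECT 0 AS d UNION ALL SELECT 1) AS tens
) AS n
  ON SUBSTR(t.foo, n.pos, 13) REGEXP '^[0-9]{13}$'
GROUP BY t.foo;
```

MIN(n.pos) picks the earliest matching position, which mirrors the top-to-bottom order of the CASE branches. Rows with no match keep a NULL position from the LEFT JOIN, and SUBSTR with a NULL position returns NULL.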
Getting back your matched values may not yet be supported in MySQL, as described here: MySQL Regular Expressions with The REGEXP Operator. However, as mentioned in the link, there are third-party libraries you could use, such as UDF Repository for MySQL, which allows you to capture matches using PREG_CAPTURE.
For some more information, this StackOverflow link seems to have dealt with this issue.
If each entry contains only one numeric code, then this works:
SELECT CASE WHEN LENGTH(firstNumber(foo)) > 3 THEN firstNumber(foo) ELSE '' END AS result
FROM t
Two points worth mentioning:
- the run of digits must exceed some minimum length (more than 3 digits in the query above), so that we avoid results like the 1 from row 6, '1/4sdikhsd';
- the function firstNumber is modified to return text, but it is otherwise virtually the same:
DELIMITER //
CREATE FUNCTION firstNumber(s TEXT)
RETURNS TEXT
COMMENT 'Returns the first integer found in a string'
DETERMINISTIC
BEGIN
DECLARE token TEXT DEFAULT '';
DECLARE len INTEGER DEFAULT 0;
DECLARE ind INTEGER DEFAULT 0;
DECLARE thisChar CHAR(1) DEFAULT ' ';
SET len = CHAR_LENGTH(s);
SET ind = 1;
WHILE ind <= len DO
SET thisChar = SUBSTRING(s, ind, 1);
IF (ORD(thisChar) >= 48 AND ORD(thisChar) <= 57) THEN
SET token = CONCAT(token, thisChar);
ELSEIF token <> '' THEN
SET ind = len + 1;
END IF;
SET ind = ind + 1;
END WHILE;
IF token = '' THEN
RETURN '';
END IF;
RETURN token;
END //
DELIMITER ;
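To apply the function to the original requirement of exactly 13 digits, the length check can be tightened. This is a minimal sketch, assuming the firstNumber function above has been created and the test table t (column foo) from the first answer exists. Note that it only inspects the first run of digits in each string, so a value that starts with a short number and contains a 13-digit number later would still yield NULL.

```sql
-- Sketch: keep the first digit run only when it is exactly 13 digits long.
SELECT foo,
       CASE WHEN CHAR_LENGTH(firstNumber(foo)) = 13
            THEN firstNumber(foo)
       END AS result          -- no ELSE branch, so non-matching rows give NULL
FROM t;
```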
The function you are looking for is REGEXP_SUBSTR():
SELECT REGEXP_SUBSTR(`dirty_value`,'[0-9]+') AS `clean_value` FROM `the_table`;
Note: I tested this and it works on MySQL Server 8.0. REGEXP_SUBSTR() was added in MySQL 8.0, so it is not available in earlier versions.
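Since the question asks specifically for 13-digit part numbers, the pattern can be narrowed from '[0-9]+' (the first run of digits of any length) to exactly 13 digits. This is a sketch assuming MySQL 8.0 and the test table t with column foo from the first answer; the alias part_number is made up for illustration.

```sql
-- Sketch: return the first run of 13 consecutive digits, if any.
SELECT foo,
       REGEXP_SUBSTR(foo, '[0-9]{13}') AS part_number  -- NULL when there is no 13-digit run
FROM t;
```

REGEXP_SUBSTR() returns NULL when nothing matches, so rows like '1/4sdikhsd' come back as NULL. If a string contains a run longer than 13 digits, this pattern returns its first 13 digits.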
Good luck!
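Instead of MySQL, you can do this easily with the grep command on Linux:
grep -oE '[0-9]{13}' foo.txt
Here -o prints only the matched text and -E enables the {13} repetition syntax; lines without a 13-digit run produce no output. Then create a table and load the result into MySQL.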