MySQL Field Data Parsing

2020-04-20 23:45发布

问题:

Started here, I was asked to create a new post in order to flesh it out which I'll try to do here but I'm afraid it's rather wordy. I created a test table with what seems to be a good cross-section of the data that needs to be parsed. Using this query from here to which I added some additional fields that are needed, this is the query now.

SELECT ID, PartNo,
  GROUP_CONCAT(
    REPLACE(t.part, '-', CONCAT(' ', LEFT(t.part, 2)))
    ORDER BY t.ModelList
    SEPARATOR ' '                                      
  ) Models
FROM (
  SELECT t.ID, t.PartNo, t.Models, p.ModelList,
    REPLACE(REPLACE(
      SUBSTRING_INDEX(t.Models, ';', p.ModelList),
      SUBSTRING_INDEX(t.Models, ';', p.ModelList - 1),
      ''
    ), ';', '') part 
  FROM parts_listing t CROSS JOIN (
    SELECT 1 ModelList UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
    SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL
    SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
  ) p 
  WHERE REPLACE(REPLACE(Models, '-', ''), ';', '') REGEXP '^[0-9]*$'
) t
WHERE t.part <> ''
GROUP BY t.ID, t.PartNo, t.Models
ORDER BY t.PartNo, t.Models

For the most basic parts, the Models data looks like this:

2206-13-26-33; 2302-06-13-32-33

. . . and this is the part that I am working on now. The Models column also contains ALL 23RD etc that will also need to be parsed somehow but for now I am concerned only with the data as shown above and I need to parse it to something like:

2206 2213 2226 2233 2302 2306 2313 2332 2333

The query above, however, seems to be repeating the first bit so it comes out as:

2206 2213 2226 2233 2202 2206 2213 2232 2233

It gives the result in one chunk but it would also be okay if each result was on it's own line as ultimately it will be dumped into another table and that is the way it will be by then.

It also gives too few results. The actual table contains over 15,000 rows but this is fetching only a small fraction and in the data sample below, it should fetch about 20 rows but actually fetches only 6 although the fiddle, which has a very limited data set, does indeed give the proper results. Running the the fiddle query against my own data fetches only 555 rows.

A cross-section sample of data can be created with this:

DROP TABLE IF EXISTS `parts_listing`;
CREATE TABLE IF NOT EXISTS `parts_listing` (
  `ID` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `GroupNumber` varchar(20) DEFAULT NULL,
  `PartNo` varchar(30) DEFAULT NULL,
  `Models` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=34 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;

INSERT INTO `parts_listing` (`ID`, `GroupNumber`, `PartNo`, `Models`) VALUES
    (1, '0.000', '391906', '2201-02-11-20-22-32-40; 2301'),
    (2, '0.000', '391907', '2206-26-33'),
    (3, '0.000', '399391', '2213'),
    (4, '0.000', '415209', '2302-06-32-33'),
    (5, '0.000', '415245', '2313'),
    (6, '0.000', '437273', 'ALL 24TH; 25TH; 26TH'),
    (7, '0.000', '436803', '2631 (BODY 2678); 5431 (BODY 5478)'),
    (8, '0.000', '448611', 'ALL 54TH'),
    (9, '0.0015', '372174', 'ALL 22ND; 23RD'),
    (10, '0.0015', '399324', '2201-02-11-20-22-32-40; 2301'),
    (11, '0.0015', '399324', '2206-13-26-33; 2302-06-13-32-33'),
    (12, '0.0015', '372146', 'ALL 22ND; 23RD'),
    (13, '0.0015', 'G179359', '2201-02-11-20-22-32-40; 2301'),
    (14, '0.0015', 'G179559', '2206-13-26-33; 2302-06-13-32-33'),
    (15, '0.0015', '422024', '2201-02-11-20-22-32-40; 2301'),
    (16, '0.0015', '422024', '2206-13-26-33; 2302-06-13-32-33'),
    (17, '0.0016', '378664', 'ALL 22ND; 23RD'),
    (18, '0.0016', 'G178532', 'ALL 22ND; 23RD'),
    (19, '0.0016', 'G123197', 'ALL 22ND; 23RD'),
    (20, '30.793', '448421', 'ALL 54TH'),
    (21, '30.7932', '422762', '2462-65-92-95-98; 2562-65-92-95; 2662-65-92-95'),
    (22, '30.7932', '448902', '5462-65-67-82-85-92-95-97'),
    (23, '30.7938', '399941', 'ALL 22ND; 23RD'),
    (24, '30.7938', '425809', 'ALL 24TH; 25TH; 26TH'),
    (25, '30.7938', '447799', 'ALL 54TH'),
    (26, '30.79383', '399927', 'ALL 22ND; 23RD; 24TH; 25TH; 26TH'),
    (27, '30.79396', 'PA437550', '2562-65-77-79; 2677-79'),
    (28, '30.79398', '422470', '2452-72; 2552-72'),
    (29, '30.79398', '422471', '2452-72; 2552-72'),
    (30, '30.79398', '422553', 'ALL 24TH; 25TH'),
    (31, '30.8007', '425517', '2413-52-62-65-67-72-92-95-98; 2552-62-65-72-77-92-95; 2650-51-52-62-65-72-77-92-95-97; 5450-51-52-62-65-67-72-77-82-85-92-95-97'),
    (32, '30.8007', '425920', 'RHD, 2462-65-92-95; 2562-65-92-95; 2662-65-92-95-97; 5462-65-82-85-92-95-97'),
    (33, '30.8044', 'G129662', '2293');

回答1:

I think you need a more sophisticated way to verify that the Models column values ​​are correct due to spaces after semicolons. In addition, if you need to process the Models column values that contain more than ten parts, you will need a longer sequence of numbers. I use the INFORMATION_SCHEMA COLUMNS table for this. You can also create an auxiliary table for this.

SELECT
  CONCAT(
    IF(N.n = 0, '', LEFT(t.part, 2)),
    SUBSTRING_INDEX(
      SUBSTRING_INDEX(t.part, '-', N.n + 1), '-', -1
    )
  ) AS part, t.isHD, t.ID, t.GroupNumber, t.PartNo, t.Models
FROM Numbers AS N CROSS JOIN (
  SELECT
    SUBSTRING_INDEX(
      SUBSTRING_INDEX(t.part, ' ', N.n + 1), ' ', -1
    ) AS part, t.isHD, t.ID, t.GroupNumber, t.PartNo, t.Models
  FROM Numbers AS N CROSS JOIN (
    SELECT
      SUBSTRING_INDEX(
        SUBSTRING_INDEX(t.part, ';', N.n + 1), ';', -1
      ) AS part, t.isHD, t.ID, t.GroupNumber, t.PartNo, t.Models
    FROM Numbers AS N CROSS JOIN (
      SELECT
        REPLACE(
          SUBSTRING_INDEX(
            SUBSTRING_INDEX(t.part, 'LHD', N.n + 1), 'LHD', -1
          ), ',', ''
        ) AS part, IF(N.n > 0, 1, t.isHD) AS isHD,
        t.ID, t.GroupNumber, t.PartNo, t.Models
      FROM Numbers AS N CROSS JOIN (
        SELECT
          SUBSTRING_INDEX(
            SUBSTRING_INDEX(p.Models, 'RHD', N.n + 1), 'RHD', -1
          ) AS part, IF(N.n > 0, 2, 0) AS isHD,
          p.ID, p.GroupNumber, p.PartNo, p.Models
        FROM Numbers AS N CROSS JOIN parts_listing AS p
        WHERE p.Models REGEXP '[[:<:]]([LR]HD,? +)?[0-9]+(-[0-9]+)*[[:>:]]' AND
              N.n <= (LENGTH(p.Models) - LENGTH(REPLACE(p.Models, 'RHD', ''))) / 3
      ) AS t WHERE N.n <= (LENGTH(t.part) - LENGTH(REPLACE(t.part, 'LHD', ''))) / 3
    ) AS t WHERE N.n <= LENGTH(t.part) - LENGTH(REPLACE(t.part, ';', ''))
  ) AS t WHERE N.n <= LENGTH(t.part) - LENGTH(REPLACE(t.part, ' ', ''))
) AS t WHERE t.part REGEXP '^[0-9]+(-[0-9]+)*$' AND
             N.n <= LENGTH(t.part) - LENGTH(REPLACE(t.part, '-', ''));

Partial output (29 rows):

+------+------+----+-------------+---------+--------------------------------------+
| part | isHD | ID | GroupNumber | PartNo  |                Models                |
+------+------+----+-------------+---------+--------------------------------------+
| 2201 |    0 |  1 | 0.000       | 391906  | 2201-02-11-20-22-32-40; 2301         |
| 2202 |    0 |  1 | 0.000       | 391906  | 2201-02-11-20-22-32-40; 2301         |
| 2211 |    0 |  1 | 0.000       | 391906  | 2201-02-11-20-22-32-40; 2301         |
| 2220 |    0 |  1 | 0.000       | 391906  | 2201-02-11-20-22-32-40; 2301         |
| 2222 |    0 |  1 | 0.000       | 391906  | 2201-02-11-20-22-32-40; 2301         |
| 2232 |    0 |  1 | 0.000       | 391906  | 2201-02-11-20-22-32-40; 2301         |
| 2240 |    0 |  1 | 0.000       | 391906  | 2201-02-11-20-22-32-40; 2301         |
| 2301 |    0 |  1 | 0.000       | 391906  | 2201-02-11-20-22-32-40; 2301         |
| 2213 |    1 |  3 | 0.000       | 399391  | LHD 2213                             |
| 2313 |    2 |  5 | 0.000       | 415245  | RHD, 2313                            |
| 2631 |    0 |  7 | 0.000       | 436803  | 2631 (BODY 2678); 5431 (BODY 5478)   |
| 5431 |    0 |  7 | 0.000       | 436803  | 2631 (BODY 2678); 5431 (BODY 5478)   |
| 2206 |    1 | 14 | 0.0015      | G179559 | LHD, 2206-13-26-33; 2302-06-13-32-33 |
| 2213 |    1 | 14 | 0.0015      | G179559 | LHD, 2206-13-26-33; 2302-06-13-32-33 |
| 2226 |    1 | 14 | 0.0015      | G179559 | LHD, 2206-13-26-33; 2302-06-13-32-33 |
| 2233 |    1 | 14 | 0.0015      | G179559 | LHD, 2206-13-26-33; 2302-06-13-32-33 |
| 2302 |    1 | 14 | 0.0015      | G179559 | LHD, 2206-13-26-33; 2302-06-13-32-33 |
| 2306 |    1 | 14 | 0.0015      | G179559 | LHD, 2206-13-26-33; 2302-06-13-32-33 |
| 2313 |    1 | 14 | 0.0015      | G179559 | LHD, 2206-13-26-33; 2302-06-13-32-33 |
| 2332 |    1 | 14 | 0.0015      | G179559 | LHD, 2206-13-26-33; 2302-06-13-32-33 |
| 2333 |    1 | 14 | 0.0015      | G179559 | LHD, 2206-13-26-33; 2302-06-13-32-33 |
| 2201 |    0 | 15 | 0.0015      | 422024  | 2201-02-11-20-22-32-40; RHD, 2301    |
| 2202 |    0 | 15 | 0.0015      | 422024  | 2201-02-11-20-22-32-40; RHD, 2301    |
| 2211 |    0 | 15 | 0.0015      | 422024  | 2201-02-11-20-22-32-40; RHD, 2301    |
| 2220 |    0 | 15 | 0.0015      | 422024  | 2201-02-11-20-22-32-40; RHD, 2301    |
| 2222 |    0 | 15 | 0.0015      | 422024  | 2201-02-11-20-22-32-40; RHD, 2301    |
| 2232 |    0 | 15 | 0.0015      | 422024  | 2201-02-11-20-22-32-40; RHD, 2301    |
| 2240 |    0 | 15 | 0.0015      | 422024  | 2201-02-11-20-22-32-40; RHD, 2301    |
| 2301 |    2 | 15 | 0.0015      | 422024  | 2201-02-11-20-22-32-40; RHD, 2301    |
+------+------+----+-------------+---------+--------------------------------------+

Demo on DB Fiddle.



回答2:

You can try this. I changed the parts where You split the Models column. I split them into different rows, and let the code from forpas go over it and join them again together. rows with text in it aren't touched, because you didn't specify nothing.

select  Id,GroupNumber,PartNo,
  group_concat(
    replace(t.part, '-', concat(' ', left(t.part, 2)))
    order by t.partno, t.part ASC
    separator ' '                                      
  ) Models
from
(SELECT
  Id,GroupNumber,PartNo,
  TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(models, ';', n.digit+1), ';', -1)) part
FROM
  ( SElECT Id,GroupNumber,PartNo,
   REPLACE(REPLACE(REPLACE(Models, ')', ''), 'RHD,', ''), '(BODY', ';') Models
   FROM
   parts_listing ) p_l
  INNER JOIN
  (SELECT 0 digit UNION ALL select 1 union all select 2 union all select 3 union all
    select 4 union all select 5 union all select 6 union all
    select 7 union all select 8 union all select 9) n
  ON LENGTH(REPLACE(Models, ';' , '')) <= LENGTH(Models)-n.digit
  Where REPLACE(REPLACE(REPLACE(Models, '-', ''), ' ', ''), ';', '') REGEXP '^[0-9]+$'
ORDER BY
  Id,
  n.digit,part) t
   Group by  Id,GroupNumber,PartNo
   ;

Gets you

Id  GroupNumber     PartNo  Models
1   0.000   391906  2201 2202 2211 2220 2222 2232 2240 2301
2   0.000   391907  2206 2226 2233
3   0.000   399391  2213
4   0.000   415209  2302 2306 2332 2333
5   0.000   415245  2313
7   0.000   436803  2631 2678 5431 5478
10  0.0015  399324  2201 2202 2211 2220 2222 2232 2240 2301
11  0.0015  399324  2206 2213 2226 2233 2302 2306 2313 2332 2333
13  0.0015  G179359     2201 2202 2211 2220 2222 2232 2240 2301
14  0.0015  G179559     2206 2213 2226 2233 2302 2306 2313 2332 2333
15  0.0015  422024  2201 2202 2211 2220 2222 2232 2240 2301
16  0.0015  422024  2206 2213 2226 2233 2302 2306 2313 2332 2333
21  30.7932     422762  2462 2465 2492 2495 2498 2562 2565 2592 2595 2662 2665 2692 2695
22  30.7932     448902  5462 5465 5467 5482 5485 5492 5495 5497
27  30.79396    PA437550    2562 2565 2577 2579 2677 2679
28  30.79398    422470  2452 2472 2552 2572
29  30.79398    422471  2452 2472 2552 2572
31  30.8007     425517  2413 2452 2462 2465 2467 2472 2492 2495 2498 2552 2562 2565 2572 2577 2592 2595 2650 2651 2652 2662 2665 2672 2677 2692 2695 2697 5450 5451 5452 5462 5465 5467 5472 5477 5482 5485 5492 5495 5497
32  30.8007     425920  2462 2465 2492 2495 2562 2565 2592 2595 2662 2665 2692 2695 2697 5462 5465 5482 5485 5492 5495 5497
33  30.8044     G129662     2293

Dbfiddle example



标签: mysql parsing