Can you split/explode a field in a MySQL query?

2018-12-31 06:10发布

I have to create a report on some student completions. The students each belong to one client. Here are the tables (simplified for this question).

CREATE TABLE  `clients` (
  `clientId` int(10) unsigned NOT NULL auto_increment,
  `clientName` varchar(100) NOT NULL default '',
  `courseNames` varchar(255) NOT NULL default ''
)

The courseNames field holds a comma-delimited string of course names, eg "AB01,AB02,AB03"

CREATE TABLE  `clientenrols` (
  `clientEnrolId` int(10) unsigned NOT NULL auto_increment,
  `studentId` int(10) unsigned NOT NULL default '0',
  `courseId` tinyint(3) unsigned NOT NULL default '0'
)

The courseId field here is the index of the course name in the clients.courseNames field. So, if the client's courseNames are "AB01,AB02,AB03", and the courseId of the enrolment is 2, then the student is in AB03.

Is there a way that I can do a single select on these tables that includes the course name? Keep in mind that there will be students from different clients (and hence have different course names, not all of which are sequential,eg: "NW01,NW03")

Basically, if I could split that field and return a single element from the resulting array, that would be what I'm looking for. Here's what I mean in magical pseudocode:

SELECT e.`studentId`, SPLIT(",", c.`courseNames`)[e.`courseId`]
FROM ...

标签: mysql
16条回答
不流泪的眼
2楼-- · 2018-12-31 07:03

It is possible to explode a string in a MySQL SELECT statement.

Firstly generate a series of numbers up to the largest number of delimited values you wish to explode. Either from a table of integers, or by unioning numbers together. The following generates 100 rows giving the values 1 to 100. It can easily be expanded to give larger ranges (add another sub query giving the values 0 to 9 for hundreds - hence giving 0 to 999, etc).

SELECT 1 + units.i + tens.i * 10 AS aNum
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens

This can be cross joined against your table to give you the values. Note that you use SUBSTRING_INDEX to get the delimited value up to a certain value, and then use SUBSTRING_INDEX to get that value, excluding previous ones.

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name
FROM clients
CROSS JOIN
(
    SELECT 1 + units.i + tens.i * 10 AS aNum, units.i + tens.i * 10 AS aSubscript
    FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
    CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
) sub0

As you can see there is a slight issue here that the last delimited value is repeated many times. To get rid of this you need to limit the range of numbers based on how many delimiters there are. This can be done by taking the length of the delimited field and comparing it to the length of the delimited field with the delimiters changed to '' (to remove them). From this you can get the number of delimiters:-

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name
FROM clients
INNER JOIN
(
    SELECT 1 + units.i + tens.i * 10 AS aNum
    FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
    CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
) sub0
ON (1 + LENGTH(clients.courseNames) - LENGTH(REPLACE(clients.courseNames, ',', ''))) >= sub0.aNum

In the original example field you could (for example) count the number of students on each course based on this. Note that I have changed the sub query that gets the range of numbers to bring back 2 numbers, 1 is used to determine the course name (as these are based on starting at 1) and the other gets the subscript (as they are based starting at 0).

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name, COUNT(clientenrols.studentId)
FROM clients
INNER JOIN
(
    SELECT 1 + units.i + tens.i * 10 AS aNum, units.i + tens.i * 10 AS aSubscript
    FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
    CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
) sub0
ON (1 + LENGTH(clients.courseNames) - LENGTH(REPLACE(clients.courseNames, ',', ''))) >= sub0.aNum
LEFT OUTER JOIN clientenrols
ON clientenrols.courseId = sub0.aSubscript
GROUP BY a_course_name

As you can see, it is possible but quite messy. And with little opportunity to use indexes it is not going to be efficient. Further the range must cope with the greatest number of delimited values, and works by excluding lots of duplicates; if the max number of delimited values is very large then this will slow things down dramatically. Overall it is generally far better to just properly normalise the database.

查看更多
呛了眼睛熬了心
3楼-- · 2018-12-31 07:05

Based on Alex answer above (https://stackoverflow.com/a/11022431/1466341) I came up with even better solution. Solution which doesn't contain exact one record ID.

Assuming that the comma separated list is in table data.list, and it contains listing of codes from other table classification.code, you can do something like:

SELECT 
    d.id, d.list, c.code
FROM 
    classification c
    JOIN data d
        ON d.list REGEXP CONCAT('[[:<:]]', c.code, '[[:>:]]');

So if you have tables and data like this:

CLASSIFICATION (code varchar(4) unique): ('A'), ('B'), ('C'), ('D')
MY_DATA (id int, list varchar(255)): (100, 'C,A,B'), (150, 'B,A,D'), (200,'B')

above SELECT will return

(100, 'C,A,B', 'A'),
(100, 'C,A,B', 'B'),
(100, 'C,A,B', 'C'),
(150, 'B,A,D', 'A'),
(150, 'B,A,D', 'B'),
(150, 'B,A,D', 'D'),
(200, 'B', 'B'),
查看更多
查无此人
4楼-- · 2018-12-31 07:07

Well, nothing I used worked, so I decided creating a real simple split function, hope it helps:

    DECLARE inipos INTEGER;
    DECLARE endpos INTEGER;
    DECLARE maxlen INTEGER;
    DECLARE item VARCHAR(100);
    DECLARE delim VARCHAR(1);

    SET delim = '|';
    SET inipos = 1;
    SET fullstr = CONCAT(fullstr, delim);
    SET maxlen = LENGTH(fullstr);

    REPEAT
        SET endpos = LOCATE(delim, fullstr, inipos);
        SET item =  SUBSTR(fullstr, inipos, endpos - inipos);

        IF item <> '' AND item IS NOT NULL THEN           
            USE_THE_ITEM_STRING;
        END IF;
        SET inipos = endpos + 1;
    UNTIL inipos >= maxlen END REPEAT;
查看更多
长期被迫恋爱
5楼-- · 2018-12-31 07:09

I just had a similar issue with a field like that which I solved a different way. My use case was needing to take those ids in a comma separated list for use in a join.

I was able to solve it using a like, but it was made easier because in addition to the comma delimiter the ids were also quoted like so:

keys "1","2","6","12"

Because of that, I was able to do a LIKE

SELECT twwf.id, jtwi.id joined_id FROM table_with_weird_field twwf INNER JOIN join_table_with_ids jtwi ON twwf.delimited_field LIKE CONCAT("%\"", jtwi.id, "\"%")

This basically just looks to see if the id from the table you're trying to join appears in the set and at that point you can join on it easily enough and return your records. You could also just create a view from something like this.

It worked well for my use case where I was dealing with a Wordpress plugin that managed relations in the way described. The quotes really help though because otherwise you run the risk of partial matches (aka - id 1 within 18, etc).

查看更多
登录 后发表回答