Get rows that include value of input in specific c

2019-09-14 20:36发布

问题:

Thanks to Paul Spiegel's answer on a previous question, I have a table tab in database dab and it consists of four columns. A simple, a ID column (id) that is also the primary key (int), a pat column (varchar(255)), a path column (varchar(191)), and finally has_tree (bit(1)).

id  pat     path    has_tree
1   AA      1       1
2   ABA     1/2     1
3   ABCA    1/3     1
4   ABA     1/5     0
5   AB      2       0
6   BBB     2/1     1
7   CAB     2/2     1
8   ABC     1/4     0
9   ABC     1/5/7   1
10  ABA     3/2     1

Given a pat value (if it exists), I want

  • all unique pat values (including the input's if it has_tree==1),
  • where their path contains the path of the input,
  • and where has_tree == 1.

So input AA should return ['AA', 'ABA', 'ABCA', 'ABC'].

Important remarks are that the path for the input pat is not necessarily a single item 'root path', in other words: it's very well possible that the input path is something like 1/12/45/966 in which case I want all the unique pat values where the path is a descendant of 1/12/45/966 and where has_tree is 1. Another remark is that the path can be very deep, so even if the input path is 1, the result can be many levels deeper, e.g. 1/22/36/88/98/455/878/1205/2555.

Paul suggested the following query, which worked for the proposed data structure, however as you can see the current structure and requirements are slightly different. (Also note that I have one database with multiple tables. So given a table tab in database dab I'd like to execute the script.)

SELECT t.*
FROM tree t
CROSS JOIN tree r -- root
WHERE r.CELL = '3B0'
  AND t.path LIKE CONCAT(r.path, '%');

I am trying to do this with PHP and then calling some SQL database. Is it possible to write this in a generic, performant form such that it works for PostgreSQL, MySQL(i) and others? If not, I'd at least like to see the MySQLi and PostgreSQL variants.

Building on Paul's example code above, I think it would be something along these lines, but I am not sure. I am also not sure why the JOIN operator is necessary and what it does in this context (I know what it generally does, but not why it is necessary in this case). I am sure this code can be improved, fixed, and made more general for all SQL flavours? Additionally, I do not see that my attempt below includes the current input pat in the output.

SELECT t.pat
FROM `tab` t
JOIN `tab` r
WHERE r.pat = 'AA' -- input
  AND t.path LIKE CONCAT(r.path, '/', '%')
  AND t.has_tree = 1;

Bonus question: I have indexed all columns except for has_tree. Would it be beneficial to index this column as well?

回答1:

First note: The paths should end with a backslash ('1/', '1/2/' instead of '1' and '1/2'). This is important because otherwise you can not match the the root of the subtree. Example:

WHERE path LIKE '1/%'

will not match '1' but will match '1/'.

You can also not just skip the backslash in the query:

WHERE path LIKE '1%'

will match the root '1' but also '10..'

Now to your question: If you already know the path of the root node, you don't need a join:

SELECT * FROM tab WHERE path LIKE '1/%' AND has_tree

To get only "unique" pats you can use the DISTINCT keyword:

SELECT DISTINCT pat
FROM tab
WHERE path LIKE '1/%'
  AND has_tree

That's it.

If you don't know the roots path but only the pat value, you would either need to run two queries (first get the path then get the descendants) or use a join.

PostgreSQL:

SELECT DISTINCT t.pat
FROM tab t
JOIN tab r ON t.path LIKE r.path || '%'
WHERE r.pat = 'AA' -- input
  AND t.has_tree

Demo: http://rextester.com/EXZT43019

MySQL:

SELECT DISTINCT t.pat
FROM tab t
JOIN tab r ON t.path LIKE CONCAT(r.path, '%')
WHERE r.pat = 'AA' -- input
  AND t.has_tree

Demo: http://rextester.com/DNHRJ83456

Note: As you will see in the domo, you can also use pipes (||) in MySQL for concatenation if you change the mode with:

SET sql_mode=PIPES_AS_CONCAT;

Regarding the index: An index on a boolean column is usually not very usefull. However you can only know for sure if you test it. For the given queries a composite index on (has_tree, path) might improve the performance.