UPDATE SQL FIDDLE link https://www.db-fiddle.com/f/9t63on5kYWUNrHqXkThb1P/4
Output should give only the following (row 2 of table1)
I0016,I0028,I0045,I0056,I0215,I0321,I0361,I0369,I0420
I should
select column1 from table1
where <any comma separated value in column1> not in
(select col2 from table2 where col1 = 'e')
Preferable solution is native SQL, and nothing vendor specific. If necessary, spark sql functions help.
NOTE: I understand this is bad design, but this is out of my hands.
NOTE The table in FIDDLE is created using default setting of MySQL. I do not know how the tables are created at the backend. That is why I am specifying that this should not be vendor specific.
You should fix your data structure! Storing lists of values in a single column is not the appropriate way to store data in SQL. You should be using a junction/association table.
You can do what you want using not exists
:
select t1.column1
from table1 t1
where not exists (select 1
from table2 t2
where ',' || t1.column || ',' like '%,' || t2.value || ',%'
)
SparkSQL may also support find_in_set()
, in which case you can do:
select t1.column1
from table1 t1
where not exists (select 1
from table2 t2
where find_in_set(t1.column, t2.value) > 0
)
Here is a db<>fiddle.
The following correlated subquery works for my case. Can test it out in the fiddle above.
select *
from table1 as t1
where (
select t2.col2
from table2 as t2
where t1.column1 like concat('%', t2.col2 ,'%')
and t2.col1 = 'e'
limit 1
) is NULL;