I have an SQL table, with a list of packages (order of 10000 unique entries) and the corresponding categories of any given package (order of 100 unique entries). A given package can be part of more than one category (order of 15000 unique combinations, which is thus the size of the table).
All package names are supposed to be unique and not conflict with category names, however, that doesn't appear to be the case.
What I need is figure out if any of the 10k packages has the same name as any of the 100 categories, and, if so, amend all such package names to include a ,pkg
suffix, to ensure that access to the two namespaces could be shared, and remain unique.
The simplest case would be this:
sqlite> select * from categories where value=fullpkgpath;
lang/mono|lang/mono
sqlite> select fullpkgpath, value from categories where fullpkgpath="lang/mono";
lang/mono|devel
lang/mono|lang
lang/mono|lang/mono
Which should be converted into this:
lang/mono,pkg|devel
lang/mono,pkg|lang
lang/mono,pkg|lang/mono
However, there could also be cases where there is a more indirect conflict (a given package conflicting with a category that it's not part of), like this:
sqlite> select * from categories where fullpkgpath="lang/erlang" or fullpkgpath="lang/node";
lang/erlang|lang
lang/node|devel
lang/node|lang
lang/node|lang/python
Which should be converted into this:
lang/erlang,pkg|lang
lang/node,pkg|devel
lang/node,pkg|lang
lang/node,pkg|lang/python
Since both lang/erlang
and lang/node
are themselves categories, too:
sqlite> select * from categories where value="lang/erlang" limit 8;
databases/erl-Emysql|lang/erlang
databases/erl-couchbeam|lang/erlang
databases/erl-epgsql|lang/erlang
databases/erl-sqerl|lang/erlang
devel/erl-automeck|lang/erlang
devel/erl-bear|lang/erlang
devel/erl-depsolver|lang/erlang
devel/erl-ej|lang/erlang
sqlite> select * from categories where value="lang/node" limit 8;
databases/node-pg|lang/node
databases/node-sqlite3|lang/node
devel/node-async|lang/node
devel/node-bindings|lang/node
devel/node-buffer-writer|lang/node
devel/node-cloned|lang/node
devel/node-expresso|lang/node
devel/node-fibers|lang/node
I'm using sqlite3 and perl on OpenBSD. The database, if needed, is available on ftp.
To check whether a value is in a set of values, use IN: