I have a query where I am searching against a string:
SELECT county FROM city WHERE UPPER(name) = 'SAN FRANCISCO';
Now, this works fine, but it doesn't scale well, and I need to optimize it. I have found an option along the lines of creating a generated view, or something like that, but I was hoping for a simpler solution using an index.
We are using DB2, and I really want to use an expression in an index, but this option seems to only be available on z/OS, however we are running Linux. I tried the expression index anyways:
CREATE INDEX city_upper_name_idx
ON city UPPER(name) ALLOW REVERSE SCANS;
But of course, it chokes on the UPPER(name).
Is there another way I can create an index or something similar in this manner such that I don't have to restructure my existing queries to use a new generated view, or alter my existing columns, or any other such intrusive change?
EDIT: I'm open to hearing solutions for other databases... it might carry over to DB2...
You could add an indexed column holding a numerical hash key of the city name. (With duplicates allowed).
Then you could do a multi-clause where :
Alternatively, go through your db manual and look at the options for creating table indexes. There might be something helpful.
PostgreSQL also supports indexing the results of a function:
The only other option I can think of is to de-normalize your data a bit by creating another column to hold the upper-case version (updated by triggers) and index that. Blech!
I don't know whether this would work in DB2, but I'll tell you how I'd do this in SQL Server. I think the way MSSQL does this is ANSI standard, though the specific collation strings may differ. Anyway, if you can do this without trashing the rest of your application -- are there other places where the "name" column needs to be case-sensitive? -- try making that whole column case-insensitive by changing the collation, then index the column.
...where "nvarchar(200)" stands in for whatever's your current column data type. The "CI" part of the collation string is what marks it as case-insensitive in MSSQL.
To explain... my understanding is that the index will store values in the order of the indexed column's collation. Making the column's collation be case-insensitive would make the index store 'San Francisco', 'SAN FRANCISCO', and 'san francisco' all together. Then you should just have to remove the "UPPER()" from your query, and DB2 should know that it can use your index.
Again, this is based solely on what I know about SQL Server, plus a couple minutes looking at the SQL-92 spec; it may or may not work for DB2.
Oracle supports function-based indexes. Their canonical example:
Short answer, no.
Long answer, yes if you're running on the mainframe, but you're not, so you have to use other trickery.
DB2 (as of DB2/LUW v8) now has generated columns so you can:
and then create an index on ulname. I'm not sure you're going to get it simpler than that.
Before that, you used to have to use a combination of insert and update triggers to ensure the ulname column was kept in sync, and this was a nightmare to maintain. Also, now that this functionality is part of the core DBMS, it's been highly optimized (it's much faster than the trigger-based solution) and doesn't get in the way of real user triggers, so no extra DB objects to maintain.
See here for details.
DB2 isn't strong regarding collation. And it doesn't have function-based indexes.
Niek Sanders's suggestion would work, if you can accept that the hashing has to happen in your application (as DB2 doesn't have SHA or MD5 functions, as far as I know).
However, if I were you, I'd create a materialized view (MQT == Materialized Query Table, in db2 parlance) using CREATE TABLE AS, adding a column with a pre-computed upper-case variant of the name. Note: You may add indexes to materialized views in DB2.