I have a field in my table having text data type.
Is there a difference in performance for the following two sql queries:
select * from tablename where fieldname LIKE "xyz%";
select * from tablename where fieldname LIKE "%zyx";
If we were to implement the execution of these queries, this is what I think we would need to do:
We have to match two patterns: a prefix (regex ^xyz) and a suffix (regex zyx$).
We will have to check the string's characters one by one, starting from the beginning.
For the first query we only have to read the first three characters to see if there is a match, but for the second one we have to read until we reach the end of the string to determine whether a match has occurred. But if the length of the string is stored somewhere, we can read the last three characters directly, giving similar performance to the first case.
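The per-string comparison described above can be sketched as follows. This is only an illustration of the idea in Python, with hypothetical helper names; it is not how any particular database engine implements the match.

```python
def starts_with(s: str, prefix: str) -> bool:
    # Only the first len(prefix) characters need to be examined.
    return len(s) >= len(prefix) and s[:len(prefix)] == prefix


def ends_with(s: str, suffix: str) -> bool:
    # With the length known, jump straight to the last len(suffix)
    # characters instead of scanning from the beginning.
    return len(s) >= len(suffix) and s[len(s) - len(suffix):] == suffix


print(starts_with("xyzabc", "xyz"), ends_with("abczyx", "zyx"))
```

For a single string both checks touch the same number of characters, so any difference between the two queries has to come from how rows are located, not from the comparison itself.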
My question is whether commercial databases like MySQL and Oracle show any difference in performance when executing these queries.
Yes, there is a difference between the two queries. With LIKE on an indexed fieldname column, "%xyz" cannot use the index, but "xyz%" can, which means it would be faster.
The fastest means of finding substrings within text is to use Full Text Search (FTS): both Oracle and MySQL have their own native functionality, and there are third-party tools like Sphinx and Solr.
There is definitely a difference in performance on all databases. The first case will definitely be faster if the column is indexed.
I had a similar case in my project, where users were also allowed to search with "ends with" (like your second query).
As this was a frequently used operation and the query was slow, I stored the reversed string in an additional indexed column, so your second query becomes a "starts with" search for the reversed pattern on that column.
This approach made it as fast as the starts-with query.
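A minimal sketch of the reversed-column idea, using Python's sqlite3 module only so that it is runnable. The column name fieldname_rev, the index name and the helper functions are assumptions for illustration, not the original poster's schema, and whether a given engine actually uses the index for this LIKE depends on its collation and optimizer settings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# fieldname_rev is a hypothetical extra column holding the reversed string.
conn.execute("CREATE TABLE tablename (fieldname TEXT, fieldname_rev TEXT)")
conn.execute("CREATE INDEX idx_fieldname_rev ON tablename (fieldname_rev)")


def insert(value):
    # Keep the reversed copy in sync at write time.
    conn.execute("INSERT INTO tablename VALUES (?, ?)", (value, value[::-1]))


def ends_with(suffix):
    # An "ends with" search becomes a "starts with" search on the reversed
    # column. Note: % and _ inside suffix are not escaped in this sketch.
    pattern = suffix[::-1] + "%"
    rows = conn.execute(
        "SELECT fieldname FROM tablename "
        "WHERE fieldname_rev LIKE ? ORDER BY fieldname",
        (pattern,),
    )
    return [row[0] for row in rows]


for value in ["abczyx", "zyxabc", "zyx", "hello"]:
    insert(value)

print(ends_with("zyx"))
```

The cost is extra storage and a second value to maintain on every write, which is usually acceptable when "ends with" searches are frequent.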
If fieldname is indexed, most commercial databases can transform the first query into an interval search, which is very fast.
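As a rough illustration of what such an interval search looks like, the prefix match can be rewritten as a range condition. The sketch below assumes a non-empty prefix, a plain code-point (binary) collation, and hypothetical helper names; real optimizers perform this rewrite internally and handle collations and edge cases that are ignored here.

```python
import sqlite3


def prefix_range(prefix):
    # Upper bound: the prefix with its last character bumped by one code
    # point, e.g. "xyz" -> "xy{". Assumes a non-empty prefix whose last
    # character is not the maximum code point.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (fieldname TEXT)")
conn.execute("CREATE INDEX idx_fieldname ON tablename (fieldname)")
conn.executemany(
    "INSERT INTO tablename VALUES (?)",
    [(v,) for v in ["abc", "xyy", "xyz", "xyzabc", "xz", "zyx"]],
)


def starts_with(prefix):
    lo, hi = prefix_range(prefix)
    # Equivalent to fieldname LIKE 'xyz%' under a binary collation.
    rows = conn.execute(
        "SELECT fieldname FROM tablename "
        "WHERE fieldname >= ? AND fieldname < ? ORDER BY fieldname",
        (lo, hi),
    )
    return [row[0] for row in rows]


print(starts_with("xyz"))
```

No such single interval exists for "%xyz", because strings sharing a suffix are scattered throughout the sorted order.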
Picking up from your comment: "I just want to know if a starts with match is diff from an ends with match".
Firstly - remember that we are not looking for the best algorithm to match a string. We are looking for the best algorithm to find all matching strings in a set of N rows. We want to do better than 'Do algorithm X, N times'.
If fieldname is NOT indexed, then there will be very little difference in performance between the two queries - the SQL engine is just going to do a match on the first 3 or last 3 bytes of the string, which is simply a matter of offsetting to the right memory location.
If the fieldname IS indexed, there will be a huge difference in performance between the two searches, because rather than examining all N rows, we can discard most of the data.
For example, for the "xyz%" version we can use a binary search over the sorted index entries.
We start at the middle element, which happens to be 'peter'. Since 'xyz' sorts after 'peter', we can immediately discard everything before 'peter' and take the middle element of the remainder, 'samantha', and so on, until we find the entries starting with 'xyz'.
With the "%xyz" version we cannot do this: since ANY string could potentially match at the end, we need to look at every string.
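The contrast between the two searches can be sketched with a sorted Python list standing in for the index. The names in the list and the function names are made up for illustration, and a real index is a B-Tree rather than a flat array.

```python
from bisect import bisect_left


def find_prefix(sorted_names, prefix):
    # Binary search for the first entry >= prefix, then read forward while
    # entries still start with it. Matching entries are contiguous.
    start = bisect_left(sorted_names, prefix)
    matches = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break
        matches.append(name)
    return matches


def find_suffix(names, suffix):
    # Sort order says nothing about how a string ends, so every entry
    # has to be examined.
    return [name for name in names if name.endswith(suffix)]


names = sorted(["alice", "peter", "samantha", "xyzzy", "xyz", "zoe", "abcxyz"])
print(find_prefix(names, "xyz"))
print(find_suffix(names, "xyz"))
```

The prefix search takes roughly log N comparisons plus one step per match, while the suffix search always takes N comparisons.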
As the size of our table expands, the difference between these two approaches becomes large.
The solution of creating a field/index for the reverse of fieldname allows us to use the binary search technique again. (In some databases it is actually possible to do this without creating an extra field, by using particular index types, virtual columns, etc.)
This is simplified a lot - for detail on the actual implementation of database indexes, look into B-Tree and B*Tree indexes.