SQLite table taking time to fetch the records in L

2019-05-30 08:21发布

问题:

Scenario: database is sqlite (need to encrypt records in database. Hence used SQL cipher API for iOS)

There is a table in the database named partnumber with schema as follows:

CREATE TABLE partnumber (
  objid varchar PRIMARY KEY,
  description varchar,
  make varchar,
  model varcha,
  partnumber varchar,
  SSOKey varchar,
  PMOKey varchar
)

This table contains approximately 80K records.

There are 3 text fields in the UI view, in which user can enter search terms and searching is made as soon as user enters the letters there.

3 text fields are: txtFieldDescription, txtFieldMake and txtFieldModel.

Suppose, first user enters the search term as ‘monitor’ in txtFieldDescription. So, the queries that will be executed with each letter are:

1.

    SELECT DISTINCT description COLLATE NOCASE
    FROM partnumber where description like ‘%m%’

2.

    SELECT DISTINCT description COLLATE NOCASE
    FROM partnumber where description like ‘%mo%’

3.

    SELECT DISTINCT description COLLATE NOCASE
    FROM partnumber where description like ‘%mon%’

4.

    SELECT DISTINCT description COLLATE NOCASE
    FROM partnumber where description like ‘%moni%’

5.

    SELECT DISTINCT description COLLATE NOCASE
    FROM partnumber where description like ‘%monit%’

6.

    SELECT DISTINCT description COLLATE NOCASE
    FROM partnumber where description like ‘%monito%’

7.

    SELECT DISTINCT description COLLATE NOCASE
    FROM partnumber where description like ‘%monitor%’

So far so good. Suppose now if user wants to search for model (txtFieldDescription still contains ‘monitor’). So user clicks on txtFieldModel. As soon as user clicks on model, a query is fired as:

SELECT DISTINCT model COLLATE NOCASE
FROM partnumber where description like ‘%monitor%’

This query will return all the models for the records where description contains monitor (at any position).

Now, if user wants to search for all the models containing word ‘sony’ (description field still contains monitor), then the queries that will get executed with each letter are:

1.

    SELECT DISTINCT model COLLATE NOCASE
    FROM partnumber WHERE model like ‘%s%’ AND description like ‘%monitor%’

2.

    SELECT DISTINCT model COLLATE NOCASE
    FROM partnumber WHERE model like ‘%so%’ AND description like ‘%monitor%’

3.

    SELECT DISTINCT model COLLATE NOCASE
    FROM partnumber WHERE model like ‘%son%’ AND description like ‘%monitor%’

4.

    SELECT DISTINCT model COLLATE NOCASE
    FROM partnumber WHERE model like ‘%sony%’ AND description like ‘%monitor%’

Now, if user clicks on txtFieldMake and enters search term as ‘1980’, then the queries that get fired are:

1.

    SELECT DISTINCT make COLLATE NOCASE
    FROM partnumber WHERE make like ‘%1%’
      AND model like ‘%sony%’ AND description like ‘%monitor%’

2.

    SELECT DISTINCT make COLLATE NOCASE
    FROM partnumber WHERE make like ‘%19%’
      AND model like ‘%sony%’ AND description like ‘%monitor%’

3.

    SELECT DISTINCT make COLLATE NOCASE
    FROM partnumber WHERE make like ‘%198%’
      AND model like ‘%sony%’ AND description like ‘%monitor%’

4.

    SELECT DISTINCT make COLLATE NOCASE
    FROM partnumber WHERE make like ‘%1980%’
      AND model like ‘%sony%’ AND description like ‘%monitor%’

Here, the time delay in making a transition from txtFieldDescription to txtFieldModel or txtFieldModel to txtFieldMake is too large and in txtFieldModel and txtFieldMake, the letter entered are shown after 5 or 6 secs (after the query has been processed) and hence the cursor hangs there.

On analyzing, I came to know that wildcard before the searching term in like keyword (as in ‘%monitor%’) slows the execution. And in this case there may be as many as 3 such like keywords with AND in between them and hence, execution time is sure to get increase. Also, use of wildcard at the beginning of like negates the indices.

A FEW ADDITIONAL INFORMATION:

  1. The total number of records ~80K

  2. SELECT query is run each time on the table partnumber (~80K)

  3. Result of some queries performed by me:

    Sqlite> SELECT count(DISTINCT description COLLATE NOCASE) from partnumber;
        Result is: 2599
    
    Sqlite> SELECT count(DISTINCT make COLLATE NOCASE) from partnumber;
        Result is: 7129
    
    Sqlite> SELECT count(DISTINCT model COLLATE NOCASE) from partnumber;
        Result is: 64644
    
    Sqlite> SELECT count(objid) from partnumber;
        Result is: 82135
    
  4. Indices are created as follows:

    CREATE INDEX index_description
    ON partnumber (description collate nocase)
    
    CREATE INDEX index_make
    ON partnumber (make collate nocase)
    
    CREATE INDEX index_model
    ON partnumber (model collate nocase)
    

SOME ALTERNATIVES TO INCREASE PERFORMANCE:

  1. Since the count of distinct description is only 2599 and that of make is only 7129, so the table can be split into different tables with one containing DISTINCT description COLLATE NOCASE output (total of 2599 rows) and one containing DISTINCT make COLLATE NOCASE (total of 7129 rows). As far as model is concerned, making a different table for it will not help as the number of rows ~64644 is nearly equal to total records ~82135. But the problem with this approach is that I don’t know how I would be making a search in these tables, what columns must be there in each of them and how many tables must be created. What if user enters some description then enters model and then again enters a new description.

  2. Since the result of this select query is being displayed in a UITableView and the user sees at max 5 rows at a time. So, we can limit the number of rows that are being returned to 500 and when user scrolls, then next 500 can be fetched and so on till the last searched record.

But the problem here is although I need only 500 records, but I will have to search entire table (SCAN ~80K records). So, I need a query that will first search only top 10% of the table and return top 500 rows from this, then next 500 till top 10% records are all searched, then next 10%, then next 10% till 80000 records are being searched (need to search in chunks of 10-10% records).

  1. If the table of 80K records can be split into 4 tables of 20K records each and then searching is performed on all 4 tables simultaneously (in different background threads) to get the result set. But here I don’t know how to run the queries in 4 different threads (sort of batch execution), when to combine the results and how to know that all the threads have finished execution.

  2. If I can replace like %monitor%‘ with another function that returns the same result but whose execution is faster and the use of that function does not affects the use of index, (that is, does not by-passes the use of index), then the execution may get faster. If anyone can suggest me such a function in sqlite, then I can go on with this approach.

If you can help me to implement any one of these alternatives, or if you can suggest me any other solution, then I would be able to increase the execution speed of my query. And pls dont tell me to enable FTS (Full Text Searching) in sqlite because I have already tried doing this, but I dont know the exact steps. Thanks a lot for reading this question so patiently......

EDIT:

Hey All, I got some success. I modified my select query to look like this:

select distinct description collate nocase as description from partnumber where rowid BETWEEN 1 AND (select max(rowid) from partnumber) AND description like '%a%' order by description;

And Bingo, the search time was like never before. But the problem now is when I execute the command EXPLAIN QUERY PLAN like this, it shows me using B-Tree for distinct which I dont want to use.

explain query plan select distinct description collate nocase as description from partnumber where rowid BETWEEN 1 AND (select max(rowid) from partnumber) AND description like '%a%' order by description;

Output:

0|0|0|SEARCH TABLE partnumber USING INTEGER PRIMARY KEY (rowid>? AND rowid<?) (~15625 rows)
0|0|0|EXECUTE SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE partnumber USING INTEGER PRIMARY KEY (~1 rows)
0|0|0|USE TEMP B-TREE FOR DISTINCT

EDIT:

Sorry guys. The above approach (using rowid for searching) is taking more time on device than the original one. I have tried removing the distinct and order by keywords, but it was of no use. Still taking ~8-10 secs on iPhone. Pls help me out.

回答1:

Anshul,

I know you said "pls dont tell me to enable FTS (Full Text Searching) in sqlite because I have already tried doing this, but I dont know the exact steps", however FTS is the only way you are going to get this to perform well. There is no magic that will make a full table scan perform well. I's suggest reading up on FTS, taking the time to learn it, and then use it: http://sqlite.org/fts3.html.