I'm having a performance problem in SQLite with a SELECT COUNT(*) on a large tables.
As I didn't yet receive a usable answer and I did some further testing, I edited my question to incorporate my new findings.
I have 2 tables:
CREATE TABLE Table1 (
Key INTEGER NOT NULL,
... several other fields ...,
Status CHAR(1) NOT NULL,
Selection VARCHAR NULL,
CONSTRAINT PK_Table1 PRIMARY KEY (Key ASC))
CREATE Table2 (
Key INTEGER NOT NULL,
Key2 INTEGER NOT NULL,
... a few other fields ...,
CONSTRAINT PK_Table2 PRIMARY KEY (Key ASC, Key2 ASC))
Table1 has around 8 million records and Table2 has around 51 million records, and the databasefile is over 5GB.
Table1 has 2 more indexes:
CREATE INDEX IDX_Table1_Status ON Table1 (Status ASC, Key ASC)
CREATE INDEX IDX_Table1_Selection ON Table1 (Selection ASC, Key ASC)
"Status" is required field, but has only 6 distinct values, "Selection" is not required and has only around 1.5 million values different from null and only around 600k distinct values.
I did some tests on both tables, you can see the timings below, and I added the "explain query plan" for each request (QP). I placed the database file on an USB-memorystick so i could remove it after each test and get reliable results without interference of the disk cache. Some requests are faster on USB (I suppose due to lack of seektime), but some are slower (table scans).
SELECT COUNT(*) FROM Table1
Time: 105 sec
QP: SCAN TABLE Table1 USING COVERING INDEX IDX_Table1_Selection(~1000000 rows)
SELECT COUNT(Key) FROM Table1
Time: 153 sec
QP: SCAN TABLE Table1 (~1000000 rows)
SELECT * FROM Table1 WHERE Key = 5123456
Time: 5 ms
QP: SEARCH TABLE Table1 USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
SELECT * FROM Table1 WHERE Status = 73 AND Key > 5123456 LIMIT 1
Time: 16 sec
QP: SEARCH TABLE Table1 USING INDEX IDX_Table1_Status (Status=?) (~3 rows)
SELECT * FROM Table1 WHERE Selection = 'SomeValue' AND Key > 5123456 LIMIT 1
Time: 9 ms
QP: SEARCH TABLE Table1 USING INDEX IDX_Table1_Selection (Selection=?) (~3 rows)
As you can see the counts are very slow, but normal selects are fast (except for the 2nd one, which took 16 seconds).
The same goes for Table2:
SELECT COUNT(*) FROM Table2
Time: 528 sec
QP: SCAN TABLE Table2 USING COVERING INDEX sqlite_autoindex_Table2_1(~1000000 rows)
SELECT COUNT(Key) FROM Table2
Time: 249 sec
QP: SCAN TABLE Table2 (~1000000 rows)
SELECT * FROM Table2 WHERE Key = 5123456 AND Key2 = 0
Time: 7 ms
QP: SEARCH TABLE Table2 USING INDEX sqlite_autoindex_Table2_1 (Key=? AND Key2=?) (~1 rows)
Why is SQLite not using the automatically created index on the primary key on Table1 ? And why, when he uses the auto-index on Table2, it still takes a lot of time ?
I created the same tables with the same content and indexes on SQL Server 2008 R2 and there the counts are nearly instantaneous.
One of the comments below suggested executing ANALYZE on the database. I did and it took 11 minutes to complete. After that, I ran some of the tests again:
SELECT COUNT(*) FROM Table1
Time: 104 sec
QP: SCAN TABLE Table1 USING COVERING INDEX IDX_Table1_Selection(~7848023 rows)
SELECT COUNT(Key) FROM Table1
Time: 151 sec
QP: SCAN TABLE Table1 (~7848023 rows)
SELECT * FROM Table1 WHERE Status = 73 AND Key > 5123456 LIMIT 1
Time: 5 ms
QP: SEARCH TABLE Table1 USING INTEGER PRIMARY KEY (rowid>?) (~196200 rows)
SELECT COUNT(*) FROM Table2
Time: 529 sec
QP: SCAN TABLE Table2 USING COVERING INDEX sqlite_autoindex_Table2_1(~51152542 rows)
SELECT COUNT(Key) FROM Table2
Time: 249 sec
QP: SCAN TABLE Table2 (~51152542 rows)
As you can see, the queries took the same time (except the query plan is now showing the real number of rows), only the slower select is now also fast.
Next, I create dan extra index on the Key field of Table1, which should correspond to the auto-index. I did this on the original database, without the ANALYZE data. It took over 23 minutes to create this index (remember, this is on an USB-stick).
CREATE INDEX IDX_Table1_Key ON Table1 (Key ASC)
Then I ran the tests again:
SELECT COUNT(*) FROM Table1
Time: 4 sec
QP: SCAN TABLE Table1 USING COVERING INDEX IDX_Table1_Key(~1000000 rows)
SELECT COUNT(Key) FROM Table1
Time: 167 sec
QP: SCAN TABLE Table2 (~1000000 rows)
SELECT * FROM Table1 WHERE Status = 73 AND Key > 5123456 LIMIT 1
Time: 17 sec
QP: SEARCH TABLE Table1 USING INDEX IDX_Table1_Status (Status=?) (~3 rows)
As you can see, the index helped with the count(*), but not with the count(Key).
Finaly, I created the table using a column constraint instead of a table constraint:
CREATE TABLE Table1 (
Key INTEGER PRIMARY KEY ASC NOT NULL,
... several other fields ...,
Status CHAR(1) NOT NULL,
Selection VARCHAR NULL)
Then I ran the tests again:
SELECT COUNT(*) FROM Table1
Time: 6 sec
QP: SCAN TABLE Table1 USING COVERING INDEX IDX_Table1_Selection(~1000000 rows)
SELECT COUNT(Key) FROM Table1
Time: 28 sec
QP: SCAN TABLE Table1 (~1000000 rows)
SELECT * FROM Table1 WHERE Status = 73 AND Key > 5123456 LIMIT 1
Time: 10 sec
QP: SEARCH TABLE Table1 USING INDEX IDX_Table1_Status (Status=?) (~3 rows)
Although the query plans are the same, the times are a lot better. Why is this ?
The problem is that ALTER TABLE does not permit to convert an existing table and I have a lot of existing databases which i can not convert to this form. Besides, using a column contraint instead of table constraint won't work for Table2.
Has anyone any idea what I am doing wrong and how to solve this problem ?
I used System.Data.SQLite version 1.0.74.0 to create the tables and to run the tests I used SQLiteSpy 1.9.1.
Thanks,
Marc
The output for the fast queries all start with the text "QP: SEARCH". Whilst those for the slow queries start with text "QP: SCAN", which suggests that sqlite is performing a scan of the entire table in order to generate the count.
Googling for "sqlite table scan count" finds the following, which suggests that using a full table scan to retrieve a count is just the way sqlite works, and is therefore probably unavoidable.
As a workaround, and given that status has only eight values, I wondered if you could get a count quickly using a query like the following?
select 1 where status=1 union select 1 where status=2 ...
then count the rows in the result. This is clearly ugly, but it might work if it persuades sqlite to run the query as a search rather than a scan. The idea of returning "1" each time is to avoid the overhead of returning real data.
If you haven't
DELETE
d any records, doing:Will avoid the full-table scan. Note that
_ROWID_
is a SQLite identifier.