(See update below)
I am having an issue of slow query performance when querying a very simplistic Sqlite datatable of about 500,000 rows from within a C#.Net application (~5sec).
I have tried the exact same query on exactly the same database using LinqPad, as well as 2 database browsers (both based on QtSql), and it runs 10x faster (~0.5secs). Same query, same db, different apps, only mine doesn't run fast.
It makes negligible difference whether I'm returning values or just a Count(*).
I've tried:
- building for each of .Net 3.5/4/4.5
- building for each of of AnyCPU/x86/x64
- using each of System.Data.Sqlite, sqlite-net, as well as directly accessing a sqlite3 dll via COM
- building for each of WPF/WinForms
- different variations of the queries
None of these make any noticible difference to the query time.
I know that rewriting the query using JOINs may help, but what I can't figure out is why the same query works fine in LinqPad/Sql browers but not from any app I try to create. I must be missing something pretty fundamental.
Example Table:
"CREATE TABLE items(id INTEGER PRIMARY KEY, id1 INTEGER, id2 INTEGER, value INTEGER)"
Example Query String (though basically any query using a subquery takes a long time):
SELECT count(*)
FROM items WHERE
id2 IN
(
SELECT DISTINCT id2 FROM items WHERE id1 IN
(
SELECT DISTINCT id1 FROM items WHERE id2 = 100000 AND value = 10
)
AND value = 10
)
AND value = 10
GROUP BY id2
I know this could probably be re-written using JOINS and indexing to speed it up, but the fact remains that this query works significantly faster from other apps. What am I missing here as to why the same query runs so much slower no matter what I try?
UPDATE: It seems the version of sqlite has something to do with the issue. Using the legacy System.Data.Sqlite v1.0.66.0 the query runs just like the other apps, however using a more recent version it is slow. I haven't pinpointed what at what version exactly this changed, but am pretty sure it's to do with the underlying sqlite3 version not System.Data.Sqlite specifically. If anyone knows what could have changed that would cause subqueries to slow down so much in this situation, or if there are settings or something that can make subqueries run faster in new versions of sqlite please let me know!
Again, the query is an example and is not ideal and partially redundant... the question is more about why it works in one and not the other.
Thanks in advance for any additional input!
UPDATE: SOLVED
See my answer below.
Ok turns out it was to do with Automatic Indexing, which was introduced with SQLite 1.7.0. In my situation using subqueries on this kind of table without indexes meant that the time it took SQLite to create the automatic indexes was causing the additional overhead that the queries were experiencing.
The solution was to use:
at the start of any query that uses the "IN" clause.
Creating indexes on the columns may also solve this (untested), however in this particular situation the additional size/disk usage necessary to create the indexes is not worth it.
This would also suggest that the LinqPad SQLite plugin and the database viewers I was using are based on old sqlite versions.
More information can be found at:
http://www.sqlite.org/src/info/8011086c85c6c4040
http://www.sqlite.org/optoverview.html#autoindex
Thanks to everyone that responded.
Some suggestions:
You say you don't want to rework your queries nor add indexes. That is the obvious thing to do here. Without any indexes sqlite has to scan your 500,000 row table at least one time (or more likely multiple times).
Based on your query above I would add indexes to columns
id1
andid2
.One other thing is that your query above seems a little redundant. Maybe you have your reasons, but I cannot see why the query should be so complicated. Simplified query:try