Querying using subqueries on a SQLite database app

2019-08-27 03:54发布

(See update below)

I am having an issue of slow query performance when querying a very simplistic Sqlite datatable of about 500,000 rows from within a C#.Net application (~5sec).

I have tried the exact same query on exactly the same database using LinqPad, as well as 2 database browsers (both based on QtSql), and it runs 10x faster (~0.5secs). Same query, same db, different apps, only mine doesn't run fast.

It makes negligible difference whether I'm returning values or just a Count(*).

I've tried:

  • building for each of .Net 3.5/4/4.5
  • building for each of of AnyCPU/x86/x64
  • using each of System.Data.Sqlite, sqlite-net, as well as directly accessing a sqlite3 dll via COM
  • building for each of WPF/WinForms
  • different variations of the queries

None of these make any noticible difference to the query time.

I know that rewriting the query using JOINs may help, but what I can't figure out is why the same query works fine in LinqPad/Sql browers but not from any app I try to create. I must be missing something pretty fundamental.

Example Table:

"CREATE TABLE items(id INTEGER PRIMARY KEY, id1 INTEGER, id2 INTEGER, value INTEGER)"

Example Query String (though basically any query using a subquery takes a long time):

SELECT count(*) 
FROM items WHERE 
id2 IN 
(
    SELECT DISTINCT id2 FROM items WHERE id1 IN 
    (
        SELECT DISTINCT id1 FROM items WHERE id2 = 100000 AND value = 10
    )
    AND value = 10
) 
AND value = 10 
GROUP BY id2

I know this could probably be re-written using JOINS and indexing to speed it up, but the fact remains that this query works significantly faster from other apps. What am I missing here as to why the same query runs so much slower no matter what I try?

UPDATE: It seems the version of sqlite has something to do with the issue. Using the legacy System.Data.Sqlite v1.0.66.0 the query runs just like the other apps, however using a more recent version it is slow. I haven't pinpointed what at what version exactly this changed, but am pretty sure it's to do with the underlying sqlite3 version not System.Data.Sqlite specifically. If anyone knows what could have changed that would cause subqueries to slow down so much in this situation, or if there are settings or something that can make subqueries run faster in new versions of sqlite please let me know!

Again, the query is an example and is not ideal and partially redundant... the question is more about why it works in one and not the other.

Thanks in advance for any additional input!

UPDATE: SOLVED

See my answer below.

3条回答
老娘就宠你
2楼-- · 2019-08-27 04:15

Ok turns out it was to do with Automatic Indexing, which was introduced with SQLite 1.7.0. In my situation using subqueries on this kind of table without indexes meant that the time it took SQLite to create the automatic indexes was causing the additional overhead that the queries were experiencing.

The solution was to use:

PRAGMA automatic_index=OFF;

at the start of any query that uses the "IN" clause.

Creating indexes on the columns may also solve this (untested), however in this particular situation the additional size/disk usage necessary to create the indexes is not worth it.

This would also suggest that the LinqPad SQLite plugin and the database viewers I was using are based on old sqlite versions.

More information can be found at:

http://www.sqlite.org/src/info/8011086c85c6c4040

http://www.sqlite.org/optoverview.html#autoindex

Thanks to everyone that responded.

查看更多
Evening l夕情丶
3楼-- · 2019-08-27 04:27

Some suggestions:

You say you don't want to rework your queries nor add indexes. That is the obvious thing to do here. Without any indexes sqlite has to scan your 500,000 row table at least one time (or more likely multiple times).

Based on your query above I would add indexes to columns id1 and id2.

One other thing is that your query above seems a little redundant. Maybe you have your reasons, but I cannot see why the query should be so complicated. Simplified query:

select count(*)
from items
where id2 = 100000 and value = 10

查看更多
看我几分像从前
4楼-- · 2019-08-27 04:37

try

SELECT ID1.id2, count(*) 
FROM items ID2
JOIN items ID1
  on ID2.id2 = ID1.id1
 and ID1.id2 = 100000 
 and ID1.value = 10 
 and ID2.valu3 = 10
group by ID1.id2
查看更多
登录 后发表回答