Is it normal that sqlite.fetchall() is so slow?

2020-02-11 09:30发布

问题:

I have an sql query that selects from two inner joined tables. The execution of the select statement takes about 50 seconds. However, the fetchall() takes 788 seconds and it only fetches 981 results. This is the query and fetchall code:

time0 = time.time()
self.cursor.execute("SELECT spectrum_id, feature_table_id "+
                    "FROM spectrum AS s "+
                    "INNER JOIN feature AS f "+
                    "ON f.msrun_msrun_id = s.msrun_msrun_id "+
                    "INNER JOIN (SELECT feature_feature_table_id, min(rt) AS rtMin, max(rt) AS rtMax, min(mz) AS mzMin, max(mz) as mzMax "+
                                 "FROM convexhull GROUP BY feature_feature_table_id) AS t "+
                    "ON t.feature_feature_table_id = f.feature_table_id "+
                    "WHERE s.msrun_msrun_id = ? "+
                    "AND s.scan_start_time >= t.rtMin "+
                    "AND s.scan_start_time <= t.rtMax "+
                    "AND base_peak_mz >= t.mzMin "+
                    "AND base_peak_mz <= t.mzMax", spectrumFeature_InputValues)
print 'query took:',time.time()-time0,'seconds'

time0 = time.time()
spectrumAndFeature_ids = self.cursor.fetchall()      
print time.time()-time0,'seconds since to fetchall'

Is there a reason why the fetchall takes so long?


update

Doing:

while 1:
    info = self.cursor.fetchone()
    if info:
        <do something>
    else:
        break

is going just as slow as

allInfo = self.cursor.fetchall()         
for info in allInfo:
    <do something>

回答1:

By default fetchall() is as slow as looping over fetchone() due to the arraysize of the Cursor object being set to 1.

To speed things up you can loop over fetchmany(), but to see a performance gain, you need to provide it with a size parameter bigger than 1, otherwise it'll fetch "many" by batches of arraysize, i.e. 1.

It is quite possible that you can get the performance gain simply by raising the value of arraysize, but I have no experience doing this, so you may want to experiment with that first by doing something like:

>>> import sqlite3
>>> conn = sqlite3.connect(":memory:")
>>> cu = conn.cursor()
>>> cu.arraysize
1
>>> cu.arraysize = 10
>>> cu.arraysize
10

More on the above here: http://docs.python.org/library/sqlite3.html#sqlite3.Cursor.fetchmany