I have an SQL query that selects from two inner-joined tables. Executing the SELECT statement takes about 50 seconds. However, fetchall() takes 788 seconds, even though it only fetches 981 results. This is the query and the fetchall code:
time0 = time.time()
self.cursor.execute("SELECT spectrum_id, feature_table_id "+
"FROM spectrum AS s "+
"INNER JOIN feature AS f "+
"ON f.msrun_msrun_id = s.msrun_msrun_id "+
"INNER JOIN (SELECT feature_feature_table_id, min(rt) AS rtMin, max(rt) AS rtMax, min(mz) AS mzMin, max(mz) as mzMax "+
"FROM convexhull GROUP BY feature_feature_table_id) AS t "+
"ON t.feature_feature_table_id = f.feature_table_id "+
"WHERE s.msrun_msrun_id = ? "+
"AND s.scan_start_time >= t.rtMin "+
"AND s.scan_start_time <= t.rtMax "+
"AND base_peak_mz >= t.mzMin "+
"AND base_peak_mz <= t.mzMax", spectrumFeature_InputValues)
print 'query took:',time.time()-time0,'seconds'
time0 = time.time()
spectrumAndFeature_ids = self.cursor.fetchall()
print time.time()-time0,'seconds to fetchall'
Is there a reason why the fetchall takes so long?
Update:
Doing:
while 1:
    info = self.cursor.fetchone()
    if info:
        <do something>
    else:
        break
is going just as slow as
allInfo = self.cursor.fetchall()
for info in allInfo:
    <do something>
By default, fetchall() is as slow as looping over fetchone(), due to the arraysize of the Cursor object being set to 1.

To speed things up you can loop over fetchmany(), but to see a performance gain you need to provide it with a size parameter bigger than 1; otherwise it'll fetch "many" in batches of arraysize, i.e. 1.

It is quite possible that you can get the performance gain simply by raising the value of arraysize, but I have no experience doing this, so you may want to experiment with that first.

More on the above here: http://docs.python.org/library/sqlite3.html#sqlite3.Cursor.fetchmany
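As a rough sketch of the two experiments suggested above (an explicit size passed to fetchmany(), and a raised arraysize), something like the following could be tried. The in-memory demo table, the fetch_in_batches helper and the batch size of 500 are all made up for illustration; they are not from the original code, and whether either variant actually speeds up the query in the question is something you would have to measure yourself.

```python
import sqlite3


def fetch_in_batches(cursor, batch_size=500):
    """Yield rows from an already-executed cursor, batch_size rows per fetchmany() call.

    Hypothetical helper; the default of 500 is an arbitrary choice.
    """
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:  # an empty list means no rows are left
            break
        for row in rows:
            yield row


# Hypothetical in-memory table, used only so the example runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO demo (value) VALUES (?)",
                 [(i * 0.5,) for i in range(2000)])

# Experiment 1: pass an explicit size to fetchmany().
cursor = conn.cursor()
cursor.execute("SELECT id, value FROM demo ORDER BY id")
rows_explicit = list(fetch_in_batches(cursor, batch_size=500))

# Experiment 2: raise arraysize, so fetchmany() without an argument
# uses the larger batch size.
cursor = conn.cursor()
cursor.arraysize = 500
cursor.execute("SELECT id, value FROM demo ORDER BY id")
rows_arraysize = []
while True:
    batch = cursor.fetchmany()
    if not batch:
        break
    rows_arraysize.extend(batch)
```

In the code from the question, the same loop would wrap self.cursor after the execute() call. Timing each variant with time.time(), as in the question, should show whether the batch size makes any difference for this query.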