How can I speed up fetching the results after runn

2019-01-26 19:31发布

问题:

As an answer on my question: Is it normal that sqlite.fetchall() is so slow? it seems that fetch-all and fetch-one can be incredibly slow for sqlite.

As I mentioned there, I have the following query:

time0 = time.time()
self.cursor.execute("SELECT spectrum_id, feature_table_id "+
                "FROM spectrum AS s "+
                "INNER JOIN feature AS f "+
                "ON f.msrun_msrun_id = s.msrun_msrun_id "+
                "INNER JOIN (SELECT feature_feature_table_id, min(rt) AS rtMin, max(rt) AS rtMax, min(mz) AS mzMin, max(mz) as mzMax "+
                             "FROM convexhull GROUP BY feature_feature_table_id) AS t "+
                "ON t.feature_feature_table_id = f.feature_table_id "+
                "WHERE s.msrun_msrun_id = ? "+
                "AND s.scan_start_time >= t.rtMin "+
                "AND s.scan_start_time <= t.rtMax "+
                "AND base_peak_mz >= t.mzMin "+
                "AND base_peak_mz <= t.mzMax", spectrumFeature_InputValues)
print 'query took:',time.time()-time0,'seconds'

time0 = time.time()
spectrumAndFeature_ids = self.cursor.fetchall()      
print time.time()-time0,'seconds since to fetchall'

The execution of the select statement takes about 50 seconds (acceptable). However, the fetchall() takes 788 seconds, only fetching 981 results.

The way proposed to speed up the query given as answer to my question: Is it normal that sqlite.fetchall() is so slow? using fetchmany(), has not improved the speed of fetching the results.

How can I speed up fetching the results after running an sqlite query?


The sql exactly as I tried to execute it on command line:

sqlite> SELECT spectrum_id, feature_table_id
   ...> FROM spectrum AS s 
   ...> INNER JOIN feature AS f 
   ...> ON f.msrun_msrun_id = s.msrun_msrun_id 
   ...> INNER JOIN (SELECT feature_feature_table_id, min(rt) AS rtMin, max(rt) AS rtMax, min(mz) AS mzMin, max(mz) as mzMax 
   ...> FROM convexhull GROUP BY feature_feature_table_id) AS t 
   ...> ON t.feature_feature_table_id = f.feature_table_id 
   ...> WHERE s.msrun_msrun_id = 1
   ...> AND s.scan_start_time >= t.rtMin
   ...> AND s.scan_start_time <= t.rtMax
   ...> AND base_peak_mz >= t.mzMin
   ...> AND base_peak_mz <= t.mzMax;

update:

So I started running the query on the commandline about 45 minutes ago, and it's still busy, so it's also very slow using the commandline.

回答1:

From reading this question, it sounds like you could benefit from using the APSW sqlite module. Somehow you may be victim of your sqlite module causing your query to be executed in some less performant manner.

I was curious so I tried using apsw myself. It wasn't too complicated. Why don't you give it a try?

To install it I had to:

  1. Extract the latest version.
  2. Have the installation package fetch the latest sqlite amalgamation.

    python setup.py fetch --sqlite
    
  3. Build and install.

    sudo python setup.py install
    
  4. Use it in place of the other sqlite module.

    import apsw
    <...>
    conn = apsw.Connection('foo.db')