sqlite: Fastest way to get all rows (consecutive d

2019-03-30 23:58发布

问题:

I want to read all rows in a table using system.data.sqlite. As I have a very big Table (>450GB, with > 6 billion rows) I want to be sure that sqlite will use consequtive disk access. As you may know a random access to hard disk is slow. Due to memory restictions I can not load all the data at once. So the optimal way would be if sqlite reads some hundred MB (consecutive), then I work with this data and sqlite reads the next.

How can I be sure that sqlite will do disk access this way and not jumping from one position on hard disk to another?

Things I know (I think these suggestions will show up):

  • It could be better to use an other DBMS. But I want/need to solve it wiht this one.
  • I know that the disk head will be positioned by the OS on other data while I am workign with thhe data. This doesn't matter. It is just that some hundret MB will be read consecutively.
  • I don't want to / can split the database file up into smaller pieces

I found this post, but it is not addressing my problem correctly:
Which is the fastest way to retrieve all items in SQLite?

回答1:

That's what clustered index are for. sqlite doesn't support them though.

The following is copied from: http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuningWindows

Four: Clustered Indexes

SQLite doesn't support clustered indexes (simply, indexes that force the data in the database to be physically laid down in the SAME order as the index needs it to be in.)

This means that if your index is sequential INTEGER, the records are physically laid out in the database in that INTEGERs order, 1 then 2 then 3.

You can't make a Clustered index, but you CAN sort your data in order so that any historical data is ordered nicely. Of course, as the database matures, you lose that, but it helps

Someone else posted this, and it is a nice example to use, so I will. If you have a table WIBBLE whose field KEY you want to access a lot, it would be nice if everything was in order. Using the command line tool, you can create a fake cluster by doing the following:

create table wibble2 as select * from wibble;
delete from wibble;
insert into wibble select * from wibble2 order by key;
drop table wibble2;

Bottom line is you can re-order your records manually, but I imagine this would be practical for you only if you don't intend to write to the table to often.



回答2:

In SQLite, table rows are stored sorted by the rowid, so the most efficient way to read the rows in this order is to sort by this column (or the alias you declared with INTEGER PRIMARY KEY):

SELECT * FROM wibble ORDER BY rowid

For a WITHOUT ROWID table, you sort by the primary key column(s):

SELECT * FROM wibble ORDER BY MyPrimary, KeyColumns


回答3:

From @CL. answer in this post:

In SQLite, indexes created with CREATE INDEX are non-clustered indexes.

Since version 3.8.2, SQLite supports WITHOUT ROWID tables, which are clustered indexes.