I want to read all rows in a table using system.data.sqlite. As I have a very big Table (>450GB, with > 6 billion rows) I want to be sure that sqlite will use consequtive disk access. As you may know a random access to hard disk is slow. Due to memory restictions I can not load all the data at once. So the optimal way would be if sqlite reads some hundred MB (consecutive), then I work with this data and sqlite reads the next.
How can I be sure that sqlite will do disk access this way and not jumping from one position on hard disk to another?
Things I know (I think these suggestions will show up):
- It could be better to use an other DBMS. But I want/need to solve it wiht this one.
- I know that the disk head will be positioned by the OS on other data while I am workign with thhe data. This doesn't matter. It is just that some hundret MB will be read consecutively.
- I don't want to / can split the database file up into smaller pieces
I found this post, but it is not addressing my problem correctly:
Which is the fastest way to retrieve all items in SQLite?
That's what clustered index are for. sqlite doesn't support them though.
The following is copied from: http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuningWindows
Four: Clustered Indexes
SQLite doesn't support clustered indexes (simply, indexes that force
the data in the database to be physically laid down in the SAME order
as the index needs it to be in.)
This means that if your index is sequential INTEGER, the records are
physically laid out in the database in that INTEGERs order, 1 then 2
then 3.
You can't make a Clustered index, but you CAN sort your data in order
so that any historical data is ordered nicely. Of course, as the
database matures, you lose that, but it helps
Someone else posted this, and it is a nice example to use, so I will.
If you have a table WIBBLE whose field KEY you want to access a lot,
it would be nice if everything was in order. Using the command line
tool, you can create a fake cluster by doing the following:
create table wibble2 as select * from wibble;
delete from wibble;
insert into wibble select * from wibble2 order by key;
drop table wibble2;
Bottom line is you can re-order your records manually, but I imagine this would be practical for you only if you don't intend to write to the table to often.
In SQLite, table rows are stored sorted by the rowid, so the most efficient way to read the rows in this order is to sort by this column (or the alias you declared with INTEGER PRIMARY KEY):
SELECT * FROM wibble ORDER BY rowid
For a WITHOUT ROWID table, you sort by the primary key column(s):
SELECT * FROM wibble ORDER BY MyPrimary, KeyColumns
From @CL. answer in this post:
In SQLite, indexes created with CREATE INDEX are non-clustered
indexes.
Since version 3.8.2, SQLite supports WITHOUT ROWID tables, which are
clustered indexes.