-->

Best practices for using SQLite3 + Node.js

2019-02-02 09:54发布

问题:

I've got a modest Node.js script that pulls down data from Wikipedia via the API and stores it in a SQLite database. I'm using this node-sqlite3 module.

In some cases, I'm pulling down data on upward of 600,000 articles and storing some metadata about each one in a row in the db. The articles are retrieved in groups of 500 from the API.

The request that retrieves the JSON object with the data on the 500 articles passes the object to this callback:

//(db already instantiated as 'new sqlite.Database("wikipedia.sqlite");')

function callback(articles) {
    articles.forEach(function(article) {
        db.run("INSERT OR IGNORE INTO articles (name, id, created) VALUES (?,?,?)", [article["title"], article["pageid"], article["timestamp"]]);
    });
}

The modules operates by default in parallel, but the documentation for node-sqlite3 includes one example of serial operations like so:

db.serialize(function() {
  db.run("CREATE TABLE lorem (info TEXT)");

  var stmt = db.prepare("INSERT INTO lorem VALUES (?)");
  for (var i = 0; i < 10; i++) {
    stmt.run("Ipsum " + i);
  }
  stmt.finalize();
}

I tried to imitate this and saw almost no performance difference. Am I doing it wrong? Right now, the data retrieves from the API much faster than it writes to the DB, though it's not intolerably slow. But pummeling the DB with 600K individual INSERT commands feels clumsy.

UPDATE: Per accepted answer, this appears to work for node-sqlite3, in lieu of a native solution. (See this Issue).

    db.run("BEGIN TRANSACTION");
function callback(articles) {
        articles.forEach(function(article) {
            db.run("INSERT OR IGNORE INTO articles (name, id, created) VALUES (?,?,?)", [article["title"], article["pageid"], article["timestamp"]]);
        });
    }
    db.run("END");

回答1:

When you are doing several insertions into a SQLite database, you need to wrap the collection of insertions into a transaction. Otherwise, SQLite will wait for the disk platters to spin completely around for each insert, while it does a read-after-write verify for each record that you insert.

At 7200 RPM, it takes about 1/60th of a second for the disk platter to spin around again, which is an eternity in computer time.