Insert 10,000,000+ rows in grails

2019-08-05 07:02发布

问题:

I've read a lot of articles recently about populating a grails table from huge data, but seem to have hit a ceiling. My code is as follows:

class LoadingService {
    def sessionFactory
    def dataSource
    def propertyInstanceMap = org.codehaus.groovy.grails.plugins.DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP

    def insertFile(fileName) {
        InputStream inputFile = getClass().classLoader.getResourceAsStream(fileName)
        def pCounter = 1
        def mCounter = 1
        Sql sql = new Sql(dataSource)
        inputFile.splitEachLine(/\n|\r|,/) { line -> 
            line.each { value ->
                if(value.equalsIgnoreCase('0') { 
                    pCounter++
                    return
                }
                sql.executeInsert("insert into Patient_MRNA (patient_id, mrna_id, value) values (${pCounter}, ${mCounter}, ${value.toFloat()})")
                pCounter++
            }
            if(mCounter % 100 == 0) {
                cleanUpGorm()
            }
            pCounter = 1
            mCounter++
        }
    }

    def cleanUpGorm() {
        session.currentSession.clear()
        propertyInstanceMap.get().clear()
    }
}

I have disabled secondary cache, I'm using assigned ids, and I am explicitly handling this many to many relationship through a domain, not the hasMany and belongsTo.

My speed has increased monumentally after applying these methods, but after a while the inserts slow down to the point of almost stopping compared to about 623,000 per minute at the beginning.

Is there some other memory leak that I should be aware of or have I just hit the ceiling in terms of batch inserts in grails?

To be clear it takes about 2 minutes to insert 1.2 million rows, but then they start to slow down.

回答1:

Try doing batch inserts, it's much more efficient

def updateCounts = sql.withBatch { stmt ->
     stmt.addBatch("insert into TABLENAME ...")
     stmt.addBatch("insert into TABLENAME ...")
     stmt.addBatch("insert into TABLENAME ...")
     ...
 }


回答2:

I have fought with this in earlier versions of Grails. Back then I resorted to either simply run the batch manually in proper chunks or use another tool for the batch import, such as Pentaho Data Integration (or other ETL tool or DIY).