Slow insert and saveAll performance on Spring Data

Published 2020-05-06 05:00

Question:

I am trying to insert 1500 records into Cassandra using Spring. I have a list of POJOs holding these 1500 records, and when I call saveAll or insert on this data it takes 30 seconds to complete the operation. Can someone suggest a way to get this done faster? I am currently running Cassandra 3.11.2 as a single-node test cluster.

Entity POJO:

package com.samplepoc.pojo;

import static org.springframework.data.cassandra.core.cql.PrimaryKeyType.PARTITIONED;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

import org.springframework.data.cassandra.core.mapping.Column;
import org.springframework.data.cassandra.core.mapping.PrimaryKeyColumn;
import org.springframework.data.cassandra.core.mapping.Table;

@Table("health")
public class POJOHealth
{
    @PrimaryKeyColumn(type=PARTITIONED)
    UUID primkey;
    @Column
    String col1;
    @Column
    String col2;
    @Column
    String col3;
    @Column
    String col4;
    @Column
    String col5;
    @Column
    Date ts;
    @Column
    boolean stale;
    @Column
    String col6;
    @Column
    String col7;
    @Column
    String col8;
    @Column
    String col9;
    @Column
    Map<String,String> data_map = new HashMap<String,String>();

    public POJOHealth(
             String col1,
             String col2,
             String col3,
             String col4,
             String col5,
             String col6,
             String col7,
             String col8,
             String col9,
             boolean stale,
             Date ts,
             Map<String,String> data_map
             )
    {
        this.primkey = UUID.randomUUID();
        this.col1=col1;
        this.col2=col2;
        this.col3=col3;
        this.col4=col4;
        this.col5=col5;
        this.col6=col6;
        this.col7=col7;
        this.col8=col8;
        this.col9=col9;
        this.ts=ts;
        this.data_map = data_map;
        this.stale=stale;
    }

    // getters & setters omitted
}

Persist Service snippet:

public void persist(List<POJOHealth> l_POJO)
{
        System.out.println("Enter Persist: " + new java.util.Date());

        List<POJOHealth> l_POJO_stale = repository_name.findByCol1AndStale("sample", false);
        System.out.println("Retrieve Old: " + new java.util.Date());

        l_POJO_stale.forEach(s -> s.setStale(true));
        System.out.println("Set Stale: " + new java.util.Date());

        repository_name.saveAll(l_POJO_stale);
        System.out.println("Save stale: " + new java.util.Date());

        try
        {
            repository_name.insert(l_POJO);
        }
        catch (Exception e)
        {
            System.out.println("Error in persisting new data: " + e.getMessage());
        }
        System.out.println("Insert complete: " + new java.util.Date());
}

Answer 1:

I don't know about Spring, but the Java driver it uses underneath can do the inserts asynchronously. Saving the way you are now, the latency to your instance dictates your throughput, not the efficiency of your query. I.e., assume you have a 10ms latency to the C* coordinator: saving one record at a time is going to take 30 seconds (10ms there + 10ms back, × 1,500).

If you insert all of them with executeAsync at the same time and block until they have all completed, you should be able to do 1,500 in less than a second unless your hardware is very underpowered (pretty much anything more than a Raspberry Pi should be able to handle that, at least in bursts). That said, if your app has any concurrency you don't want each thread sending 1,000 inserts at the same time, so putting some kind of in-flight throttle in place (e.g. a Semaphore with a 128-permit limit) would be a very good idea.
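To make the latency arithmetic concrete, here is a minimal, self-contained sketch of the throttled-async pattern using only the JDK. It simulates each insert as a task with ~10ms round-trip latency (a stand-in for the driver's executeAsync call, which this sketch does not use since it needs a live cluster) and caps the number of in-flight requests with a Semaphore. Serially, 1500 × 10ms would take about 15 seconds; with 128 in flight the same work finishes in a fraction of a second. The class and method names here are illustrative, not part of any driver API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

public class ThrottledAsyncDemo {

    // Stand-in for one async insert: ~10ms network round trip.
    static void fakeInsert() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        int records = 1500;
        int maxInFlight = 128;                         // throttle limit from the answer

        ExecutorService pool = Executors.newFixedThreadPool(maxInFlight);
        Semaphore inFlight = new Semaphore(maxInFlight);

        long start = System.nanoTime();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < records; i++) {
            inFlight.acquire();                        // block once 128 are outstanding
            futures.add(pool.submit(() -> {
                try {
                    fakeInsert();
                } finally {
                    inFlight.release();                // free a slot when the "insert" returns
                }
            }));
        }
        for (Future<?> f : futures) {
            f.get();                                   // wait for every write to finish
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        pool.shutdown();

        System.out.println("inserted " + records + " records in " + elapsedMs + " ms");
    }
}
```

With the real driver the shape is the same: acquire a permit before calling executeAsync, release it in a completion callback, and collect the returned futures to block on at the end.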