Batch indexing Spring Data JPA entries to Elastic

2019-07-23 06:24发布

Our current setup is MySQL as main data source through Spring Data JPA, with Hibernate Search to index and search data. We now decided to go to Elastic Search for searching to better align with other features, besides we need to have multiple servers sharing the indexing and searching.

I'm able to setup Elastic using Spring Data ElasticSearch for data indexing and searching easily, through ElasticsearchRepository. But the challenge now is how to index all the existing MySQL records into Elastic Search. Hibernate Search provides an API to do this org.hibernate.search.jpa.FullTextEntityManager#createIndexer which we use all the time. But I cannot find a handy solution within Spring Data ElasticSearch. Hope somebody can help me out here or provide some pointers.

There is a similar question here, however the solution proposed there doesn't fit my needs very well as I'd prefer to be able to index a whole object, which fields are mapped to multiple DB tables.

1条回答
我命由我不由天
2楼-- · 2019-07-23 07:15

So far I haven't found a better solution than writing my own code to index all JPA entries to ES inside my application, and this one worked out for me fine

Pageable page = new PageRequest(0, 100);
Page<Instance> curPage = instanceManager.listInstancesByPage(page);    //Get data by page from JPA repo.
long count = curPage.getTotalElements();
while (!curPage.isLast()) {
    List<Instance> allInstances = curPage.getContent();
    for (Instance instance : allInstances) {
        instanceElasticSearchRepository.index(instance);    //Index one by one to ES repo.
    }
    page = curPage.nextPageable();
    curPage = instanceManager.listInstancesByPage(page);
}

The logic is very straightforward, just depending on the quantity of the data it might take a while, so breaking down to batches and adding some messages can be helpful.

查看更多
登录 后发表回答