Can we make Lucene IndexWriter serializable for Ex

Posted 2019-09-08 15:49

Question:

This question is related to another SO question of mine.

To keep the IndexWriter open for the duration of a partitioned step, I thought of adding the IndexWriter to the partitioner's ExecutionContext and then closing it in the afterStep(StepExecution stepExecution) method of a StepExecutionListenerSupport.

The challenge I am facing with this approach is that ExecutionContext requires its objects to be serializable.

In light of these two questions, Q1 and Q2, this doesn't seem feasible: I can't add a no-arg constructor to my custom writer because IndexWriter itself doesn't have a no-arg constructor.

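    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;

    public class CustomIndexWriter extends IndexWriter implements Serializable {

        private static final long serialVersionUID = 1L;

        /*
        // Does not compile: IndexWriter has no no-arg constructor, and
        // instance fields cannot be read before the super(...) call.
        private Directory d;
        private IndexWriterConfig conf;

        public CustomIndexWriter() {
            super();
            super(this.d, this.conf);
        }
        */

        public CustomIndexWriter(Directory d, IndexWriterConfig conf) throws IOException {
            super(d, conf);
        }

        private void readObject(ObjectInputStream input) throws IOException, ClassNotFoundException {
            input.defaultReadObject();
        }

        private void writeObject(ObjectOutputStream output) throws IOException {
            output.defaultWriteObject();
        }
    }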

In the code above, I can't add the constructor shown in the comment, because the superclass has no no-arg constructor and the fields of this can't be accessed before the call to super.

Is there a way to achieve this?
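
For reference, the rule behind this constraint can be reproduced without Lucene. Java serialization requires the first non-serializable superclass of a Serializable class to have an accessible no-arg constructor, and a constructor on the subclass itself does not change that. The sketch below is only an illustration under that assumption: BaseWriter and SerializableWriter are hypothetical stand-ins for IndexWriter and CustomIndexWriter, not Lucene classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class NoArgConstructorDemo {

    // Hypothetical stand-in for IndexWriter: not Serializable, no no-arg constructor.
    static class BaseWriter {
        final String path;

        BaseWriter(String path) {
            this.path = path;
        }
    }

    // Hypothetical stand-in for CustomIndexWriter: a Serializable subclass of BaseWriter.
    static class SerializableWriter extends BaseWriter implements Serializable {
        private static final long serialVersionUID = 1L;

        SerializableWriter(String path) {
            super(path);
        }
    }

    // Serializes the object to a byte array and reads it back.
    // Returns "none" on success, or the simple name of the exception that was thrown.
    static String roundTripOutcome(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
            return "none";
        } catch (IOException | ClassNotFoundException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripOutcome(new SerializableWriter("index")));
    }
}
```

In this sketch, writing the object succeeds and the failure only surfaces when it is read back, as an InvalidClassException.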

Answer 1:

You can always add a parameter-less constructor.

E.g:

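import java.io.IOException;
import java.io.Serializable;
import java.nio.file.Paths;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class CustomWriter extends IndexWriter implements Serializable {
    private Directory lDirectory;
    private IndexWriterConfig iwConfig;

    public CustomWriter() throws IOException {
        // Delegate to the two-argument constructor with default values
        // (assumed defaults: an index in the current directory, default config)
        this(FSDirectory.open(Paths.get(".")), new IndexWriterConfig());
    }

    public CustomWriter(Directory dir, IndexWriterConfig iwConf) throws IOException {
        // IndexWriter has no no-arg constructor, so the arguments must be passed up
        super(dir, iwConf);
        lDirectory = dir;
        iwConfig = iwConf;
    }

    public Directory getDirectory() { return lDirectory; }

    public IndexWriterConfig getConfig() { return iwConfig; }

    public void setDirectory(Directory dir) { lDirectory = dir; }

    public void setConfig(IndexWriterConfig conf) { iwConfig = conf; }

    // ...
}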

EDIT:

Having taken a look at my own code (using Lucene.Net), I see that the IndexWriter needs an analyzer and a MaxFieldLength.

So the super-call would look something like this:

super(new Directory("." + System.getProperty("path.separator")), new StandardAnalyzer(), MaxFieldLength.UNLIMITED);

So adding these values as defaults should fix the issue. You could then add getter and setter methods for the analyzer and MaxFieldLength, so you have control over them at a later stage.



Answer 2:

I am not sure how, but this syntax works in Spring Batch, and ExecutionContext returns a non-null object in StepExecutionListenerSupport.

import java.io.Serializable;

import org.apache.lucene.index.IndexWriter;

public class CustomIndexWriter implements Serializable {

    private static final long serialVersionUID = 1L;

    // transient: the wrapped writer is skipped by Java serialization
    private transient IndexWriter luceneIndexWriter;

    public CustomIndexWriter(IndexWriter luceneIndexWriter) {
        this.luceneIndexWriter = luceneIndexWriter;
    }

    public IndexWriter getLuceneIndexWriter() {
        return luceneIndexWriter;
    }

    public void setLuceneIndexWriter(IndexWriter luceneIndexWriter) {
        this.luceneIndexWriter = luceneIndexWriter;
    }
}

I put an instance of CustomIndexWriter in the step partitioner; the partitioned step's chunk processing obtains the writer by calling getLuceneIndexWriter(), and I then close this writer in StepExecutionListenerSupport.

This way, my Spring Batch partitioned step works with a single instance of the Lucene IndexWriter object.

I was expecting a NullPointerException when performing operations on the writer obtained from getLuceneIndexWriter(), but that doesn't happen (despite the field being transient). I am not sure why this works, but it does.

For Spring Batch job metadata, I am using the in-memory repository rather than a DB-based repository. I am not sure whether this will continue to work once I start using a DB for metadata.
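
One possible explanation, which the observations above do not confirm, is that the step receives the very same in-memory holder object, so the transient field is never dropped. A transient field is only lost when the holder goes through a Java serialization round trip. The sketch below illustrates that difference with plain JDK classes. FakeWriter and WriterHolder are hypothetical stand-ins for IndexWriter and CustomIndexWriter, and the HashMap merely plays the role of an in-memory context. It is not how Spring Batch stores its ExecutionContext.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class TransientFieldDemo {

    // Hypothetical stand-in for Lucene's IndexWriter.
    static class FakeWriter {
    }

    // Same shape as the CustomIndexWriter holder: Serializable, with a transient writer.
    static class WriterHolder implements Serializable {
        private static final long serialVersionUID = 1L;

        private transient FakeWriter writer;

        WriterHolder(FakeWriter writer) {
            this.writer = writer;
        }

        FakeWriter getWriter() {
            return writer;
        }
    }

    // Serializes the holder to bytes and reads back a copy.
    static WriterHolder roundTrip(WriterHolder holder) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(holder);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (WriterHolder) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        WriterHolder original = new WriterHolder(new FakeWriter());

        // Passing the reference through an in-memory map keeps the writer.
        Map<String, Object> context = new HashMap<>();
        context.put("writer", original);
        WriterHolder sameInstance = (WriterHolder) context.get("writer");
        System.out.println(sameInstance.getWriter() != null); // prints "true"

        // A serialization round trip drops the transient field.
        System.out.println(roundTrip(original).getWriter() == null); // prints "true"
    }
}
```

If a DB-backed repository, a restart, or remote partitioning ever causes the holder to be rebuilt from its serialized form, getLuceneIndexWriter() would return null and the writer would need to be reopened.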