I am using Crawler controller to crawl all pages of a medium website. It randomly crawls 2-3 pages and then it causes a lock on the IndexWriter
Directory dir = FSDirectory.open(new File(index));
IndexWriterConfig conf = new IndexWriterConfig(org.apache.lucene.util.Version.LUCENE_41,new StandardAnalyzer(org.apache.lucene.util.Version.LUCENE_41));
writer = new IndexWriter(dir, conf); // line which throws lock exception.
Logs:
From: SiteSearch.KCCrawlerController.(80): Lock obtain timed out: NativeFSLock@D:\Websites\ccc\WEB-INF\lucene-index\en\write.lock: 05/08/2014 10:57:55 org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@D:\Websites\ccc\WEB-INF\lucene-index\en\write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:84) at org.apache.lucene.index.IndexWriter.(IndexWriter.java:636) at SiteSearch.KCCrawlerController.(KCCrawlerController.java:80) at org.apache.jsp.monitors.siteSearchIndexer_jsp._jspService(siteSearchIndexer_jsp.java:66) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:717) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:386) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260) at javax.servlet.http.HttpServlet.service(HttpServlet.java:717) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at com.tridion.ambientdata.web.AmbientDataServletFilter.doFilter(AmbientDataServletFilter.java:255) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at adminV3.ugc.CharacterEncodingFilter.doFilter(CharacterEncodingFilter.java:82) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.ajp.AjpAprProcessor.process(AjpAprProcessor.java:429) at org.apache.coyote.ajp.AjpAprProtocol$AjpConnectionHandler.process(AjpAprProtocol.java:384) at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1665) at java.lang.Thread.run(Unknown Source)
Adding jsp: http://example.com/en/consulting/diagnostics.jsp?crawler=yes
From: SiteSearch.KCCrawler.visit(95): Stream closed: 05/08/2014 10:57:55 java.io.IOException: Stream closed at org.apache.jasper.runtime.JspWriterImpl.ensureOpen(JspWriterImpl.java:204) at org.apache.jasper.runtime.JspWriterImpl.write(JspWriterImpl.java:312) at org.apache.jasper.runtime.JspWriterImpl.write(JspWriterImpl.java:342) at SiteSearch.KCCrawler.visit(KCCrawler.java:95) at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:306) at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:189) at java.lang.Thread.run(Unknown Source)
Why am I getting this exception? Any help.
UPDATE: 17/08/2014 :
When I run the Indexer first time, it completes successfully with the below exception thrown. If I run the search on this, I get my results successfully. However if I run the Indexer again, it throws the lock exception mentioned above. It also shows that my controller class is called twice.
org.apache.catalina.core.StandardWrapperValve invoke SEVERE: Servlet.service() for servlet jsp threw exception java.io.IOException: Stream closed at org.apache.jasper.runtime.JspWriterImpl.ensureOpen(JspWriterImpl.java:204) at org.apache.jasper.runtime.JspWriterImpl.flushBuffer(JspWriterImpl.java:115) at org.apache.jasper.runtime.PageContextImpl.release(PageContextImpl.java:188) at org.apache.jasper.runtime.JspFactoryImpl.internalReleasePageContext(JspFactoryImpl.java:118) at org.apache.jasper.runtime.JspFactoryImpl.releasePageContext(JspFactoryImpl.java:77) at org.apache.jsp.monitors.siteSearchIndexer_jsp._jspService(siteSearchIndexer_jsp.java:82) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) at javax.servlet.http.HttpServlet.service(HttpServlet.java:717) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:386)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
Quoting the Javadocs -
"Opening an IndexWriter creates a lock file for the directory in use. Trying to open another IndexWriter on the same directory will lead to a LockObtainFailedException. The LockObtainFailedException is also thrown if an IndexReader on the same directory is used to delete documents from the index."
"IndexWriter instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the IndexWriter instance as this may cause deadlock; use your own (non-Lucene) objects instead."
https://lucene.apache.org/core/4_1_0/core/org/apache/lucene/index/IndexWriter.html
Are you creating new instances of IndexWriter for each page that you are crawling?