How to make sure Solr/Lucene won't die with java.lang.OutOfMemoryError

Posted 2019-03-12 17:08

I'm really puzzled why it keeps dying with java.lang.OutOfMemoryError during indexing even though it has a few GBs of memory.

Is there a fundamental reason why it needs manual tweaking of config files / JVM parameters, instead of just figuring out how much memory is available and limiting itself to that? No other program I run ever has this kind of problem.

Yes, I can keep bumping the JVM heap size every time such a crash happens, but this all feels so backwards.
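
(Concretely, since the trace shows Solr running under Tomcat, "tweaking" here means raising the heap flags handed to the container; the numbers below are just illustrative:)

    export CATALINA_OPTS="-Xms512m -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError"

The heap-dump flag at least records what filled the heap the next time it dies.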

Here's the stack trace from the latest such crash, in case it's relevant:

SEVERE: java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.<init>(String.java:216)
    at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
    at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:169)
    at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:701)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
    at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676)
    at org.apache.lucene.search.FieldComparator$StringOrdValComparator.setNextReader(FieldComparator.java:667)
    at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:245)
    at org.apache.lucene.search.Searcher.search(Searcher.java:171)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)

Tags: lucene jvm solr
8 answers
ら.Afraid
#2 · 2019-03-12 17:23

I was using this Java:

$ java -version
java version "1.6.0"
OpenJDK Runtime Environment (build 1.6.0-b09)
OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)

That JVM kept running out of heap space, but then I upgraded to this one:

$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

And now it works fine, on a huge dataset, with lots of term facets.

Root(大扎)
#3 · 2019-03-12 17:25

A wild guess: the documents you are indexing may be very large.

By default, Lucene indexes only the first 10,000 terms of a document, precisely to avoid OutOfMemory errors. You can raise that limit via IndexWriter.setMaxFieldLength() (sketched at the end of this answer).

Also, call optimize() and then close() on the IndexWriter as soon as you are done processing.

The definitive approach is to profile and find the bottleneck =]
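
A minimal sketch of both suggestions, written against the Lucene 2.9/3.0-era API that the stack trace suggests (the index path, analyzer choice, and 50,000 cap are made-up values):

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class CappedIndexing {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(new File("/path/to/index")); // made-up path
            // MaxFieldLength.LIMITED caps each field at 10,000 terms;
            // UNLIMITED removes the cap entirely.
            IndexWriter writer = new IndexWriter(dir,
                    new StandardAnalyzer(Version.LUCENE_29),
                    IndexWriter.MaxFieldLength.LIMITED);
            writer.setMaxFieldLength(50000); // or raise the cap explicitly
            try {
                Document doc = new Document();
                doc.add(new Field("body", "some very long text ...",
                        Field.Store.NO, Field.Index.ANALYZED));
                writer.addDocument(doc);
            } finally {
                writer.optimize(); // merge segments once indexing is finished
                writer.close();    // flush buffers and release file handles promptly
            }
        }
    }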

干净又极端
#4 · 2019-03-12 17:28

Are you using post.jar to index the data? That jar had a bug in Solr 1.2/1.3, I think (I don't know the details). Our company fixed it internally, and it should also be fixed in the latest trunk (Solr 1.4/1.5).

时光不老,我们不散
#5 · 2019-03-12 17:28

An old question, but since I stumbled upon it:

  1. The string FieldCache is a lot more compact as of Lucene 4.0, so a lot more can fit in the same heap.
  2. The FieldCache is still an in-memory structure, though, so that alone can't prevent an OOME.
  3. For fields that need sorting or faceting, try DocValues to overcome this problem. DocValues work with numeric and non-analyzed string values, and I presume most sorting/faceting use cases involve one of those value types (see the sketch below).
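
For illustration, a sketch of the Lucene side using the 4.2+ document field classes (field names and values are made up; on the Solr side the equivalent is adding docValues="true" to the field definition in schema.xml):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.NumericDocValuesField;
    import org.apache.lucene.document.SortedDocValuesField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.util.BytesRef;

    public class DocValuesSketch {
        static Document makeDoc() {
            Document doc = new Document();
            // Indexed (inverted) value, used for searching and filtering:
            doc.add(new StringField("category", "books", Field.Store.YES));
            // Column-stride DocValues copies, used for sorting/faceting; these
            // are read from disk instead of being un-inverted onto the Java
            // heap at query time the way the FieldCache is:
            doc.add(new SortedDocValuesField("category", new BytesRef("books")));
            doc.add(new NumericDocValuesField("price", 1299L));
            return doc;
        }
    }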
神经病院院长
#6 · 2019-03-12 17:32
  • Navigate to C:\Bitnami\solr-4.7.2-0\apache-solr\scripts
  • Open serviceinstall.bat (with Notepad++ or another editor)
  • Add or update the following properties: ++JvmOptions=-Xms1024M ++JvmOptions=-Xmx1024M
  • From a command prompt in that directory, run serviceinstall.bat REMOVE
  • Then run serviceinstall.bat INSTALL

Hope that helps!
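
If you run Solr from the command line rather than as a Windows service, the equivalent (assuming the stock Solr 4.x example layout with start.jar) is to hand the same heap flags straight to the JVM:

    java -Xms1024M -Xmx1024M -jar start.jar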
beautiful°
#7 · 2019-03-12 17:36

Looking at the stack trace, it seems you are performing a search and sorting on a field. To sort on a field, Lucene internally has to load every value of every term in that field into memory. If the field contains a lot of data, it is very possible to run out of memory.
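
To make that concrete, here is roughly the kind of request that populates the FieldCache, sketched against the same Lucene 2.9-era API as the stack trace (index path and sort field are made up):

    import java.io.File;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class SortedSearch {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/index")));
            IndexSearcher searcher = new IndexSearcher(reader);
            // Sorting on a string field makes Lucene un-invert every term of
            // that field into the in-memory FieldCache on first use -- the
            // very allocation shown in the trace (FieldCacheImpl$StringIndexCache).
            Sort sort = new Sort(new SortField("title", SortField.STRING));
            TopDocs hits = searcher.search(new MatchAllDocsQuery(), null, 10, sort);
            System.out.println("hits: " + hits.totalHits);
            searcher.close();
            reader.close();
        }
    }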
