How to make sure Solr/Lucene won't die with java.lang.OutOfMemoryError

Posted 2019-03-12 17:08

I'm really puzzled why it keeps dying with java.lang.OutOfMemoryError during indexing even though it has a few GBs of memory.

Is there a fundamental reason why it needs manual tweaking of config files / JVM parameters, instead of just figuring out how much memory is available and limiting itself to that? No other program I run ever has this kind of problem.

Yes, I can keep bumping the JVM heap size every time such a crash happens, but this all feels so backwards.
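
(Concretely, since the trace shows Solr running under Tomcat, "tweaking" here means raising the heap flags handed to the container; the numbers below are just illustrative:)

    export CATALINA_OPTS="-Xms512m -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError"

The heap-dump flag at least records what filled the heap the next time it dies.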

Here's the stack trace from the latest such crash, in case it's relevant:

SEVERE: java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.<init>(String.java:216)
    at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
    at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:169)
    at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:701)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
    at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676)
    at org.apache.lucene.search.FieldComparator$StringOrdValComparator.setNextReader(FieldComparator.java:667)
    at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:245)
    at org.apache.lucene.search.Searcher.search(Searcher.java:171)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)

Tags: lucene jvm solr
8 answers
ら.Afraid
#2 · 2019-03-12 17:23

I was using this Java:

$ java -version
java version "1.6.0"
OpenJDK Runtime Environment (build 1.6.0-b09)
OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)

That JVM kept running out of heap space, but then I upgraded to this one:

$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

And now it works fine, on a huge dataset, with lots of term facets.

Root(大扎)
#3 · 2019-03-12 17:25

A wild guess: the documents you are indexing may be very large.

By default, Lucene indexes only the first 10,000 terms of a document, precisely to avoid OutOfMemory errors. You can raise that limit via IndexWriter.setMaxFieldLength() (sketched at the end of this answer).

Also, call optimize() and then close() on the IndexWriter as soon as you are done processing.

The definitive approach is to profile and find the bottleneck =]
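
A minimal sketch of both suggestions, written against the Lucene 2.9/3.0-era API that the stack trace suggests (the index path, analyzer choice, and 50,000 cap are made-up values):

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class CappedIndexing {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(new File("/path/to/index")); // made-up path
            // MaxFieldLength.LIMITED caps each field at 10,000 terms;
            // UNLIMITED removes the cap entirely.
            IndexWriter writer = new IndexWriter(dir,
                    new StandardAnalyzer(Version.LUCENE_29),
                    IndexWriter.MaxFieldLength.LIMITED);
            writer.setMaxFieldLength(50000); // or raise the cap explicitly
            try {
                Document doc = new Document();
                doc.add(new Field("body", "some very long text ...",
                        Field.Store.NO, Field.Index.ANALYZED));
                writer.addDocument(doc);
            } finally {
                writer.optimize(); // merge segments once indexing is finished
                writer.close();    // flush buffers and release file handles promptly
            }
        }
    }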

干净又极端
#4 · 2019-03-12 17:28

Are you using post.jar to index the data? That jar had a bug in Solr 1.2/1.3, I think (I don't know the details). Our company fixed it internally, and it should also be fixed in the latest trunk (Solr 1.4/1.5).

时光不老,我们不散
#5 · 2019-03-12 17:28

An old question, but since I stumbled upon it:

  1. The string FieldCache is a lot more compact as of Lucene 4.0, so a lot more can fit in the same heap.
  2. The FieldCache is still an in-memory structure, though, so that alone can't prevent an OOME.
  3. For fields that need sorting or faceting, try DocValues to overcome this problem. DocValues work with numeric and non-analyzed string values, and I presume most sorting/faceting use cases involve one of those value types (see the sketch below).
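
For illustration, a sketch of the Lucene side using the 4.2+ document field classes (field names and values are made up; on the Solr side the equivalent is adding docValues="true" to the field definition in schema.xml):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.NumericDocValuesField;
    import org.apache.lucene.document.SortedDocValuesField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.util.BytesRef;

    public class DocValuesSketch {
        static Document makeDoc() {
            Document doc = new Document();
            // Indexed (inverted) value, used for searching and filtering:
            doc.add(new StringField("category", "books", Field.Store.YES));
            // Column-stride DocValues copies, used for sorting/faceting; these
            // are read from disk instead of being un-inverted onto the Java
            // heap at query time the way the FieldCache is:
            doc.add(new SortedDocValuesField("category", new BytesRef("books")));
            doc.add(new NumericDocValuesField("price", 1299L));
            return doc;
        }
    }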
神经病院院长
#6 · 2019-03-12 17:32
  • Navigate to C:\Bitnami\solr-4.7.2-0\apache-solr\scripts
  • Open serviceinstall.bat (with Notepad++ or another editor)
  • Add or update the following properties: ++JvmOptions=-Xms1024M ++JvmOptions=-Xmx1024M
  • From a command prompt in that directory, run serviceinstall.bat REMOVE
  • Then run serviceinstall.bat INSTALL

Hope that helps!
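
If you run Solr from the command line rather than as a Windows service, the equivalent (assuming the stock Solr 4.x example layout with start.jar) is to hand the same heap flags straight to the JVM:

    java -Xms1024M -Xmx1024M -jar start.jar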
beautiful°
#7 · 2019-03-12 17:36

Looking at the stack trace, it seems you are performing a search and sorting on a field. To sort on a field, Lucene internally has to load every value of every term in that field into memory. If the field contains a lot of data, it is very possible to run out of memory.
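
To make that concrete, here is roughly the kind of request that populates the FieldCache, sketched against the same Lucene 2.9-era API as the stack trace (index path and sort field are made up):

    import java.io.File;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class SortedSearch {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/index")));
            IndexSearcher searcher = new IndexSearcher(reader);
            // Sorting on a string field makes Lucene un-invert every term of
            // that field into the in-memory FieldCache on first use -- the
            // very allocation shown in the trace (FieldCacheImpl$StringIndexCache).
            Sort sort = new Sort(new SortField("title", SortField.STRING));
            TopDocs hits = searcher.search(new MatchAllDocsQuery(), null, 10, sort);
            System.out.println("hits: " + hits.totalHits);
            searcher.close();
            reader.close();
        }
    }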
