How to make sure Solr/Lucene won't die with java.lang.OutOfMemoryError?

Published 2019-03-12 16:48

Question:

I'm really puzzled why it keeps dying with java.lang.OutOfMemoryError during indexing even though it has a few GBs of memory.

Is there a fundamental reason why it needs manual tweaking of config files / JVM parameters, instead of just figuring out how much memory is available and limiting itself to that? No other program except Solr ever gives me this kind of problem.

Yes, I can keep tweaking the JVM heap size every time such a crash happens, but this is all so backwards.

Here's the stack trace of the latest such crash, in case it is relevant:

SEVERE: java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.<init>(String.java:216)
    at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
    at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:169)
    at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:701)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
    at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676)
    at org.apache.lucene.search.FieldComparator$StringOrdValComparator.setNextReader(FieldComparator.java:667)
    at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:245)
    at org.apache.lucene.search.Searcher.search(Searcher.java:171)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)

Answer 1:

Looking at the stack trace, it looks like you are performing a search and sorting by a field. To sort by a field, Lucene internally needs to load all the values of all the terms in that field into memory. If the field contains a lot of data, it is very possible that you will run out of memory.
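To make this concrete, here is a minimal sketch of the kind of query that triggers it, written against the Lucene 2.9/3.x-era API the stack trace comes from (the index path and the "title" field name are placeholders of mine):

    import java.io.File;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class SortedSearchDemo {
        public static void main(String[] args) throws Exception {
            // Placeholder path; point it at a real index to try this.
            IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/index")));
            IndexSearcher searcher = new IndexSearcher(reader);

            // Sorting by a string field makes Lucene un-invert that field into
            // the FieldCache: one ord per document plus every distinct term.
            // That is the allocation failing in StringIndexCache.createValue.
            Sort sort = new Sort(new SortField("title", SortField.STRING));
            TopDocs hits = searcher.search(new MatchAllDocsQuery(), null, 10, sort);

            System.out.println("total hits: " + hits.totalHits);
            searcher.close();
            reader.close();
        }
    }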



Answer 2:

I'm not certain there is a surefire way to ensure you won't run into OutOfMemoryErrors with Lucene. The problem you are facing is related to the use of the FieldCache, which per the Lucene API docs "Maintains caches of term values." If your terms exceed the amount of memory allocated to the JVM, you'll get the exception.

The documents are being sorted (the org.apache.lucene.search.FieldComparator$StringOrdValComparator.setNextReader frame in your stack trace), which will take up as much memory as is needed to store the sorted field's terms for the whole index.

You'll need to review the projected size of your sortable fields and adjust the JVM settings accordingly.
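One rough way to do that review is to populate the same cache a string sort would use and see how much heap it takes. A sketch, again assuming the Lucene 2.9/3.x API from the trace (the probe class is mine, and a System.gc()-based measurement is only approximate):

    import java.io.File;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.store.FSDirectory;

    public class FieldCacheCostProbe {
        public static void main(String[] args) throws Exception {
            // args[0] = index directory, args[1] = name of the sort field
            IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0])));

            Runtime rt = Runtime.getRuntime();
            System.gc();
            long before = rt.totalMemory() - rt.freeMemory();

            // Fill exactly the cache a string sort would use -- the same
            // getStringIndex call that appears in the stack trace.
            FieldCache.DEFAULT.getStringIndex(reader, args[1]);

            System.gc();
            long after = rt.totalMemory() - rt.freeMemory();
            System.out.println("FieldCache for '" + args[1] + "' costs roughly "
                    + ((after - before) / (1024 * 1024)) + " MB");
            reader.close();
        }
    }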



Answer 3:

A wild guess: the documents you are indexing are very large.

By default, Lucene only indexes the first 10,000 terms of a document to avoid OutOfMemory errors; you can overcome this limit (see setMaxFieldLength).

Also, you could call optimize() and close the IndexWriter as soon as you are done processing; both appear in the sketch below.
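Putting those two suggestions together, a minimal sketch against the Lucene 3.0-era API (the directory path, field name, and the 10,000-term cap are illustrative):

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class BoundedIndexingDemo {
        public static void main(String[] args) throws Exception {
            FSDirectory dir = FSDirectory.open(new File("/path/to/index")); // placeholder

            // Cap the number of terms indexed per field rather than using
            // MaxFieldLength.UNLIMITED, so one enormous document cannot
            // exhaust the heap.
            IndexWriter writer = new IndexWriter(dir,
                    new StandardAnalyzer(Version.LUCENE_30),
                    new IndexWriter.MaxFieldLength(10000));

            Document doc = new Document();
            doc.add(new Field("body", "some very long text ...",
                    Field.Store.NO, Field.Index.ANALYZED));
            writer.addDocument(doc);

            // Merge segments and release resources as soon as indexing is done.
            writer.optimize();
            writer.close();
        }
    }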

The definitive way is to profile the application and find the bottleneck. =]



Answer 4:

Are you using post.jar to index the data? I think that jar has a bug in Solr 1.2/1.3 (but I don't know the details). Our company fixed it internally, and it should also be fixed in the latest trunk (Solr 1.4/1.5).



Answer 5:

I was using this Java:

$ java -version
java version "1.6.0"
OpenJDK Runtime Environment (build 1.6.0-b09)
OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)

Which was running out of heap space, but then I upgraded to this Java:

$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

And now it works fine, on a huge dataset, with lots of term facets.



Answer 6:

For me it worked after restarting the Tomcat server.



Answer 7:

  • Navigate to C:\Bitnami\solr-4.7.2-0\apache-solr\scripts
  • Open serviceinstall.bat (with Notepad++ or another editor)
  • Either add or update the following properties: ++JvmOptions=-Xms1024M ++JvmOptions=-Xmx1024M
  • From the command prompt in that window, run serviceinstall.bat REMOVE
  • Then run serviceinstall.bat INSTALL
  • Hope that helps!


Answer 8:

An old question, but since I stumbled upon it:

  1. The String field cache is a lot more compact as of Lucene 4.0, so a lot more fits in the same heap.
  2. The field cache is an in-memory structure, so by itself it cannot prevent an OOME.
  3. For fields that need sorting or faceting, one should try DocValues to overcome this problem. DocValues work with numeric and non-analyzed string values, and I presume many sorting/faceting use cases will have one of these value types; a sketch follows below.
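A minimal sketch of point 3 using the Lucene 4.x field classes (the field names are illustrative; in Solr you would instead set docValues="true" on the field definition in schema.xml):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.NumericDocValuesField;
    import org.apache.lucene.document.SortedDocValuesField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.util.BytesRef;

    public class DocValuesDemo {
        // Build a document whose sort/facet fields carry DocValues instead of
        // relying on the FieldCache being filled at search time.
        public static Document makeDoc(String category, long price) {
            Document doc = new Document();
            // Inverted-index copy of the field, used for matching.
            doc.add(new StringField("category", category, Field.Store.YES));
            // Column-stride DocValues copies, used for sorting and faceting.
            // They are written at index time and read from disk (typically
            // memory-mapped), so they need not be un-inverted onto the heap.
            doc.add(new SortedDocValuesField("category", new BytesRef(category)));
            doc.add(new NumericDocValuesField("price", price));
            return doc;
        }
    }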


Tags: lucene jvm solr