I am using Solr for data with a layout similar to:
name:age:sex:balance:nextbalance:interest
I have 30 million records totaling 4 GB on disk. I am retrieving by age:23, which matches only about 50 records. I have indexed="true" in the schema XML. Solr seems to load the entire on-disk index (4 GB) into memory. Isn't it supposed to load only the matching records into memory?
I think it depends on how you configure the caches (what Solr does and doesn't keep in memory). Keeping the entire index in memory can give you a large performance boost in the time needed to retrieve results, regardless of the query.
Details on configuring the caches and on performance factors:
- http://wiki.apache.org/solr/SolrCaching
- http://wiki.apache.org/solr/SolrPerformanceFactors
This may be the document cache, whose size you need to specify. Can you please check the following in solrconfig.xml?
<!-- documentCache caches Lucene Document objects (the stored fields for each document). -->
<documentCache
    class="solr.LRUCache"
    size="16384"
    initialSize="16384"/>
Fields that are stored but not indexed are saved on disk, not in RAM. The index itself, however, does cover 100% of the records and is held in RAM, and it contains all of the indexed fields. Inverted indexes are fairly compact, so this is usually efficient.
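To make the stored/indexed distinction concrete, here is a minimal schema.xml sketch using the field names from the question. The choice of which fields are indexed is illustrative only, and the type names "string", "int" and "float" are assumptions that must match types defined in your own schema.

```xml
<!-- Sketch only: field names follow the question; the types "string", "int"
     and "float" are assumed to be defined in the types section of schema.xml. -->
<fields>
  <!-- Searchable and returned in results. -->
  <field name="name" type="string" indexed="true" stored="true"/>
  <field name="age" type="int" indexed="true" stored="true"/>
  <!-- Returned in results but not searchable: kept out of the inverted index. -->
  <field name="balance" type="float" indexed="false" stored="true"/>
  <field name="interest" type="float" indexed="false" stored="true"/>
</fields>
```

With a layout like this, a query on age:23 can use the inverted index, while balance and interest are only read from the stored data for documents that match.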
However, when you run a query, Solr does load the entire set of stored (but not indexed) field contents into RAM for the records that match. This is usually considered desirable caching behavior, because search results can be returned sooner, which reduces the overall query turnaround time. As usual with Solr, you can configure caching behavior in many ways to match your RAM budget and database needs. Have a look at the options in solrconfig.xml.
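As a rough illustration of what those options look like, here is a sketch of the query section of solrconfig.xml with deliberately small caches. The sizes are assumptions chosen for a tight RAM budget, not recommendations, and solr.LRUCache is used only because it matches the snippet quoted earlier in this thread; check which cache classes your Solr version provides.

```xml
<!-- Sketch only: sizes are illustrative assumptions, not tuned values. -->
<query>
  <!-- Caches sets of document ids that match filter queries (fq). -->
  <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <!-- Caches ordered lists of document ids for recent queries. -->
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <!-- Caches the stored fields of recently fetched documents. -->
  <documentCache class="solr.LRUCache" size="512" initialSize="512"/>
  <!-- Load stored fields that the request did not ask for only on demand. -->
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
</query>
```

Smaller sizes trade repeat-query speed for a lower heap footprint, so it is worth measuring with your own query mix before settling on values.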
Note that this is a complex area, and you will probably find it difficult to fully understand caching if Google is your main source of information. This is an area where it is better to learn from one of the books on Solr.