DocValues and FieldCache

FieldCache:

 docID -> document -> field value

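Whether for aggregation, sorting, or joining, the first step is always to get the value of some field in a document: use the docID to fetch the whole document, then get the field value, and convert the term to obtain the final value. FieldCache instead caches, up front, the values of one specific field (any numeric type, or an untokenized StringField) for all documents in memory, so that the field value can be accessed randomly by docID.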
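A minimal Python sketch of the FieldCache idea ("uninverting" a field), under simplifying assumptions: the inverted index is modeled as a dict from term to the docIDs containing it, numeric terms are plain decimal strings (Lucene encodes numeric terms in a binary form, not as decimal text), and each document has at most one value. The function `uninvert` and the sample `price` field are hypothetical. The sketch shows both the one-time cost (every term is visited and parsed) and the payoff (afterwards a docID lookup is just an array index).

```python
# Hypothetical, simplified model of FieldCache-style "uninverting".
# The inverted index maps each term (stored as text) to the docIDs containing it;
# uninverting walks every term once and writes the parsed value into an array indexed by docID.
from typing import Dict, List, Optional


def uninvert(postings: Dict[str, List[int]], max_doc: int) -> List[Optional[int]]:
    """Build a docID -> value array from term -> docIDs postings (numeric field assumed)."""
    values: List[Optional[int]] = [None] * max_doc  # one slot per document, kept in memory
    for term, doc_ids in postings.items():  # traverse the whole inverted index
        parsed = int(term)  # term-to-value conversion (text -> number)
        for doc_id in doc_ids:
            values[doc_id] = parsed
    return values


# Hypothetical "price" field: term -> docIDs that contain it.
price_postings = {"10": [0, 3], "25": [1], "7": [2, 4]}
price_cache = uninvert(price_postings, max_doc=5)

# After the one-time load, lookup by docID is plain array indexing.
print(price_cache)     # [10, 25, 7, 10, 7]
print(price_cache[2])  # 7
```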

How FieldCache is implemented:

http://moshalanye.iteye.com/blog/281379

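  Drawbacks:

1. It stays resident in memory, and its size is the number of documents × the size of the field's value type.

2. The initial load is slow, because it has to traverse the inverted index and convert each term to the field's type.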
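A rough worked example of drawback 1. The document count is hypothetical, and a long field is assumed (8 bytes per value, ignoring object overhead):

```latex
\text{memory} \approx \text{maxDoc} \times \text{bytes per value}
  = 10^{8} \times 8\ \text{B}
  = 8 \times 10^{8}\ \text{B}
  \approx 763\ \text{MiB}
```

That cost applies to each cached field.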

 

DocValues:

docID -> field value

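  At indexing time, a column-oriented forward index mapping each document to its field value is built. The field value can then be located directly from a known docID, so there is no need to load the document, no term conversion, and no walking through terms to find documents.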
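For contrast, a minimal Python sketch of the doc-values layout. Assumptions: a single numeric field, one value per document, and an in-memory list standing in for the column that Lucene actually writes to disk per segment at index time. The class name `NumericColumn` is hypothetical and is not Lucene's API (Lucene exposes per-segment readers such as `NumericDocValues`).

```python
# Hypothetical, simplified model of doc values: a column written at index time.
from typing import List, Optional


class NumericColumn:
    """Column-oriented forward index: position i holds the value of document i."""

    def __init__(self) -> None:
        self._values: List[Optional[int]] = []

    def add_document(self, value: Optional[int]) -> int:
        """Called while indexing; the docID is simply the next position in the column."""
        self._values.append(value)
        return len(self._values) - 1

    def get(self, doc_id: int) -> Optional[int]:
        """Direct docID -> value lookup: no stored document, no term parsing, no term walk."""
        return self._values[doc_id]


price = NumericColumn()
for v in [10, 25, 7, 10, 7]:
    price.add_document(v)

print(price.get(2))  # 7
```

Because the column is built while indexing, there is no uninverting step at search time; the docID is just a position in the column.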

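   Advantage: uses roughly one third less memory.

   Drawback: values are read from disk rather than held in memory. Under large-volume use the advantage is clear and it can be faster, but for small volumes it is slower than in-memory access. Overall it is about 15-20% slower.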

 

20 February 2015 - Apache Lucene 5.0.0 and Apache Solr 5.0.0 Available

http://lucene.apache.org/ 

FieldCache is gone (moved to a dedicated UninvertingReader in the misc module). This means when you intend to sort on a field, you should index that field using doc values, which is much faster and less heap consuming than FieldCache.

LUCENE-5666: Change uninverted access (sorting, faceting, grouping, etc.) to use the DocValues API instead of FieldCache
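Conceptually, sorting and faceting only need the per-document column, not the stored documents. The following is a minimal Python sketch of that idea, not Lucene's API: the function names `sort_by_column` and `facet_counts` and the sample data are hypothetical, and columns are plain lists indexed by docID. Lucene itself collects the top N hits with a priority queue rather than fully sorting every hit.

```python
# Hypothetical sketch: sorting and faceting read only the doc-values column, never the stored documents.
from collections import Counter
from typing import Dict, List


def sort_by_column(doc_ids: List[int], column: List[int], descending: bool = False) -> List[int]:
    """Order matching docIDs by their column value (ties broken by docID)."""
    return sorted(doc_ids, key=lambda d: (-column[d] if descending else column[d], d))


def facet_counts(doc_ids: List[int], column: List[str]) -> Dict[str, int]:
    """Count matching documents per column value."""
    return dict(Counter(column[d] for d in doc_ids))


price = [10, 25, 7, 10, 7]
category = ["book", "toy", "book", "food", "toy"]
hits = [0, 1, 2, 4]  # docIDs matched by some query

print(sort_by_column(hits, price))   # [2, 4, 0, 1]
print(facet_counts(hits, category))  # {'book': 2, 'toy': 2}
```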

 

In Elasticsearch:

https://www.elastic.co/guide/en/elasticsearch/guide/current/doc-values.html 

In Solr:

http://wiki.apache.org/solr/DocValues

https://cwiki.apache.org/confluence/display/solr/DocValues
