Facets in Lucene

Date: 2019-11-18

Tags: lucene, facet


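It is hard to find a good translation for "facet"; the literal rendering, "aspect", does not quite fit. There is no need to fuss over the name, though. What matters is knowing what kind of problem facets solve. Consider a few typical use cases:

[Three screenshots of typical faceted navigation]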

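As the screenshots suggest, a facet groups and counts documents by the values of a field. The field must be a FacetField: there is one group per distinct value of the facet field, and for each group Lucene reports how many documents match the current Query. Usually not every group is displayed; as in the three screenshots, showing the top N groups is enough. Facets may look a bit like Grouping, and on the surface they are similar. So what exactly is the difference?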


They are two different Lucene features:
Grouping was first released with Lucene 3.2, its related JIRA issue is LUCENE-1421: it allows to group search results by specified field. For example, if you group by the author field, then all documents with the same value in the author field fall into a single group. You will have a kind of tree as output. If you want to go deeper into using this Lucene feature, this blog post should be useful.
Faceting was first released with Lucene 3.4, its related JIRA issue is LUCENE-3079: this feature doesn't group documents, it just tells you how many documents fall in a specific value of a facet. For example, if you have a facet based on the author field, you will receive a list of all your authors, and for each author you will know how many documents belong to that specific author. After, if you want to see those documents, you have to query one more time adding a specific filter (author=whatever). The faceted search is in fact based on browsing documents applying multiple filters to progressively reach the documents you're really interested in.

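In short: Grouping first shipped with the Lucene 3.2 release. It lets you group search results by a given field; if you group by an author field, all documents with the same author value fall into the same group. Faceting first shipped with Lucene 3.4. It does not group documents; it only tells you how many hits fall under each value of a facet. With a facet on the author field, you get back every author value together with its hit count. To see the documents behind one value, you issue another query with a filter such as author=xxxx. Faceted search is therefore a way of browsing the index by applying filters one after another until the user reaches the documents of interest. Put differently, the counts are there to invite clicks: users tend to click the value with the most hits, judge whether the results are what they want, and if not, move on to the next largest value, narrowing down step by step. That is what the facet feature is designed for. Put bluntly, it exploits the herd effect to get you to click.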
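To make the distinction concrete, here is a minimal sketch in plain Java with no Lucene involved. The class name, the method names and the `{title, author}` array layout are all made up for illustration; the point is only that grouping returns the documents themselves, bucketed by value, while faceting returns a count per value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: each doc is {title, author}.
class GroupingVsFacetSketch {

    // Grouping: bucket the matching documents themselves by author.
    static Map<String, List<String>> groupByAuthor(List<String[]> docs) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String[] doc : docs) {
            groups.computeIfAbsent(doc[1], k -> new ArrayList<>()).add(doc[0]);
        }
        return groups;
    }

    // Faceting: only count how many matching documents carry each author value.
    static Map<String, Integer> countByAuthor(List<String[]> docs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] doc : docs) {
            counts.merge(doc[1], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
            new String[] {"Book A", "Bob"},
            new String[] {"Book B", "Lisa"},
            new String[] {"Book C", "Lisa"});
        System.out.println(groupByAuthor(docs)); // {Bob=[Book A], Lisa=[Book B, Book C]}
        System.out.println(countByAuthor(docs)); // {Bob=1, Lisa=2}
    }
}
```

With the facet counts in hand, showing the documents for "Lisa" would take a second, filtered query, which is exactly the extra round trip the quote describes.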

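First you need to create FacetField instances. Before doing so, it helps to know whether a FacetField is tokenized, whether it is stored, whether positions are indexed, and so on. The FacetField source code answers all of this.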

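Every FacetField has the field name dummy, and its field type uses the index option DOCS_AND_FREQS_AND_POSITIONS, meaning that document IDs, term frequencies and positions are recorded (this is not the same thing as term vectors). Everything else keeps the FieldType defaults: stored=false and tokenized=true, so the value would be analyzed into multiple terms. These details are worth knowing, but keep in mind that a FacetField is only a placeholder: FacetsConfig.build replaces it with the fields that actually get indexed.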

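Like any other Field, a FacetField is added to a Document, and the Document is written to the index through IndexWriter.addDocument. Before that, however, the Document has to go through a conversion step:

Java code

// config is a FacetsConfig, taxoWriter a DirectoryTaxonomyWriter
indexWriter.addDocument(config.build(taxoWriter, doc));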

So what does FacetsConfig.build do behind the scenes?

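It first sets up three Maps, one for each kind of facet field: FacetField, SortedSetDocValuesFacetField and AssociationFacetField. FacetField is the regular, taxonomy-based facet field. SortedSetDocValuesFacetField is a facet field backed by SortedSetDocValues, which does not need a separate taxonomy index (despite the name, it is not a field for sorting results). AssociationFacetField is for custom facets and can associate an arbitrary byte[] with a category. Once the fields added by the user have been split into these three maps, each map is handled by its own method.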

Inside processFacetFields, the key step is pathToString, which joins the dimension and the path values into one string. For example:

Java code

new FacetField("Author", new String[] { "Bob" ,"Jack","Tom"})

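The result is not the bare concatenation BobJackTom. In the Lucene 5.x source, the dimension comes first and the components are separated by the control character \u001F, giving Author\u001FBob\u001FJack\u001FTom. A StringField with Store.NO is then created for each prefix of that path (Author, Author\u001FBob, and so on). In other words, adding a FacetField effectively adds StringFields used for drill-down, although the two are not fully equivalent. Note the condition in the if statement:

ft.multiValued && (ft.hierarchical || ft.requireDimCount), that is, the dimension is multi-valued and is either hierarchical or requires a dimension count. When the condition holds, the ordinals of all parent categories are recorded along with the category's own ordinal. Whether or not it holds, the collected ordinals are then written to a BinaryDocValuesField, which is what facet counting reads later:

Java code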

doc.add(new BinaryDocValuesField(indexFieldName, dedupAndEncode(ordinals.get())));
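The path-joining step can be sketched in plain Java as follows. This is only an approximation under stated assumptions: the separator \u001F is what I believe FacetsConfig uses internally, the escaping that Lucene applies to components containing the separator is left out, and the class and the drillDownTerms helper are hypothetical names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how drill-down terms could be derived from a facet path.
class DrillDownTermSketch {

    // Assumed separator (unit separator control character).
    static final char DELIM = '\u001F';

    // Join the first prefixLen components (dimension first) with DELIM.
    static String pathToString(String[] components, int prefixLen) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < prefixLen; i++) {
            if (i > 0) {
                sb.append(DELIM);
            }
            sb.append(components[i]);
        }
        return sb.toString();
    }

    // One term per prefix of the full path, so that drilling down on
    // "Publish Date/2010" also matches a document filed under 2010/10/15.
    static List<String> drillDownTerms(String dim, String... path) {
        String[] components = new String[path.length + 1];
        components[0] = dim;
        System.arraycopy(path, 0, components, 1, path.length);
        List<String> terms = new ArrayList<>();
        for (int len = 1; len <= components.length; len++) {
            terms.add(pathToString(components, len));
        }
        return terms;
    }

    public static void main(String[] args) {
        for (String term : drillDownTerms("Publish Date", "2010", "10", "15")) {
            // Show the separator as '/' so the output is readable.
            System.out.println(term.replace(DELIM, '/'));
        }
    }
}
```

Running main prints four lines, from Publish Date up to Publish Date/2010/10/15. Indexing every prefix is what lets a drill-down on a year match documents that were indexed with a full year/month/day path.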

At search time, you pass a FacetsCollector to the IndexSearcher to collect the hits. The rest is essentially boilerplate and needs little comment:

Java code

FacetsCollector fc = new FacetsCollector();
// search(Query, Collector) avoids the Filter argument, which later Lucene versions removed
searcher.search(new MatchAllDocsQuery(), fc);
List<FacetResult> results = new ArrayList<FacetResult>();
Facets facets = new FastTaxonomyFacetCounts(taxoReader, this.config, fc);
results.add(facets.getTopChildren(10, "Author"));
results.add(facets.getTopChildren(10, "Publish Date"));
indexReader.close();
taxoReader.close();
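What getTopChildren returns can be pictured with a small plain-Java sketch. It is not how FastTaxonomyFacetCounts works internally (that class counts taxonomy ordinals read from doc values); it only mimics the result: count the immediate children of a dimension and keep the n largest. The class name, the String[] path representation and the tie-breaking rule are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: top-N counting over the first level below a dimension.
class TopChildrenSketch {

    // paths holds one facet path per matching document, e.g. {"2010", "10", "15"}.
    static List<Map.Entry<String, Integer>> topChildren(int n, List<String[]> paths) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] path : paths) {
            counts.merge(path[0], 1, Integer::sum); // path[0] = immediate child
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // Highest count first; ties broken by label (an arbitrary choice made here).
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<String[]> publishDates = Arrays.asList(
            new String[] {"2010", "10", "15"},
            new String[] {"2010", "10", "20"},
            new String[] {"2012", "1", "1"},
            new String[] {"2012", "1", "7"},
            new String[] {"1999", "5", "5"});
        System.out.println(topChildren(2, publishDates)); // [2010=2, 2012=2]
    }
}
```

The sample data are the publish dates used in the full example further down, so the counts here can be compared with what that program prints for the Publish Date dimension.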

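As for DrillDownQuery, it simply combines the paths passed in by the user into a BooleanQuery:

The TermQuery instances for several values of the same dimension are first joined with OR in a BooleanQuery, and the result is wrapped in a ConstantScoreQuery, mainly so that the facet filter does not contribute to the score. Constraints on different dimensions are combined with AND.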
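The matching semantics of a drill-down (OR within a dimension, AND across dimensions) can be sketched without Lucene. Everything here is hypothetical: documents are plain maps from dimension name to a single value, and constraints map a dimension to the set of accepted values. It shows the filtering logic only, not how Lucene builds or executes the query.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of drill-down matching over in-memory documents.
class DrillDownFilterSketch {

    // Build a toy document: dimension name -> single value.
    static Map<String, String> doc(String author, String year) {
        Map<String, String> d = new HashMap<>();
        d.put("Author", author);
        d.put("Publish Date", year);
        return d;
    }

    static boolean matches(Map<String, String> doc, Map<String, Set<String>> constraints) {
        for (Map.Entry<String, Set<String>> c : constraints.entrySet()) {
            // OR inside a dimension: any accepted value is enough.
            // AND across dimensions: the first failing dimension rejects the doc.
            if (!c.getValue().contains(doc.get(c.getKey()))) {
                return false;
            }
        }
        return true;
    }

    static long countMatches(List<Map<String, String>> docs,
                             Map<String, Set<String>> constraints) {
        return docs.stream().filter(d -> matches(d, constraints)).count();
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(
            doc("Bob", "2010"), doc("Lisa", "2010"), doc("Lisa", "2012"),
            doc("Susan", "2012"), doc("Frank", "1999"));

        Map<String, Set<String>> constraints = new HashMap<>();
        constraints.put("Publish Date", new HashSet<>(Arrays.asList("2010", "2012")));
        constraints.put("Author", new HashSet<>(Arrays.asList("Lisa")));

        System.out.println(countMatches(docs, constraints)); // prints 2
    }
}
```

Only the two Lisa documents survive: four documents pass the year constraint (2010 OR 2012), and the author constraint is then ANDed on top.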

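DrillSideways looks more intimidating than it is. Internally it still runs the search with the IndexSearcher and the facet collectors it was given:

Its job is to wrap everything into a DrillSidewaysQuery object, and in the end it still calls the search method of IndexSearcher.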
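What sets drill-sideways apart is which documents get counted for each dimension: the counts for a dimension ignore the drill-down on that same dimension but respect the constraints on all the others. A rough plain-Java sketch of that counting rule is shown here. The class, its helpers and the map-based documents are hypothetical, and nothing about it reflects how DrillSidewaysQuery is actually evaluated.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of drill-sideways counting over in-memory documents.
class DrillSidewaysSketch {

    // Build a toy document: dimension name -> single value.
    static Map<String, String> doc(String author, String year) {
        Map<String, String> d = new HashMap<>();
        d.put("Author", author);
        d.put("Publish Date", year);
        return d;
    }

    // True if the doc satisfies every constraint except the one on skipDim.
    static boolean matchesExcept(Map<String, String> doc,
                                 Map<String, Set<String>> constraints, String skipDim) {
        for (Map.Entry<String, Set<String>> c : constraints.entrySet()) {
            if (c.getKey().equals(skipDim)) {
                continue;
            }
            if (!c.getValue().contains(doc.get(c.getKey()))) {
                return false;
            }
        }
        return true;
    }

    // Counts for one dimension, ignoring the drill-down on that same dimension.
    static Map<String, Integer> sidewaysCounts(List<Map<String, String>> docs,
                                               Map<String, Set<String>> constraints,
                                               String dim) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> d : docs) {
            if (matchesExcept(d, constraints, dim)) {
                counts.merge(d.get(dim), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(
            doc("Bob", "2010"), doc("Lisa", "2010"), doc("Lisa", "2012"),
            doc("Susan", "2012"), doc("Frank", "1999"));
        Map<String, Set<String>> constraints =
            Collections.singletonMap("Publish Date", Collections.singleton("2010"));

        System.out.println(sidewaysCounts(docs, constraints, "Publish Date")); // {1999=1, 2010=2, 2012=2}
        System.out.println(sidewaysCounts(docs, constraints, "Author"));       // {Bob=1, Lisa=1}
    }
}
```

After drilling down on 2010, the Author counts shrink to the 2010 documents, while the Publish Date counts still cover every year, so the user can switch sideways to 2012 or 1999 without first undoing the selection.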

Here is a simple example of using facets:

Java code

package com.yida.framework.lucene5.facet;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.facet.DrillDownQuery;
import org.apache.lucene.facet.DrillSideways;
import org.apache.lucene.facet.FacetField;
import org.apache.lucene.facet.FacetResult;
import org.apache.lucene.facet.Facets;
import org.apache.lucene.facet.FacetsCollector;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts;
import org.apache.lucene.facet.taxonomy.TaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

/**
 * Simple facet example.
 *
 * @author Lanxiaowei
 */
public class SimpleFacetsExample {
    // The main index and the taxonomy index live in separate directories.
    private final Directory indexDir = new RAMDirectory();
    private final Directory taxoDir = new RAMDirectory();
    private final FacetsConfig config = new FacetsConfig();

    public SimpleFacetsExample() {
        // "Publish Date" uses multi-level paths (year/month/day), so it must be hierarchical.
        this.config.setHierarchical("Author", true);
        this.config.setHierarchical("Publish Date", true);
    }

    /**
     * Builds the test index.
     *
     * @throws IOException
     */
    private void index() throws IOException {
        IndexWriter indexWriter = new IndexWriter(this.indexDir,
                new IndexWriterConfig(new WhitespaceAnalyzer())
                        .setOpenMode(IndexWriterConfig.OpenMode.CREATE));
        DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(this.taxoDir);

        // Each document goes through config.build(...) before it is indexed.
        Document doc = new Document();
        doc.add(new FacetField("Author", new String[] { "Bob" }));
        doc.add(new FacetField("Publish Date", new String[] { "2010", "10", "15" }));
        indexWriter.addDocument(this.config.build(taxoWriter, doc));

        doc = new Document();
        doc.add(new FacetField("Author", new String[] { "Lisa" }));
        doc.add(new FacetField("Publish Date", new String[] { "2010", "10", "20" }));
        indexWriter.addDocument(this.config.build(taxoWriter, doc));

        doc = new Document();
        doc.add(new FacetField("Author", new String[] { "Lisa" }));
        doc.add(new FacetField("Publish Date", new String[] { "2012", "1", "1" }));
        indexWriter.addDocument(this.config.build(taxoWriter, doc));

        doc = new Document();
        doc.add(new FacetField("Author", new String[] { "Susan" }));
        doc.add(new FacetField("Publish Date", new String[] { "2012", "1", "7" }));
        indexWriter.addDocument(this.config.build(taxoWriter, doc));

        doc = new Document();
        doc.add(new FacetField("Author", new String[] { "Frank" }));
        doc.add(new FacetField("Publish Date", new String[] { "1999", "5", "5" }));
        indexWriter.addDocument(this.config.build(taxoWriter, doc));

        indexWriter.close();
        taxoWriter.close();
    }

    /** Runs a search that returns the top 10 hits and collects facets along the way. */
    private List<FacetResult> facetsWithSearch() throws IOException {
        DirectoryReader indexReader = DirectoryReader.open(this.indexDir);
        IndexSearcher searcher = new IndexSearcher(indexReader);
        TaxonomyReader taxoReader = new DirectoryTaxonomyReader(this.taxoDir);

        FacetsCollector fc = new FacetsCollector();
        FacetsCollector.search(searcher, new MatchAllDocsQuery(), 10, fc);

        List<FacetResult> results = new ArrayList<FacetResult>();
        Facets facets = new FastTaxonomyFacetCounts(taxoReader, this.config, fc);
        results.add(facets.getTopChildren(10, "Author"));
        results.add(facets.getTopChildren(10, "Publish Date"));

        indexReader.close();
        taxoReader.close();
        return results;
    }

    /** Collects facet counts only, without gathering any hits. */
    private List<FacetResult> facetsOnly() throws IOException {
        DirectoryReader indexReader = DirectoryReader.open(this.indexDir);
        IndexSearcher searcher = new IndexSearcher(indexReader);
        TaxonomyReader taxoReader = new DirectoryTaxonomyReader(this.taxoDir);

        FacetsCollector fc = new FacetsCollector();
        // search(Query, Collector) avoids the Filter argument, which later Lucene versions removed
        searcher.search(new MatchAllDocsQuery(), fc);

        List<FacetResult> results = new ArrayList<FacetResult>();
        Facets facets = new FastTaxonomyFacetCounts(taxoReader, this.config, fc);
        results.add(facets.getTopChildren(10, "Author"));
        results.add(facets.getTopChildren(10, "Publish Date"));

        indexReader.close();
        taxoReader.close();
        return results;
    }

    /** Drills down on Publish Date/2010 and returns the Author counts for the matching documents. */
    private FacetResult drillDown() throws IOException {
        DirectoryReader indexReader = DirectoryReader.open(this.indexDir);
        IndexSearcher searcher = new IndexSearcher(indexReader);
        TaxonomyReader taxoReader = new DirectoryTaxonomyReader(this.taxoDir);

        DrillDownQuery q = new DrillDownQuery(this.config);
        q.add("Publish Date", new String[] { "2010" });

        FacetsCollector fc = new FacetsCollector();
        FacetsCollector.search(searcher, q, 10, fc);
        Facets facets = new FastTaxonomyFacetCounts(taxoReader, this.config, fc);
        FacetResult result = facets.getTopChildren(10, "Author");

        indexReader.close();
        taxoReader.close();
        return result;
    }

    /** Drills down on Publish Date/2010 and also returns sideways counts for every dimension. */
    private List<FacetResult> drillSideways() throws IOException {
        DirectoryReader indexReader = DirectoryReader.open(this.indexDir);
        IndexSearcher searcher = new IndexSearcher(indexReader);
        TaxonomyReader taxoReader = new DirectoryTaxonomyReader(this.taxoDir);

        DrillDownQuery q = new DrillDownQuery(this.config);
        q.add("Publish Date", new String[] { "2010" });

        DrillSideways ds = new DrillSideways(searcher, this.config, taxoReader);
        DrillSideways.DrillSidewaysResult result = ds.search(q, 10);
        List<FacetResult> facets = result.facets.getAllDims(10);

        indexReader.close();
        taxoReader.close();
        return facets;
    }

    public List<FacetResult> runFacetOnly() throws IOException {
        index();
        return facetsOnly();
    }

    public List<FacetResult> runSearch() throws IOException {
        index();
        return facetsWithSearch();
    }

    public FacetResult runDrillDown() throws IOException {
        index();
        return drillDown();
    }

    public List<FacetResult> runDrillSideways() throws IOException {
        index();
        return drillSideways();
    }

    public static void main(String[] args) throws Exception {
        // 1. Facet counts only
        System.out.println("Facet counting example:");
        System.out.println("-----------------------");
        SimpleFacetsExample example = new SimpleFacetsExample();
        List<FacetResult> results1 = example.runFacetOnly();
        System.out.println("Author: " + results1.get(0));
        System.out.println("Publish Date: " + results1.get(1));

        // 2. Facet counts combined with a search
        System.out.println("Facet counting example (combined facets and search):");
        System.out.println("-----------------------");
        List<FacetResult> results = example.runSearch();
        System.out.println("Author: " + results.get(0));
        System.out.println("Publish Date: " + results.get(1));

        // 3. Drill-down
        System.out.println("Facet drill-down example (Publish Date/2010):");
        System.out.println("---------------------------------------------");
        System.out.println("Author: " + example.runDrillDown());

        // 4. Drill-sideways
        System.out.println("Facet drill-sideways example (Publish Date/2010):");
        System.out.println("---------------------------------------------");
        for (FacetResult result : example.runDrillSideways()) {
            System.out.println(result);
        }
    }
}