The Lucene Spell-Checking Module

Date: 2019-11-13

Tags: lucene, spell-checking module


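Lucene is an open-source toolkit for building search engines, released by Apache. Besides the core search functionality, it ships a number of add-on modules, one of which is spell checking.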

The spell-checking classes live in lucene-suggest-x.xx.x.jar, in the package org.apache.lucene.search.spell. Three classes form the core of the spell-checking functionality: SpellChecker, DirectSpellChecker, and WordBreakSpellChecker.

The three classes check spelling in different ways:

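SpellChecker: the original spell checker. Before it can check anything, it has to build a separate index of its own, either from a txt dictionary file or from one field of an existing index.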

For an analysis of the SpellChecker source code, see: http://www.tuicool.com/articles/naIBjmui

DirectSpellChecker: an improved spell checker that works directly against an existing index, so no separate index has to be built (this is the approach Solr uses by default).

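WordBreakSpellChecker: also needs no new index and checks against an existing one. It offers suggestions by breaking a term into several words or by combining adjacent terms into one.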
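To make the word-break idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. It tries every split point of a run-together word and keeps the splits whose two halves are both known words. The class name WordBreakSketch, the method breaks, and the sample word set are all hypothetical; this only illustrates the idea and is not how Lucene implements WordBreakSpellChecker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not part of Lucene.
final class WordBreakSketch {

    // Returns every "left right" split of word where both halves are in the dictionary.
    static List<String> breaks(String word, Set<String> dictionary) {
        List<String> suggestions = new ArrayList<>();
        for (int i = 1; i < word.length(); i++) {
            String left = word.substring(0, i);
            String right = word.substring(i);
            if (dictionary.contains(left) && dictionary.contains(right)) {
                suggestions.add(left + " " + right);
            }
        }
        return suggestions;
    }

    public static void main(String[] args) {
        // Sample dictionary (made up for this example)
        Set<String> dictionary = Set.of("new", "york", "beijing");
        System.out.println(breaks("newyork", dictionary));
    }
}
```

A real checker would take the known words from the terms of an index field rather than from a hard-coded set.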

Using SpellChecker:

There are three ways to build the index:

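PlainTextDictionary: initializes the index from a txt file (one word per line)

LuceneDictionary: initializes the index from one field of an existing index

HighFrequencyDictionary: initializes the index from one field of an existing index, but only with terms that reach a given minimum frequency of occurrence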
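The frequency condition of HighFrequencyDictionary can be pictured with a small plain-Java sketch. It assumes the threshold is a fraction between 0 and 1 of the total number of documents, and that a term is kept when the number of documents containing it reaches that fraction. The class FrequencyThresholdSketch, its filter method, and the sample counts are hypothetical; the exact comparison Lucene performs may differ in detail.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not part of Lucene.
final class FrequencyThresholdSketch {

    // Keeps only the terms that occur in at least thresh * numDocs documents.
    static Map<String, Integer> filter(Map<String, Integer> docFreqByTerm, int numDocs, double thresh) {
        double minDocs = thresh * numDocs;
        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : docFreqByTerm.entrySet()) {
            if (entry.getValue() >= minDocs) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up document frequencies for an index of 8 documents
        Map<String, Integer> docFreq = Map.of("beijing", 5, "shanghai", 2, "beijink", 1);
        // With a threshold of 0.25, a term must occur in at least 2 of the 8 documents
        System.out.println(filter(docFreq, 8, 0.25).keySet());
    }
}
```

Filtering this way keeps rare terms, which are often misspellings themselves, out of the suggestion dictionary.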

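// Assumes Lucene 5.x or later, where FSDirectory.open takes a java.nio.file.Path
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.spell.Dictionary;
import org.apache.lucene.search.spell.LuceneDictionary;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Directory for the new spell-check index
String spellIndexPath = "D:\\newPath";
// Directory of the existing index
String oriIndexPath = "D:\\oriPath";
// Dictionary file
String dicFilePath = "D:\\txt\\dic.txt";
// Field of the existing index to take terms from (placeholder name, use your own)
String fieldName = "fieldname";

// Open the directory that will hold the spell-check index
Directory directory = FSDirectory.open(Paths.get(spellIndexPath));
SpellChecker spellChecker = new SpellChecker(directory);

// The next steps build the spell-check index
IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(oriIndexPath)));
// Take the terms from a field of the existing index
Dictionary dictionary = new LuceneDictionary(reader, fieldName);
// Or take them from the txt dictionary file instead
//Dictionary dictionary = new PlainTextDictionary(Paths.get(dicFilePath));
IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
// The third argument (fullMerge) asks for a full merge once indexing is done
spellChecker.indexDictionary(dictionary, config, true);

String queryWord = "beijink";
int numSug = 10;
// Run the spell check: returns up to numSug words similar to queryWord
String[] suggestions = spellChecker.suggestSimilar(queryWord, numSug);

reader.close();
spellChecker.close();
directory.close();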
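As a rough picture of what a call like suggestSimilar("beijink", 10) is after, the plain-Java sketch below ranks a small word list by Levenshtein edit distance to the query and returns the closest matches. The class EditDistanceSketch, its methods, the maxEdits cutoff, and the word list are hypothetical and chosen for illustration; Lucene's SpellChecker uses its own index and scoring, so treat this as the general idea rather than its implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration, not part of Lucene.
final class EditDistanceSketch {

    // Classic dynamic-programming Levenshtein distance (insert, delete, substitute).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns up to numSug dictionary words within maxEdits of the query, closest first.
    static List<String> suggest(String query, List<String> dictionary, int numSug, int maxEdits) {
        List<String> candidates = new ArrayList<>();
        for (String word : dictionary) {
            if (!word.equals(query) && levenshtein(query, word) <= maxEdits) {
                candidates.add(word);
            }
        }
        candidates.sort(Comparator.comparingInt((String w) -> levenshtein(query, w))
                .thenComparing(Comparator.naturalOrder()));
        return candidates.subList(0, Math.min(numSug, candidates.size()));
    }

    public static void main(String[] args) {
        // Made-up dictionary for this example
        List<String> dictionary = List.of("beijing", "being", "berlin", "shanghai");
        System.out.println(suggest("beijink", dictionary, 10, 2));
    }
}
```

Scanning the whole word list is fine for a toy example, but it would be far too slow on a real vocabulary, which is one reason a spell checker keeps an index.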

Using DirectSpellChecker:

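import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spell.DirectSpellChecker;
import org.apache.lucene.search.spell.SuggestWord;
import org.apache.lucene.store.FSDirectory;

DirectSpellChecker checker = new DirectSpellChecker();
// Open the existing index directly; no separate spell-check index is needed
IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("D:\\path")));
Term term = new Term("fieldname", "querytext"); // field to check against, word to check
SuggestWord[] suggestions = checker.suggestSimilar(term, 10, reader); // up to 10 suggestions
reader.close();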