Lucene Basics (1)

Date: 2019-11-11

Tags: lucene, basics

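Using Lucene involves two steps: first build an index, then search the indexed documents.

1. Creating the index

Start with the five essential classes:

(1) Document: made up of multiple Fields. A Document is comparable to a database record, and each Field object is comparable to a column of that record.

(2) Field: describes one attribute of a document.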

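Option	Description
Field.Store.YES	Stores the field value. Suited to fields shown in search results, such as file paths and URLs.
Field.Store.NO	Does not store the field value, for example the body of an email message.
Field.Index.NO	For fields that are never searched and are only stored, such as file paths.
Field.Index.ANALYZED	The field is analyzed and indexed, for example the body and subject of an email message.
Field.Index.NOT_ANALYZED	The field is indexed but not analyzed, so its original value is kept as a single term, for example dates and personal names.

Note that the Field.Index constants come from older Lucene releases. In Lucene 6.x the field classes play that role: StringField is indexed without analysis, TextField is indexed and analyzed, and StoredField is stored only.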

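(3) Analyzer: before a document is indexed, its content has to be tokenized. The resulting tokens are handed to IndexWriter, which builds the index.

(4) IndexWriter: adds Documents to the index.

(5) Directory: represents the location where the index is stored.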
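To make the roles of these classes more concrete, here is a minimal plain-Java sketch of the same pipeline. It is not Lucene code: `ToyIndex`, `analyze`, `addDocument`, and `lookup` are hypothetical names, and the tokenizer is only a rough stand-in for a real Analyzer (it splits on anything that is not a letter or digit and lowercases the result). The map from term to document ids plays the part of the index that IndexWriter builds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of the indexing pipeline; NOT Lucene code, all names are hypothetical. */
class ToyIndex {
    // term -> ids of the documents that contain it (the "inverted index")
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    // stored field values, kept only so that results can be displayed
    private final List<String> storedPaths = new ArrayList<>();

    /** Stand-in for an Analyzer: split on non-letter/digit characters and lowercase. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Stand-in for adding a Document: the path is stored, the content is indexed. */
    int addDocument(String path, String content) {
        int docId = storedPaths.size();
        storedPaths.add(path);
        for (String token : analyze(content)) {
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    /** Returns the stored paths of the documents that contain the exact term. */
    List<String> lookup(String term) {
        List<String> hits = new ArrayList<>();
        for (int docId : postings.getOrDefault(term, new TreeSet<>())) {
            hits.add(storedPaths.get(docId));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("a.txt", "SELECT * FROM users;");
        index.addDocument("b.txt", "insert into users values (1)");
        System.out.println(analyze("SELECT * FROM users;")); // [select, from, users]
        System.out.println(index.lookup("users"));           // [a.txt, b.txt]
    }
}
```

The split between `storedPaths` and `postings` mirrors the difference between storing a field value and indexing it: the path can be shown in results but is not searchable here, while the content is searchable but cannot be read back.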

The following demo is based on Lucene 6.2:

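import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

    /**
     * Builds an index over the .txt files in a directory.
     * @throws IOException if a file cannot be read or the index cannot be written
     */
    @Test
    public void createFileIndexTest() throws IOException {
        File fileDir = new File("E:\\doc\\mysql_"); // directory holding the files to index
        Directory indexDir = FSDirectory.open(Paths.get("E:\\luceneIndex")); // index location (Lucene 6 takes a Path)
        Analyzer luceneAnalyzer = new StandardAnalyzer();
        // The writer configuration carries the analyzer used to tokenize TextFields
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(luceneAnalyzer);
        File[] dataFiles = fileDir.listFiles();
        if (dataFiles == null) {
            throw new IOException("Not a readable directory: " + fileDir);
        }
        long start = System.currentTimeMillis();
        // try-with-resources closes the writer (and commits) even if indexing fails
        try (IndexWriter indexWriter = new IndexWriter(indexDir, indexWriterConfig)) {
            for (File dataFile : dataFiles) {
                if (dataFile.isFile() && dataFile.getName().endsWith(".txt")) {
                    System.out.println("Indexing file " + dataFile.getCanonicalPath());
                    Document document = new Document();
                    Reader reader = new FileReader(dataFile);
                    // path: indexed as a single term and stored, so it can be shown in results
                    document.add(new StringField("path", dataFile.getCanonicalPath(), Field.Store.YES));
                    // content: analyzed and indexed from the reader, not stored
                    document.add(new TextField("content", reader));
                    indexWriter.addDocument(document);
                }
            }
        }
        System.out.println("It takes " + (System.currentTimeMillis() - start)
                + " milliseconds to create index for the files in directory "
                + fileDir.getPath());
    }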

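2. Searching documents

Start with five basic classes:

(1) Query: has many implementations. Its purpose is to wrap the user's input in a form that Lucene can execute.

(2) Term: the basic unit of search, for example new Term("field", "queryStr"). The first argument names the Field to search and the second is the keyword to look for.

(3) TermQuery: one concrete implementation of Query, which matches documents containing a given Term.

(4) IndexSearcher: searches the indexed documents. It opens the index read-only.

(5) Hits: holds the search results. Hits belongs to old Lucene releases; in Lucene 6.x the result type is TopDocs.

The following demo is based on Lucene 6.2: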

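import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

    /**
     * Searches the indexed file contents.
     * @throws IOException if the index cannot be read
     */
    @Test
    public void searchFileIndexTest() throws IOException {
        String queryStr = "select";
        Directory indexDir = FSDirectory.open(Paths.get("E:\\luceneIndex"));
        // DirectoryReader is Closeable, so release it once the search is done
        try (DirectoryReader reader = DirectoryReader.open(indexDir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // TermQuery does not analyze its term, so lowercase it to match the indexed tokens
            Term term = new Term("content", queryStr.toLowerCase());
            TermQuery luceneQuery = new TermQuery(term);
            TopDocs docs = searcher.search(luceneQuery, 10); // the 10 best-scoring hits
            for (ScoreDoc scoreDoc : docs.scoreDocs) {
                // Prints the stored fields of the hit (here only "path")
                System.out.println(searcher.doc(scoreDoc.doc));
            }
        }
    }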
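
The demo lowercases the query string before building the Term. The likely reason is that a TermQuery compares its term with the indexed tokens exactly, without running an analyzer, while StandardAnalyzer lowercases tokens at index time. The plain-Java sketch below illustrates that mismatch. It is not Lucene code: `TermMatchSketch`, `indexedTokens`, and `termMatches` are hypothetical names, and the tokenizer is only a rough approximation of what an analyzer does.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Plain-Java illustration of exact term matching; NOT Lucene code, names are hypothetical. */
class TermMatchSketch {
    /** Tokens as a lowercasing analyzer would index them (rough stand-in, an assumption). */
    static Set<String> indexedTokens(String content) {
        Set<String> tokens = new HashSet<>();
        for (String raw : content.split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Exact comparison against the indexed tokens: the term itself is not analyzed. */
    static boolean termMatches(Set<String> indexed, String term) {
        return indexed.contains(term);
    }

    public static void main(String[] args) {
        Set<String> indexed = indexedTokens("SELECT name FROM users");
        System.out.println(termMatches(indexed, "SELECT")); // false
        System.out.println(termMatches(indexed, "select")); // true
    }
}
```

When the query comes from free-form user input, running it through the same analyzer that was used for indexing is generally the safer way to keep both sides consistent.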

References:

https://www.ibm.com/developerworks/cn/java/j-lo-lucene1/

http://codepub.cn/2016/05/20/Lucene-6-0-in-action-2-All-kinds-of-Field-and-sort-operations/

http://www.ibm.com/developerworks/cn/opensource/os-apache-lucenesearch/index.html