使用lucene,咱们经过搜索出来的信息,都是相关性最强的排在前面的,这里涉及到评分机制,在实际生产中一定是要根据具体的业务需求作出更为复杂的自定义评分机制,但这里先简单看看lucene的评分是如何设定的。java
private Map<String,Float> scores = new HashMap<String,Float>(); //构造函数 public IndexUtil() { try { setDates(); //设置Score相对高的信息 scores.put("itat.org",2.0f); scores.put("zttc.edu", 1.5f); directory = FSDirectory.open(new File("d:/lucene/index02")); } catch (IOException e) { e.printStackTrace(); } } //建立索引 public void index() { IndexWriter writer = null; try { writer = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35))); writer.deleteAll(); Document doc = null; for(int i=0;i<ids.length;i++) { doc = new Document(); doc.add(new Field("id",ids[i],Field.Store.YES,Field.Index.NOT_ANALYZED_NO_NORMS)); doc.add(new Field("email",emails[i],Field.Store.YES,Field.Index.NOT_ANALYZED)); doc.add(new Field("email","test"+i+"@test.com",Field.Store.YES,Field.Index.NOT_ANALYZED)); doc.add(new Field("content",contents[i],Field.Store.NO,Field.Index.ANALYZED)); doc.add(new Field("name",names[i],Field.Store.YES,Field.Index.NOT_ANALYZED_NO_NORMS)); //存储数字 doc.add(new NumericField("attach",Field.Store.YES,true).setIntValue(attachs[i])); //存储日期 doc.add(new NumericField("date",Field.Store.YES,true).setLongValue(dates[i].getTime())); //截取@后面的字段 String et = emails[i].substring(emails[i].lastIndexOf("@")+1); //设置评分 if(scores.containsKey(et)) { doc.setBoost(scores.get(et)); } else { doc.setBoost(0.5f); } writer.addDocument(doc); } } catch (CorruptIndexException e) { e.printStackTrace(); } catch (LockObtainFailedException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } finally { try { if(writer!=null)writer.close(); } catch (CorruptIndexException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } } }
测试结果函数
进行评分前,它是长这样的:学习
进行评分后,它是长这样的:测试
注:这里面是的doc进行setBoost,可是以前看一篇博文,Lucene4.x以后好像不能对doc进行评分了,只能对Field进行评分。在学习《Lucene 实战》一书时,发现能够对Query进行加大评分,特别是使用像BooleanQuery这种包含多个Query的查询器。spa