初识ElasticSearch

时间 2019-12-06

标签 elasticsearch 栏目日志分析繁體版

原文原文链接

概述

Elasticsearch是一个基于Apache Lucene(TM)的开源搜索引擎。不管在开源仍是专有领域，Lucene能够被认为是迄今为止最早进、性能最好的、功能最全的搜索引擎库。linux

分布式的实时文件存储，每一个字段都被索引并可被搜索
分布式的实时分析搜索引擎
能够扩展到上百台服务器，处理PB级结构化或非结构化数据

下面展现了在关系型数据库中和ElasticSearch中对应的存储字段：git

Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fieldsgithub

这里的索引和传统的关系型数据库中索引有些不一样，在ElasticSearch中，一个索引就像是传统关系型数据库中的数据库。web

在ElasticSearch和Lucene中，使用的是一种叫倒排索引的数据结构加速检索。chrome

安装使用

因为机器有限，因此我采用的是使用虚拟机安装ElasticSearch。数据库

实验环境以下：npm

VMware虚拟机
centos6.5系统
JDK1.8
ElasticSearch 5.2.2

下载与安装

能够在ElasticSearch官网下载最新版的ElasticSearch。json

下载地址：https://www.elastic.co/downloads/elasticsearchvim

若是使用Windows则下载zip包，使用linux则下载tar.gz包。centos

安装以前须要安装JDK1.7以上版本。

在Windows下方法比较简单，解压zip包后，经过DOS命令行切换到bin目录，运行elasticsearch便可，能够经过查看localhost:9200端口，看是否启动成功。

在Linux中安装与使用

在linux中新建一个目录，名字为elasticsearch，使用Xftp或者WinSCP等工具将刚才下载的安装包传输到该目录下。

tar -zxvf elasticsearch-5.2.2.tar.gz解压文件。

进入到解压后的文件，输入.bin/elasticsearch便可打开elasticsearch。

可能出现的问题：

启动失败多是由于使用了root用户，启动不容许使用root用户
启动失败其余缘由，可能linux系统中有些参数须要修改修改，好比最大虚拟内存、用户最大可建立线程数，按照提示进行修改便可，具体设置地点可在网上查找。
linux防火墙的9200端口须要打开，不然在主机上没法访问虚拟机里的ElasticSearch，能够经过ifconfig命令查询Linux主机的ip地址。
若是仍是没法访问，能够进入ElasticSearch目录中的config目录中，vim elasticsearch.yml
，编辑设置文件，将host改成0.0.0.0，保存并重启ElasticSearch，就应该没问题了。

访问 ip + 9200，结果以下，ElasticSearch开启成功。

安装elasticsearch-head

elasticsearch-head是一个很好的管理工具，能够可视化看到集群的情况。

下载地址：https://github.com/mobz/elasticsearch-head

在elasticsearch5.0后，废弃了使用plugin安装的方式，可使用git先clone代码，使用npm安装。

访问 ip + 9100，记得开放端口。

在chrome或者火狐浏览器中可能会出现跨域问题而没法链接，可使用IE链接。

基本使用

在官网能够查看使用多种语言的API，在这里使用最简单的基于HTTP协议的API，使用JSON格式数据。

插入数据

在linux中，输入：
curl -XPUT 'http://192.168.91.129:9200/website/blog/1' -d '{"title":"liuyang", "text":"太阳在头顶之上", "date":"2017-4-16"}'

这样就能够插入一条数据，在插入多条数据后，在浏览器输入：
http://192.168.91.129:9200/_search?pretty

在结果中能够看到有多条记录。pretty的做用是为了让JSON优化排版，也能够直接使用curl -XGET获取数据。

搜索

查询字符串

在浏览器输入：
http://192.168.91.129:9200/_search?q=太阳

在q后面加上本身想要查找的字符串，就能够进行匹配了。

_score字段为匹配程度，越高的结果就会越在前面。

格式化结果后：

{
    "took": 292, 
    "timed_out": false, 
    "_shards": {
        "total": 5, 
        "successful": 5, 
        "failed": 0
    }, 
    "hits": {
        "total": 3, 
        "max_score": 0.56008905, 
        "hits": [
            {
                "_index": "website", 
                "_type": "blog", 
                "_id": "2", 
                "_score": 0.56008905, 
                "_source": {
                    "title": "title2", 
                    "text": "小路弯弯,太阳正在头顶上", 
                    "date": "2017-4-16"
                }
            }, 
            {
                "_index": "website", 
                "_type": "blog", 
                "_id": "3", 
                "_score": 0.5446649, 
                "_source": {
                    "title": "title3", 
                    "text": "最美的太阳", 
                    "date": "2017-4-16"
                }
            }, 
            {
                "_index": "website", 
                "_type": "blog", 
                "_id": "1", 
                "_score": 0.48515025, 
                "_source": {
                    "title": "liuyang", 
                    "text": "太阳在头顶之上", 
                    "date": "2017-4-16"
                }
            }
        ]
    }
}

全文搜索
在linux中输入：
curl -XGET 'http://192.168.91.129:9200/_search' -d '{"query":{"match":{"text":"太阳"}}}'

后面的JSON为搜索的具体field，这样就能够作到在关系型数据库中很难作到的全文检索。

从输出结果中能够看到，每个document都有对应的_score字段，这就是对此document的相关性评分。

格式化输出：

{
    "took": 44, 
    "timed_out": false, 
    "_shards": {
        "total": 5, 
        "successful": 5, 
        "failed": 0
    }, 
    "hits": {
        "total": 3, 
        "max_score": 0.5716521, 
        "hits": [
            {
                "_index": "website", 
                "_type": "blog", 
                "_id": "1", 
                "_score": 0.5716521, 
                "_source": {
                    "title": "liuyang", 
                    "text": "太阳在头顶之上", 
                    "date": "2017-4-16"
                }
            }, 
            {
                "_index": "website", 
                "_type": "blog", 
                "_id": "3", 
                "_score": 0.5649868, 
                "_source": {
                    "title": "title3", 
                    "text": "最美的太阳", 
                    "date": "2017-4-16"
                }
            }, 
            {
                "_index": "website", 
                "_type": "blog", 
                "_id": "2", 
                "_score": 0.48515025, 
                "_source": {
                    "title": "title2", 
                    "text": "小路弯弯,太阳正在头顶上", 
                    "date": "2017-4-16"
                }
            }
        ]
    }
}

经过对比两种搜索方式的结果可知，使用全文搜索的匹配更加知足咱们的预期。