[Elasticsearch] 3. Information out: search and analyze

Series in progress...

Information out: search and analyze

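You can use Elasticsearch to store and retrieve documents and their metadata, but its real power comes from the search capabilities of Lucene, the search engine underneath it.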

While you can use Elasticsearch as a document store and retrieve documents and their metadata, the real power comes from being able to easily access the full suite of search capabilities built on the Apache Lucene search engine library.

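Building on Lucene, Elasticsearch provides a simple, easy-to-use REST API for managing the cluster and for indexing and searching data. It is simple enough that you can send requests straight from the command line or from the Developer Console in Kibana. In your own applications you can use an Elasticsearch client; clients are available for Java, JavaScript, Go, .NET, PHP, Perl, Python, and Ruby.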

Elasticsearch provides a simple, coherent REST API for managing your cluster and indexing and searching your data. For testing purposes, you can easily submit requests directly from the command line or through the Developer Console in Kibana. From your applications, you can use the Elasticsearch client for your language of choice: Java, JavaScript, Go, .NET, PHP, Perl, Python or Ruby.
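To make the "it's just REST" point concrete, here is a minimal sketch that builds (but does not send) three requests with nothing but the Python standard library. The base URL `http://localhost:9200` and the sample document are my assumptions for a local, unsecured test node; they are not from the official text.

```python
import json
import urllib.request

# Assumption: a local, unsecured development node.
BASE_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# Cluster management: ask for the cluster health.
health_request = build_request("GET", "/_cluster/health")

# Indexing: store one illustrative document in the "employee" index.
index_request = build_request(
    "POST", "/employee/_doc", {"gender": "female", "age": 31}
)

# Searching: match every document in that index.
search_request = build_request(
    "POST", "/employee/_search", {"query": {"match_all": {}}}
)

# To actually send one of them against a running node:
#   with urllib.request.urlopen(health_request) as response:
#       print(json.load(response))
```

In a real application you would more likely reach for one of the official clients listed above; the raw requests are only meant to show what travels over the wire.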

Search

Searching your data

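You can use the Elasticsearch REST API to run structured queries, full-text queries, and compound queries that combine the two. Structured queries resemble the queries you would write in SQL. For example, you could search the gender and age fields in an employee index and sort the matches by hire_date. Full-text queries return results ordered by how relevant each document is to the search text, so the best matches come first. How is "best match" defined? That comes down to relevance scoring, which is covered later in this series, so I won't go into it here.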

The Elasticsearch REST APIs support structured queries, full text queries, and complex queries that combine the two. Structured queries are similar to the types of queries you can construct in SQL. For example, you could search the gender and age fields in your employee index and sort the matches by the hire_date field. Full-text queries find all documents that match the query string and return them sorted by relevance: how good a match they are for your search terms.
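Here is a rough sketch of the three kinds of query as Query DSL request bodies, written as plain Python dicts. The `gender`, `age`, and `hire_date` fields come from the example above; the `about` free-text field and all the values are made up for illustration, and the `term` filter assumes `gender` is mapped as a keyword.

```python
import json

# Structured query: exact gender, an age range, sorted by hire_date.
structured_query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"gender": "female"}},
                {"range": {"age": {"gte": 30, "lte": 40}}},
            ]
        }
    },
    "sort": [{"hire_date": {"order": "desc"}}],
}

# Full-text query: matching documents are scored and returned by relevance.
# "about" is a hypothetical free-text field.
full_text_query = {"query": {"match": {"about": "rock climbing"}}}

# Combined query: the full-text clause contributes to the score,
# while the structured clauses only filter.
combined_query = {
    "query": {
        "bool": {
            "must": [full_text_query["query"]],
            "filter": structured_query["query"]["bool"]["filter"],
        }
    }
}

# Each body would be sent as JSON to POST /employee/_search.
print(json.dumps(combined_query, indent=2))
```

Putting the structured clauses under `filter` keeps them out of the relevance score, which is usually what you want for yes/no conditions.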

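Besides single-term queries, Elasticsearch supports phrase queries, similarity queries, and prefix queries, and it can also provide autocomplete suggestions. Pretty powerful, right?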

In addition to searching for individual terms, you can perform phrase searches, similarity searches, and prefix searches, and get autocomplete suggestions.
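One possible mapping of these four capabilities onto Query DSL is sketched below. Treat it as my reading rather than the official one: in particular I am assuming "similarity search" can be illustrated with a `fuzzy` query, and the field names (`about`, `last_name`, `name_suggest`) are hypothetical. The completion suggester also assumes `name_suggest` is mapped with type `completion`.

```python
# Phrase search: the words must appear together, in this order.
phrase_query = {"query": {"match_phrase": {"about": "rock climbing"}}}

# Similarity search (one reading): tolerate small spelling differences.
fuzzy_query = {
    "query": {"fuzzy": {"last_name": {"value": "smiht", "fuzziness": "AUTO"}}}
}

# Prefix search: terms that start with the given characters.
prefix_query = {"query": {"prefix": {"last_name": {"value": "smi"}}}}

# Autocomplete: the completion suggester, keyed by an arbitrary
# suggestion name ("name_suggestions" here).
suggest_request = {
    "suggest": {
        "name_suggestions": {
            "prefix": "smi",
            "completion": {"field": "name_suggest"},
        }
    }
}

search_bodies = {
    "phrase": phrase_query,
    "fuzzy": fuzzy_query,
    "prefix": prefix_query,
    "suggest": suggest_request,
}
```

All four bodies go to the same `_search` endpoint; only the JSON changes.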

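Need to search geospatial or other numeric data? As described in the previous post, Elasticsearch stores these data types in optimized, type-specific data structures rather than as plain text, which is one reason its searches are fast.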

Have geospatial or other numerical data that you want to search? Elasticsearch indexes non-textual data in optimized data structures that support high-performance geo and numerical queries.
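As a small illustration, the sketch below declares one geo field and one numeric field in an index mapping and then filters on both. The index layout, the field names `location` and `price`, and the coordinates are all assumptions of mine; the mapping shape assumes Elasticsearch 7 or later (no mapping types).

```python
# Mapping: tell Elasticsearch which specialised types to index with.
# Sent as the body of PUT /<index-name> when creating the index.
index_definition = {
    "mappings": {
        "properties": {
            "location": {"type": "geo_point"},
            "price": {"type": "float"},
        }
    }
}

# Query: documents within 5 km of a point AND inside a price range.
geo_and_numeric_query = {
    "query": {
        "bool": {
            "filter": [
                {
                    "geo_distance": {
                        "distance": "5km",
                        "location": {"lat": 40.71, "lon": -74.0},
                    }
                },
                {"range": {"price": {"gte": 10, "lt": 50}}},
            ]
        }
    }
}
```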

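You can search your data with Elasticsearch's powerful JSON-style query language, or use SQL-style queries to search and aggregate it. The JDBC and ODBC drivers that Elasticsearch provides make it easy for third-party applications to interact with it through SQL.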

You can access all of these search capabilities using Elasticsearch’s comprehensive JSON-style query language (Query DSL). You can also construct SQL-style queries to search and aggregate data natively inside Elasticsearch, and JDBC and ODBC drivers enable a broad range of third-party applications to interact with Elasticsearch via SQL.
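For comparison, here is a sketch of what a SQL request could look like. I am assuming the `_sql` REST endpoint as it exists in Elasticsearch 7.x; the `employee` index and its `gender` and `age` fields are reused from the earlier example, and the helper name `build_sql_request` is my own.

```python
import json


def build_sql_request(sql, fetch_size=100):
    """Build the JSON body for the Elasticsearch SQL endpoint."""
    return {"query": sql, "fetch_size": fetch_size}


# Search and aggregate in one SQL statement.
sql_body = build_sql_request(
    "SELECT gender, AVG(age) AS avg_age FROM employee GROUP BY gender"
)

# The body would be sent to POST /_sql?format=txt to get a text table back.
print(json.dumps(sql_body))
```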

Analysis

Analyzing your data

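Elasticsearch aggregations let us build fairly complex summary queries and uncover key metrics, patterns, and trends in the data, rather than just finding a "needle in a haystack". Aggregations can also answer questions like these: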

Elasticsearch aggregations enable you to build complex summaries of your data and gain insight into key metrics, patterns, and trends. Instead of just finding the proverbial “needle in a haystack”, aggregations enable you to answer questions like:

  • Just how many needles are in the haystack?

  • > How many needles are in the haystack?

  • How long are the needles on average?

  • > What is the average length of the needles?

  • What is the median needle length for each manufacturer?

  • > What is the median length of the needles, broken down by manufacturer?

  • How many new needles landed in the haystack in each of the past six months?

  • > How many needles were added to the haystack in each of the last six months?
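The four questions above could be asked in a single request, roughly like the sketch below. The needle index is hypothetical, as are its fields `length`, `manufacturer`, and `added_at`; `calendar_interval` assumes Elasticsearch 7.2 or later.

```python
needle_aggregations = {
    # Return only aggregation results, no individual documents.
    "size": 0,
    # Question 1: hits.total in the response is the exact needle count.
    "track_total_hits": True,
    "aggs": {
        # Question 2: average needle length.
        "average_length": {"avg": {"field": "length"}},
        # Question 3: median length (50th percentile) per manufacturer.
        "by_manufacturer": {
            "terms": {"field": "manufacturer"},
            "aggs": {
                "median_length": {
                    "percentiles": {"field": "length", "percents": [50]}
                }
            },
        },
        # Question 4: needles added per month, from roughly six months ago onward.
        "recently_added": {
            "filter": {"range": {"added_at": {"gte": "now-6M/M"}}},
            "aggs": {
                "per_month": {
                    "date_histogram": {
                        "field": "added_at",
                        "calendar_interval": "month",
                    }
                }
            },
        },
    },
}
```

Note that the median comes from the `percentiles` aggregation, which returns an approximation on large data sets.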

Aggregations can also answer trickier questions:

You can also use aggregations to answer more subtle questions, such as:

  • Which needle manufacturers are the most popular?

  • > What are your most popular needle manufacturers?

  • Are there any unusual or anomalous clusters (batches) of needles?

  • > Are there any unusual or anomalous clumps of needles?
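One way these two questions might be approached, and this is my interpretation rather than anything the official text prescribes: a `terms` aggregation returns the most common manufacturers, and a `significant_terms` aggregation flags manufacturers that are over-represented in the query results compared with the whole index. The fields `manufacturer` and `added_at` are hypothetical.

```python
subtle_aggregations = {
    "size": 0,
    # Foreground set: only needles added since the start of last month.
    "query": {"range": {"added_at": {"gte": "now-1M/M"}}},
    "aggs": {
        # Most popular: top 5 manufacturers by document count.
        "popular_manufacturers": {
            "terms": {"field": "manufacturer", "size": 5}
        },
        # Unusual clumps: manufacturers unusually frequent in the
        # foreground set relative to the index as a whole.
        "unusual_manufacturers": {
            "significant_terms": {"field": "manufacturer"}
        },
    },
}
```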

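Aggregations use the same data structures as search, so they are just as fast. That lets us analyze and visualize data in near real time, and reports and dashboards can show the latest information.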

Because aggregations leverage the same data-structures used for search, they are also very fast. This enables you to analyze and visualize your data in real time. Your reports and dashboards update as your data changes so you can take action based on the latest information.

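What's more, aggregations can run together with searches: you can search and filter documents and analyze the data in the same request. Because the search and the aggregation share one execution context, we can count not only all size 70 needles but also the size 70 needles that meet specific criteria, such as non-stick embroidery needles.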

What’s more, aggregations operate alongside search requests. You can search documents, filter results, and perform analytics at the same time, on the same data, in a single request. And because aggregations are calculated in the context of a particular search, you’re not just displaying a count of all size 70 needles, you’re displaying a count of the size 70 needles that match your users' search criteria—for example, all size 70 non-stick embroidery needles.
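A sketch of "search plus analytics in one request": the query narrows the documents to what the user typed, and the aggregation then counts needles per size within those matches only. The `description` and `size` fields and the helper name `build_faceted_search` are assumptions for illustration.

```python
def build_faceted_search(user_text):
    """Search by text and count the matching needles per size in one request."""
    return {
        # The user's search criteria.
        "query": {"match": {"description": user_text}},
        # Return the first 10 matching documents as well.
        "size": 10,
        # Computed over the matching documents only, so the bucket for
        # size 70 counts size 70 needles that match the search.
        "aggs": {"needles_by_size": {"terms": {"field": "size"}}},
    }


request_body = build_faceted_search("non-stick embroidery")
```

The `needles_by_size` buckets in the response would give the count for size 70 among non-stick embroidery needles, alongside the search hits themselves.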

Hold on, there's more

But wait, there's more.

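Want to analyze time-series data automatically? You can use the machine learning features to establish a baseline of normal behavior in the data and identify anomalies. With machine learning, we can detect: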

Want to automate the analysis of your time-series data? You can use machine learning features to create accurate baselines of normal behavior in your data and identify anomalous patterns. With machine learning, you can detect:

  • Values, counts, or frequencies that deviate from normal over time

  • > Anomalies related to temporal deviations in values, counts, or frequencies

  • Statistically rare data

  • > Statistical rarity

  • Members of a population that behave unusually

  • > Unusual behaviors for a member of a population

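Better still, we don't need to specify algorithms, train models, or set up any data-science configuration to do this.

So, is that powerful or what? Advanced enough for you?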

And the best part? You can do this without having to specify algorithms, models, or other data science-related configurations.
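To show how little configuration that is, here is a rough sketch of anomaly detection job bodies, one per kind of anomaly listed above. I am assuming the machine learning features are available on the cluster and that jobs are created with `PUT /_ml/anomaly_detectors/<job_id>`; the fields `length`, `manufacturer`, and `added_at`, the 15 minute bucket span, and the helper name `build_anomaly_job` are all illustrative.

```python
def build_anomaly_job(detector, description):
    """Build the body for creating one anomaly detection job."""
    return {
        "description": description,
        "analysis_config": {
            # Width of the time buckets the data is analysed in.
            "bucket_span": "15m",
            "detectors": [detector],
        },
        # Which field holds each document's timestamp.
        "data_description": {"time_field": "added_at"},
    }


# Temporal deviation in a value: mean needle length over time.
value_job = build_anomaly_job(
    {"function": "mean", "field_name": "length"},
    "Unusual average needle length",
)

# Statistical rarity: manufacturers that rarely appear.
rarity_job = build_anomaly_job(
    {"function": "rare", "by_field_name": "manufacturer"},
    "Rarely seen manufacturers",
)

# Population analysis: a manufacturer whose event count is unusual
# compared with the other manufacturers.
population_job = build_anomaly_job(
    {"function": "count", "over_field_name": "manufacturer"},
    "Manufacturers behaving unlike their peers",
)
```

Each detector names only a function and the fields to watch; no algorithm or model parameters appear anywhere, which matches the claim above.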
