Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-analyzers.html
Reference: https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-smartcn.html
Install the IK Chinese analyzer (the plugin version must match the Elasticsearch version exactly, 6.6.2 here):
$ bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.2/elasticsearch-analysis-ik-6.6.2.zip
Reference: https://github.com/medcl/elasticsearch-analysis-ik
Other plugins:
Reference: https://www.elastic.co/guide/en/elasticsearch/plugins/current/index.html
# curl -XPUT -H 'Content-Type: application/json' http://localhost:9200/testdoc -d '
{
  "settings": {
    "index.number_of_shards": 10,
    "index.number_of_routing_shards": 30,
    "index.number_of_replicas": 1,
    "index.translog.durability": "async",
    "index.merge.scheduler.max_thread_count": 1,
    "index.refresh_interval": "30s"
  },
  "mappings": {
    "_doc": {
      "_all": { "enabled": false },
      "_source": { "enabled": false },
      "properties": {
        "title": { "type": "text", "analyzer": "ik_smart" },
        "name": { "type": "keyword", "doc_values": false },
        "age": { "type": "integer", "index": false },
        "created": { "type": "date", "format": "strict_date_optional_time||epoch_millis" }
      }
    }
  }
}'
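Where:
_source controls whether the original JSON document is stored
_all controls whether the values of all fields are concatenated into the catch-all _all field and indexed
analyzer specifies the analyzer used to tokenize a text field (ik_smart comes from the IK plugin)
doc_values controls whether the field is also stored column-oriented on disk (used for sorting and aggregations)
index controls whether the field is added to the inverted index, i.e. whether it can be searched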
The _source field stores the original JSON body of the document. If you don’t need access to it you can disable it.
By default Elasticsearch indexes and adds doc values to most fields so that they can be searched and aggregated out of the box.
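To make the parameters above concrete, here is a rough Python model of what they mean for each field of the testdoc mapping. It assumes Elasticsearch 6.x defaults: index defaults to true, doc_values defaults to true for non-text fields, and text fields have no doc_values, so aggregating or sorting on them needs fielddata, which is off by default. capabilities() is a hypothetical helper, not an Elasticsearch API.

```python
MAPPING = {  # the "properties" of the testdoc index above
    "title": {"type": "text", "analyzer": "ik_smart"},
    "name": {"type": "keyword", "doc_values": False},
    "age": {"type": "integer", "index": False},
    "created": {"type": "date", "format": "strict_date_optional_time||epoch_millis"},
}


def capabilities(field: dict) -> dict[str, bool]:
    """Rough 6.x rules: searching needs the inverted index, aggregating/sorting needs column data."""
    searchable = field.get("index", True)
    if field["type"] == "text":
        # text has no doc_values; aggregations need fielddata, which is off by default
        aggregatable = field.get("fielddata", False)
    else:
        aggregatable = field.get("doc_values", True)
    return {"searchable": searchable, "aggregatable": aggregatable}


for name, field in MAPPING.items():
    print(name, capabilities(field))
```

Read this way, age can be aggregated and sorted but not queried, while name can be queried but not aggregated.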
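Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html
There are two string types, text and keyword: a text value is analyzed (split into terms) before it is indexed, while a keyword value is indexed as a single, unmodified term.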
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html
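A toy illustration of the text/keyword difference, assuming a simplified analyzer that only lowercases and splits on non-word characters (the real standard analyzer, and ik_smart for Chinese, do considerably more). A term query is not analyzed and matches only a term that exists exactly in the index, which is why the same query behaves differently on the two types. The helper names are hypothetical.

```python
import re


def analyze_text(value: str) -> list[str]:
    # Simplified stand-in for an analyzer: lowercase, then split on non-word characters
    return [t for t in re.split(r"\W+", value.lower()) if t]


def analyze_keyword(value: str) -> list[str]:
    # keyword fields index the whole value as one term, unchanged
    return [value]


def term_query(indexed_terms: list[str], term: str) -> bool:
    # A term query matches only an exactly equal indexed term
    return term in indexed_terms


value = "Quick Brown-Fox"
print(analyze_text(value))                        # ['quick', 'brown', 'fox']
print(analyze_keyword(value))                     # ['Quick Brown-Fox']
print(term_query(analyze_text(value), "fox"))     # True
print(term_query(analyze_keyword(value), "fox"))  # False
```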
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html
For details, see: http://www.javashuo.com/article/p-fpudqamz-bv.html
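The index API referenced above writes one document per request; bulk writers such as es-hadoop (see the errors below) batch documents through the _bulk API instead. Its body is newline-delimited JSON: one action line followed by one source line per document, and the body must end with a newline. A minimal sketch that builds such a body for the testdoc index; bulk_body is a hypothetical helper, the sample documents are made up, and the _type of _doc follows the 6.x mapping above.

```python
import json


def bulk_body(index: str, docs: list[dict]) -> str:
    """Build an NDJSON body for POST /_bulk (Content-Type: application/x-ndjson)."""
    lines = []
    for doc in docs:
        # Action/metadata line; no _id, so Elasticsearch generates one
        lines.append(json.dumps({"index": {"_index": index, "_type": "_doc"}}))
        # Source line; ensure_ascii=False keeps Chinese text readable
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the final line must end with a newline


docs = [  # hypothetical sample documents matching the testdoc mapping
    {"title": "中文分词测试", "name": "alice", "age": 30, "created": "2019-03-27T03:14:50Z"},
    {"title": "another title", "name": "bob", "age": 25, "created": 1553656490000},
]
print(bulk_body("testdoc", docs), end="")
```

When sending such a body from a file with curl, use --data-binary rather than -d, because -d strips the newlines the bulk format depends on.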
# curl -XPOST -H 'Content-Type: application/json' 'http://localhost:9200/_xpack/sql?format=txt' -d '{"query":"select * from testdoc limit 10"}'
or
# curl -XGET 'http://localhost:9200/testdoc/_search?q=*'
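The same two queries can be issued from Python with only the standard library. A sketch assuming the node at localhost:9200 used throughout; sql_request and search_request are hypothetical helper names, and nothing is sent until urlopen is called. The /_xpack/sql endpoint is the 6.x one; 7.x renamed it to /_sql.

```python
import json
import urllib.parse
import urllib.request

ES = "http://localhost:9200"  # assumption: same node as the curl examples


def sql_request(query: str, fmt: str = "txt") -> urllib.request.Request:
    """POST /_xpack/sql?format=... with a JSON body, like the first curl command."""
    return urllib.request.Request(
        f"{ES}/_xpack/sql?format={fmt}",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_request(index: str, q: str = "*") -> urllib.request.Request:
    """GET /<index>/_search?q=... (URI search), like the second curl command."""
    return urllib.request.Request(f"{ES}/{index}/_search?{urllib.parse.urlencode({'q': q})}")


req = sql_request("select * from testdoc limit 10")
# To actually run it against a live node:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```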
2019-03-27 03:14:50,091 ERROR [main] org.elasticsearch.hadoop.rest.NetworkClient: Node [192.168.0.1:9200] failed (Read timed out); selected next node [192.168.0.1:9200]
2019-03-27 03:15:50,148 ERROR [main] org.elasticsearch.hadoop.rest.NetworkClient: Node [192.168.0.2:9200] failed (Read timed out); selected next node [192.168.0.2:9200]
2019-03-27 03:16:50,207 ERROR [main] org.elasticsearch.hadoop.rest.NetworkClient: Node [192.168.0.3:9200] failed (Read timed out); no other nodes left - aborting...
2019-03-27 03:16:50,208 ERROR [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Hit error while closing operators - failing tree
2019-03-27 03:16:50,210 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Hive Runtime Error while closing operators
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[192.168.0.1:9200, 192.168.0.2:9200, 192.168.0.3:9200]]
	at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:152)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:398)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:362)
	at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:366)
	at org.elasticsearch.hadoop.rest.RestClient.refresh(RestClient.java:267)
	at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.close(BulkProcessor.java:550)
	at org.elasticsearch.hadoop.rest.RestRepository.close(RestRepository.java:219)
	at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.doClose(EsOutputFormat.java:214)
	at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.close(EsHiveOutputFormat.java:74)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:190)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1047)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:711)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:711)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:711)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189)
	... 8 more
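Solution: increase index.number_of_shards. It can only be set when the index is created and cannot be changed on an existing index; the default is 5 in Elasticsearch 6.x (7.0 lowered it to 1).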
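Why the primary shard count is fixed at creation time: every document is routed to a shard by hashing its _routing value (the _id by default) modulo the number of primary shards, so changing that number would make existing documents be looked up on the wrong shard. The sketch below is simplified: crc32 stands in for the Murmur3 hash Elasticsearch actually uses, and the real 6.x formula also factors in index.number_of_routing_shards. That setting (30 in the testdoc index above) is what later lets the split API copy the index into a new one with more shards, but only to counts that are multiples of the current shard count and that divide number_of_routing_shards. shard_for() and split_targets() are hypothetical helpers.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    # Simplified routing rule: hash(_routing) % number_of_primary_shards.
    # Assumption: crc32 stands in for the Murmur3 hash Elasticsearch really uses.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def split_targets(number_of_shards: int, number_of_routing_shards: int) -> list[int]:
    """Shard counts the split API could produce from this index in one step."""
    if number_of_routing_shards % number_of_shards:
        raise ValueError("number_of_routing_shards must be a multiple of number_of_shards")
    # Multiples of the current count that evenly divide the routing shard space
    return [n for n in range(2 * number_of_shards, number_of_routing_shards + 1, number_of_shards)
            if number_of_routing_shards % n == 0]


ids = [f"doc-{i}" for i in range(1000)]
moved = sum(shard_for(d, 5) != shard_for(d, 10) for d in ids)
print(f"{moved} of {len(ids)} documents land on a different shard after 5 -> 10")
print(split_targets(10, 30))  # [30]: testdoc can only be split 10 -> 30
print(split_targets(5, 30))   # [10, 15, 30]
```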
Caused by: org.elasticsearch.hadoop.EsHadoopException: Could not write all entries for bulk operation [70/1000]. Error sample (first [5] error messages): org.elasticsearch.hadoop.rest.EsHadoopRemoteException: es_rejected_execution_exception: rejected execution of processing of [7622922][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[test_indix][18]] containing [38] requests, target allocation id: iLlIBScJTxahse559pTINQ, primary term: 1 on EsThreadPoolExecutor[name = 1hxgYU_/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@ce11763[Running, pool size = 32, active threads = 32, queued tasks = 200, completed tasks = 5686436]]
Cause:
thread_pool.write.queue_size
For single-document index/delete/update and bulk requests. Thread pool type is fixed with a size of # of available processors, queue_size of 200. The maximum size for this pool is 1 + # of available processors.
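The queue_size allows to control the size of the queue of pending requests that have no threads to execute them. By default, it is set to -1 which means it's unbounded. When a request comes in and the queue is full, it will abort the request.
(That -1 is the generic default of the fixed pool type; the write pool overrides it with a queue_size of 200, as quoted above.)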
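A toy model of the fixed write pool quoted above, sized like the pool in the error message (pool size = 32, queue capacity = 200). It ignores in-flight requests finishing, so it only shows the admission rule: a request runs if a thread is free, waits if the queue has room, and is rejected otherwise. FixedPool is a hypothetical class, not Elasticsearch code.

```python
from dataclasses import dataclass


@dataclass
class FixedPool:
    """Toy model of a fixed thread pool with a bounded queue (not Elasticsearch code)."""
    size: int        # number of worker threads
    queue_size: int  # -1 means unbounded
    active: int = 0
    queued: int = 0
    rejected: int = 0

    def submit(self) -> bool:
        """Admit one request; return False if it is rejected."""
        if self.active < self.size:
            self.active += 1   # a free thread picks it up immediately
            return True
        if self.queue_size < 0 or self.queued < self.queue_size:
            self.queued += 1   # waits in the queue
            return True
        self.rejected += 1     # queue full: es_rejected_execution_exception
        return False


# Sized like the pool in the error: 32 threads, queue capacity 200
pool = FixedPool(size=32, queue_size=200)
accepted = sum(pool.submit() for _ in range(250))
print(accepted, pool.active, pool.queued, pool.rejected)  # 232 32 200 18
```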
Check the thread pool statistics:
# curl 'http://localhost:9200/_nodes/stats?pretty'|grep '"write"' -A 7
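The same counters can be read from the JSON instead of with grep. A sketch that summarizes the write pool of every node from a _nodes/stats response (GET /_nodes/stats/thread_pool limits the response to this section). write_pool_summary is a hypothetical helper, and the sample response is made up and trimmed to the fields used. A rejected count that keeps growing is the signature of the es_rejected_execution_exception above.

```python
import json


def write_pool_summary(stats: dict) -> dict[str, dict[str, int]]:
    """Map node name -> active/queue/rejected counters of its write thread pool."""
    summary = {}
    for node_id, node in stats["nodes"].items():
        write = node["thread_pool"]["write"]
        summary[node.get("name", node_id)] = {
            key: write[key] for key in ("active", "queue", "rejected")
        }
    return summary


# Hypothetical, trimmed _nodes/stats response
sample = json.loads("""
{"nodes": {"node-id-1": {"name": "node-1",
  "thread_pool": {"write": {"threads": 32, "queue": 200, "active": 32,
                            "rejected": 18, "largest": 32, "completed": 5686436}}}}}
""")
print(write_pool_summary(sample))
# {'node-1': {'active': 32, 'queue': 200, 'rejected': 18}}
```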
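This usually happens when the write rate, concurrency, or overall load exceeds what ES can process: once every write thread is busy and the queue is full, further requests are rejected.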
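Solutions:
1) Tune the configuration
index.refresh_interval: -1                # index-level, dynamic: set via PUT /<index>/_settings
index.number_of_replicas: 0               # index-level, dynamic: set via PUT /<index>/_settings
indices.memory.index_buffer_size: 40%     # node-level, static: elasticsearch.yml (default 10%)
thread_pool.write.queue_size: 1024        # node-level, static: elasticsearch.yml (default 200)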
For details, see: http://www.javashuo.com/article/p-afpkucod-ch.html
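The four settings above live in two different places. The index.* ones are dynamic index settings applied to a live index through the _settings API (since 5.0, index-level settings are no longer accepted in elasticsearch.yml). The other two are static node settings that go into elasticsearch.yml on every node and take effect after a restart. A sketch that sorts them and prints both forms; split_settings is a hypothetical helper, and the RESTORE values are an assumption copied from the testdoc index above, for switching refresh and replicas back on once the bulk load is done.

```python
import json

TUNING = {
    "index.refresh_interval": "-1",
    "index.number_of_replicas": 0,
    "indices.memory.index_buffer_size": "40%",
    "thread_pool.write.queue_size": 1024,
}
# Assumption: values to restore after the load, taken from the testdoc settings
RESTORE = {"index.refresh_interval": "30s", "index.number_of_replicas": 1}


def split_settings(settings: dict) -> tuple[dict, dict]:
    """Separate dynamic index settings from static node settings."""
    index_level = {k: v for k, v in settings.items() if k.startswith("index.")}
    node_level = {k: v for k, v in settings.items() if not k.startswith("index.")}
    return index_level, node_level


index_level, node_level = split_settings(TUNING)
print("PUT /testdoc/_settings")
print(json.dumps(index_level))
print("# elasticsearch.yml on every node")
for key, value in node_level.items():
    print(f"{key}: {value}")
print("PUT /testdoc/_settings  (after the load)")
print(json.dumps(RESTORE))
```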
2) Reduce the write pressure
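Reducing write pressure mostly means sending smaller batches, fewer of them at once, and backing off when the cluster pushes back (a rejected request comes back as HTTP 429). A minimal client-side sketch of batching with exponential backoff; send is a hypothetical callable standing in for the real bulk request, and Rejected stands in for the rejection error. For simplicity it retries a whole batch, whereas a real bulk response can reject only some items, which ideally are retried alone. In es-hadoop jobs like the Hive one above, the comparable knobs are es.batch.size.entries, es.batch.write.retry.count and es.batch.write.retry.wait, plus the number of tasks writing concurrently.

```python
import time
from typing import Callable, Iterator, Sequence


class Rejected(Exception):
    """Stand-in for an es_rejected_execution_exception (HTTP 429)."""


def batches(records: Sequence, size: int) -> Iterator[Sequence]:
    for start in range(0, len(records), size):
        yield records[start:start + size]


def write_with_backoff(
    records: Sequence,
    send: Callable[[Sequence], None],
    batch_size: int = 500,
    max_retries: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send records in batches; on rejection wait base_delay * 2**attempt and retry."""
    for batch in batches(records, batch_size):
        for attempt in range(max_retries + 1):
            try:
                send(batch)
                break
            except Rejected:
                if attempt == max_retries:
                    raise  # give up after max_retries retries
                sleep(base_delay * 2 ** attempt)


# Usage with a fake sender that rejects its first two calls
calls, delays = [], []

def flaky_send(batch):
    calls.append(len(batch))
    if len(calls) <= 2:
        raise Rejected()

write_with_backoff(list(range(1200)), flaky_send, sleep=delays.append)
print(calls)   # [500, 500, 500, 500, 200]
print(delays)  # [0.5, 1.0]
```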