About Elasticsearch delete by query

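Delete by query was removed from Elasticsearch in 2.0. According to the official documentation, the refresh it forces can cause an OutOfMemoryError during concurrent indexing, and it can also leave primary and replica shards inconsistent. The official recommendation is to use the scroll/scan API to find the IDs of the matching documents, and then delete them in bulk by ID.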
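The two-step pattern recommended above (page through the matching IDs, then delete each page by ID) can be modeled without a cluster. The sketch below is only an in-memory illustration: `ScrollDeleteSketch`, `nextBatch`, `bulkDelete`, and `deleteByQuery` are hypothetical names, a `TreeMap` stands in for the index, and a simple ID cursor stands in for a real scroll context. It makes no Elasticsearch calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical in-memory model: document ID -> document body. Not an Elasticsearch API.
class ScrollDeleteSketch {

    // Step 1 ("scroll"): collect up to batchSize matching IDs that sort after lastId.
    static List<String> nextBatch(TreeMap<String, String> index, Predicate<String> query,
                                  String lastId, int batchSize) {
        List<String> ids = new ArrayList<>();
        Map<String, String> tail = (lastId == null) ? index : index.tailMap(lastId, false);
        for (Map.Entry<String, String> entry : tail.entrySet()) {
            if (ids.size() == batchSize) {
                break;
            }
            if (query.test(entry.getValue())) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    // Step 2 ("bulk delete"): remove the given IDs and report how many existed.
    static int bulkDelete(TreeMap<String, String> index, List<String> ids) {
        int removed = 0;
        for (String id : ids) {
            if (index.remove(id) != null) {
                removed++;
            }
        }
        return removed;
    }

    // Repeat both steps until a batch comes back empty.
    static int deleteByQuery(TreeMap<String, String> index, Predicate<String> query, int batchSize) {
        int total = 0;
        String lastId = null;
        while (true) {
            List<String> ids = nextBatch(index, query, lastId, batchSize);
            if (ids.isEmpty()) {
                break;
            }
            total += bulkDelete(index, ids);
            lastId = ids.get(ids.size() - 1);
        }
        return total;
    }

    public static void main(String[] args) {
        TreeMap<String, String> index = new TreeMap<>();
        index.put("1", "old-a");
        index.put("2", "new-b");
        index.put("3", "old-c");
        index.put("4", "old-d");
        index.put("5", "new-e");
        int removed = deleteByQuery(index, body -> body.startsWith("old"), 2);
        System.out.println(removed + " removed, " + index.size() + " left"); // prints "3 removed, 2 left"
    }
}
```

Collecting a whole batch of IDs before deleting keeps the read and the write phases separate, which is the same reason the real approach reads from a scroll and writes through bulk requests.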

Following that recommendation, I wrote a delete by query plugin.

The main code is as follows:

// Open a scroll over every document matching the query (60 s keep-alive per batch).
SearchResponse scrollResp = client.prepareSearch(index).setTypes(type)
        .setScroll(new TimeValue(60000)).setSize(defaultBatchSize).setQuery(query)
        .execute().actionGet();
long total = scrollResp.getHits().getTotalHits();

// Checking for an empty batch up front also avoids sending an empty bulk
// request when the query matches nothing.
while (scrollResp.getHits().getHits().length > 0) {
    BulkRequestBuilder requestBuilder = client.prepareBulk().setRefresh(true);

    // Queue one delete per hit in the current batch.
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        requestBuilder.add(new DeleteRequest(index, type, hit.getId()));
    }

    BulkResponse response = requestBuilder.execute().actionGet();
    if (response.hasFailures()) {
        for (BulkItemResponse item : response) {
            if (item.isFailed()) {
                LOGGER.warn(item.getFailureMessage());
            }
        }
    }

    // Note: this count includes items whose delete failed.
    total = total - response.getItems().length;
    LOGGER.info("processed " + response.getItems().length + " rows, " + total + " rows remaining ...");

    // Fetch the next batch from the scroll.
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(new TimeValue(60000)).execute().actionGet();
}

The plugin is available at: https://github.com/weiyuc/delete_by_query
It currently targets ES 2.4.1. For other versions, a few small changes should be enough.
