大数据架构-使用HBase和Solr将存储与索引放在不一样的机器上

时间 2019-11-06

标签数据架构使用 hbase solr 存储索引放在不一样机器栏目系统架构繁體版

原文原文链接

摘要： HBase能够经过协处理器 Coprocessor 的方式向Solr发出请求，Solr对于接收到的数据能够作相关的同步：增、删、改索引的操做，这样就能够同时使用HBase存储量大和Solr检索性能高的优势了，更况且HBase和Solr均可以集群。这对海量数据存储、检索提供了一种方式，将存储与索引放在不一样的机器上，是大数据架构的必须品。 html

关键词： HBase, Solr, Coprocessor , 大数据 , 架构 java

正如个人以前的博客“ Solr与HBase架构设计 ”中所述，HBase和Solr能够经过协处理器 Coprocessor 的方式向Solr发出请求，Solr对于接收到的数据能够作相关的同步：增、删、改索引的操做。将存储与索引放在不一样的机器上，这是大数据架构的必须品，但目前还有不少不懂得此道的同窗，他们对于这种思想感到很新奇，不过，这绝对是好的方向，因此不懂得抓紧学习吧。 apache

有个朋友给个人那篇博客留言，说CDH也能够作这样的事情，我尚未试过，他还问我要与此相关的代码，因而我就稍微整理了一下，做为本篇文章的主要内容。关于CDH的事，我会尽快尝试，有知道的同窗能够给我留言。架构

下面我主要讲述一下，我测试对HBase和Solr的性能时，使用HBase 协处理器向HBase添加数据所编写的相关代码，及解释说明。 ide

1、编写HBase协处理器Coprocessor oop

一旦有数据postPut，就当即对Solr里相应的Core更新。这里使用了 ConcurrentUpdateSolrServer，它是Solr速率性能的保证，使用它不要忘记在Solr里面配置autoCommit哟。 post

/* 性能

*版权：王安琪学习

*描述：监视HBase，一有数据postPut就向Solr发送，本类要做为触发器添加到HBase 测试

*修改时间：2014-05-27

*修改内容：新增

package solrHbase.test;

import java.io.UnsupportedEncodingException;

import ***;

public class SorlIndexCoprocessorObserver extends BaseRegionObserver {

private static final Logger LOG = LoggerFactory

.getLogger(SorlIndexCoprocessorObserver.class);

private static final String solrUrl = "http://192.1.11.108:80/solr/core1";

private static final SolrServer solrServer = new ConcurrentUpdateSolrServer(

solrUrl, 10000, 20);

/**

* 创建solr索引

* @throws UnsupportedEncodingException

@Override

public void postPut(final ObserverContext e,

final Put put, final WALEdit edit, final boolean writeToWAL)

throws UnsupportedEncodingException {

inputSolr(put);

}

public void inputSolr(Put put) {

try {

solrServer.add(TestSolrMain.getInputDoc(put));

} catch (Exception ex) {

LOG.error(ex.getMessage());

}

注意：getInputDoc是这个HBase协处理器Coprocessor的精髓所在，它能够把HBase内的Put里的内容转化成Solr须要的值。其中 String fieldName = key.substring(key.indexOf( columnFamily ) + 3, key.indexOf( " 我在这" )).trim(); 这里有一个乱码字符，在这里看不到，请你们注意一下。

public static SolrInputDocument getInputDoc(Put put) {

SolrInputDocument doc = new SolrInputDocument();

doc.addField("test_ID", Bytes.toString(put.getRow()));

for (KeyValue c : put.getFamilyMap().get(Bytes.toBytes(columnFamily))) {

String key = Bytes.toString(c.getKey());

String value = Bytes.toString(c.getValue());

if (value.isEmpty()) {

continue;

}

String fieldName = key.substring(key.indexOf(columnFamily) + 3,

key.indexOf(" ")).trim();

doc.addField(fieldName, value);

}

return doc;

}

2、编写测试程序入口代码main

这段代码向HBase请求建了一张表，并将模拟的数据，向HBase连续地提交数据内容，在HBase中不断地插入数据，同时记录时间，测试插入性能。

*版权：王安琪

*描述：测试HBaseInsert，HBase插入性能

*修改时间：2014-05-27

*修改内容：新增

package solrHbase.test;

import hbaseInput.HbaseInsert;

import ***;

public class TestHBaseMain {

private static Configuration config;

private static String tableName = "angelHbase";

private static HTable table = null;

private static final String columnFamily = "wanganqi";

/**

* @param args

public static void main(String[] args) {

config = HBaseConfiguration.create();

config.set("hbase.zookeeper.quorum", "192.103.101.104");

HbaseInsert.createTable(config, tableName, columnFamily);

try {

table = new HTable(config, Bytes.toBytes(tableName));

for (int k = 0; k < 1; k++) {

Thread t = new Thread() {

public void run() {

for (int i = 0; i < 100000; i++) {

HbaseInsert.inputData(table,

PutCreater.createPuts(1000, columnFamily));

Calendar c = Calendar.getInstance();

String dateTime = c.get(Calendar.YEAR) + "-"

+ c.get(Calendar.MONTH) + "-"

+ c.get(Calendar.DATE) + "T"

+ c.get(Calendar.HOUR) + ":"

+ c.get(Calendar.MINUTE) + ":"

+ c.get(Calendar.SECOND) + ":"

+ c.get(Calendar.MILLISECOND) + "Z 写入: "

+ i * 1000;

System.out.println(dateTime);

}

};

t.start();

}

} catch (IOException e1) {

e1.printStackTrace();

}

下面的是与HBase相关的操做，把它封装到一个类中，这里就只有建表与插入数据的相关代码。

*版权：王安琪

*描述：与HBase相关操做，建表与插入数据

*修改时间：2014-05-27

*修改内容：新增

package hbaseInput;

import ***;

import org.apache.hadoop.hbase.client.Put;

public class HbaseInsert {

public static void createTable(Configuration config, String tableName,

String columnFamily) {

HBaseAdmin hBaseAdmin;

try {

hBaseAdmin = new HBaseAdmin(config);

if (hBaseAdmin.tableExists(tableName)) {

return;

}

HTableDescriptor tableDescriptor = new HTableDescriptor(tableName);

tableDescriptor.addFamily(new HColumnDescriptor(columnFamily));

hBaseAdmin.createTable(tableDescriptor);

hBaseAdmin.close();

} catch (MasterNotRunningException e) {

e.printStackTrace();

} catch (ZooKeeperConnectionException e) {

e.printStackTrace();

} catch (IOException e) {

e.printStackTrace();

}

public static void inputData(HTable table, ArrayList puts) {

try {

table.put(puts);

table.flushCommits();

puts.clear();

} catch (IOException e) {

e.printStackTrace();

}

3、编写模拟数据Put

向HBase中写入数据须要构造Put，下面是我构造模拟数据Put的方式，有字符串的生成，我是由mmseg提供的词典 words.dic 中随机读取一些词语链接起来，生成一句字符串的，下面的代码没有体现，不过很easy，你本身造你本身想要的数据就OK了。

public static Put createPut(String columnFamily) {

String ss = getSentence();

byte [] family = Bytes. toBytes (columnFamily);

byte[] rowKey = Bytes.toBytes("" + Math.abs(r.nextLong()));

Put put = new Put(rowKey);

put.add(family, Bytes.toBytes("DeviceID"),

Bytes.toBytes("" + Math.abs(r.nextInt())));

******

put.add(family, Bytes.toBytes(" Company_mmsegsm "), Bytes.toBytes("ss"));

return put;

}

固然在运行上面这个程序以前，须要先在Solr里面配置好你须要的列信息，HBase、Solr安装与配置，它们的基础使用方法将会在以后的文章中介绍。在这里，Solr的列配置就跟你使用createPut生成的Put搞成同样的列名就好了，固然也可使用动态列的形式。

4、直接对Solr性能测试

若是你不想对HBase与Solr的相结合进行测试，只想单独对Solr的性能进行测试，这就更简单了，彻底能够利用上面的代码段来测试，稍微组装一下就能够了。

private static void sendConcurrentUpdateSolrServer(final String url,

final int count) throws SolrServerException, IOException {

SolrServer solrServer = new ConcurrentUpdateSolrServer(url, 10000, 20);

for (int i = 0; i < count; i++) {

solrServer.add(getInputDoc(PutCreater.createPut(columnFamily)));

}