HBase Study Notes (3): Coprocessors

Date: 2019-11-07


Coprocessors

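Our filters so far have been defined on the client and then shipped to the server for execution. A coprocessor, by contrast, is defined and deployed on the server, invoked from the client, and executed on the server. It is somewhat like the stored procedures we already know: you pass in some parameters and it carries out operations defined in advance. We often use coprocessors for things like secondary indexes and aggregate functions. Like custom filters, they must be prepared ahead of time and their jar listed in HBASE_CLASSPATH in hbase-env.sh, as described in my previous post.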

There are two kinds of coprocessors: observers and endpoints.

(1) An observer works like a trigger: it fires when a particular event occurs.

HBase already provides built-in interfaces for us to implement: RegionObserver, MasterObserver, and WALObserver. The names roughly tell you what each one is for.
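To make the trigger idea concrete without a running cluster, here is a plain-Java simulation that uses no HBase classes. The `PutObserver` interface with its `prePut`/`postPut` hooks and the `ObservedStore` class are hypothetical names that only mimic how RegionObserver hooks fire before and after an operation; they are not the real HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the observer ("trigger") idea.
// PutObserver and ObservedStore are hypothetical; they are NOT HBase classes.
public class ObserverSketch {

    // Hooks that fire around a write, in the spirit of RegionObserver's pre/post hooks.
    interface PutObserver {
        void prePut(String row, String value);
        void postPut(String row, String value);
    }

    // A toy "region" that notifies every registered observer on each put.
    static class ObservedStore {
        private final Map<String, String> rows = new TreeMap<>();
        private final List<PutObserver> observers = new ArrayList<>();

        void register(PutObserver observer) {
            observers.add(observer);
        }

        void put(String row, String value) {
            for (PutObserver o : observers) {
                o.prePut(row, value);   // fires before the write
            }
            rows.put(row, value);
            for (PutObserver o : observers) {
                o.postPut(row, value);  // fires after the write
            }
        }
    }

    // Records every hook invocation so the firing order can be inspected.
    public static List<String> runDemo() {
        List<String> events = new ArrayList<>();
        ObservedStore store = new ObservedStore();
        store.register(new PutObserver() {
            @Override
            public void prePut(String row, String value) {
                events.add("prePut " + row);
            }

            @Override
            public void postPut(String row, String value) {
                events.add("postPut " + row);
            }
        });
        store.put("row1", "a");
        store.put("row2", "b");
        return events;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Every registered observer sees each write twice, once before and once after it is applied, which is the same shape as a database trigger.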

(2) An endpoint can be thought of as a user-defined function, much like a stored procedure in a relational database.

Every coprocessor implements the Coprocessor interface. A coprocessor has either SYSTEM or USER priority; SYSTEM is higher than USER, so SYSTEM coprocessors run first.

The interface has two methods, start and stop, and both receive the same context object, a CoprocessorEnvironment.

void start(CoprocessorEnvironment env) throws IOException;
void stop(CoprocessorEnvironment env) throws IOException;
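The priority rule and the start/stop lifecycle can be sketched together in plain Java. This simulation is not HBase code: `Env`, `ToyCoprocessor`, `PRIORITY_SYSTEM`, and `PRIORITY_USER` are hypothetical stand-ins, and I assume, as a modeling choice, that a smaller number means a higher priority. The host starts coprocessors in priority order and, purely as a design choice of this sketch, stops them in reverse order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of coprocessor priority and the start/stop lifecycle.
// Env, ToyCoprocessor and the priority numbers are hypothetical, not HBase's.
public class LifecycleSketch {

    // Assumption for this sketch: a smaller number means a higher priority.
    public static final int PRIORITY_SYSTEM = 100;
    public static final int PRIORITY_USER = 200;

    // Shared context passed to start() and stop(), standing in for CoprocessorEnvironment.
    public static class Env {
        final List<String> log = new ArrayList<>();
    }

    public static class ToyCoprocessor {
        final String name;
        final int priority;

        public ToyCoprocessor(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }

        void start(Env env) { env.log.add("start " + name); }
        void stop(Env env) { env.log.add("stop " + name); }
    }

    // Starts coprocessors in priority order, then stops them in reverse order.
    public static List<String> runLifecycle(List<ToyCoprocessor> loaded) {
        Env env = new Env();
        List<ToyCoprocessor> ordered = new ArrayList<>(loaded);
        ordered.sort(Comparator.comparingInt(c -> c.priority)); // stable: ties keep load order
        for (ToyCoprocessor c : ordered) {
            c.start(env);
        }
        for (int i = ordered.size() - 1; i >= 0; i--) {
            ordered.get(i).stop(env);
        }
        return env.log;
    }

    public static void main(String[] args) {
        List<ToyCoprocessor> loaded = new ArrayList<>();
        loaded.add(new ToyCoprocessor("userCp", PRIORITY_USER));
        loaded.add(new ToyCoprocessor("systemCp", PRIORITY_SYSTEM));
        System.out.println(runLifecycle(loaded));
    }
}
```

Even though the USER coprocessor is loaded first, the SYSTEM one is started first because of its higher priority.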

CoprocessorEnvironment also has methods of its own, among them getTable(), which is used below.

Working with Tables

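When working with a table, you must obtain the HTable by calling getTable() rather than creating an HTable yourself. This does not appear to be enforced at the moment, but it will be in the future.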

Also, you must not take row locks when operating on a table.

Coprocessor Loading

Coprocessors can be loaded globally through the configuration file, for example by adding the following to hbase-site.xml:
<property>
    <name>hbase.coprocessor.region.classes</name>
    <value>coprocessor.RegionObserverExample,coprocessor.AnotherCoprocessor</value>
</property>
<property>
    <name>hbase.coprocessor.master.classes</name>
    <value>coprocessor.MasterObserverExample</value>
</property>
<property>
    <name>hbase.coprocessor.wal.classes</name>
    <value>coprocessor.WALObserverExample,bar.foo.MyWALObserver</value>
</property>
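Each of these properties holds a comma-separated list of fully qualified class names. The plain-Java sketch below shows one way such a value could be split into an ordered list, giving each class an increasing priority number in the order it is listed. The `parse` method and its `basePriority` parameter are hypothetical, and the ordering rule is an assumption for illustration rather than a description of HBase's own loader.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: turn an hbase.coprocessor.*.classes value into an ordered
// class-name -> priority map. This is NOT HBase's actual loading code.
public class CoprocessorConfigSketch {

    public static Map<String, Integer> parse(String value, int basePriority) {
        Map<String, Integer> result = new LinkedHashMap<>(); // keeps the listed order
        if (value == null) {
            return result;
        }
        int priority = basePriority;
        for (String raw : value.split(",")) {
            String className = raw.trim();
            if (className.isEmpty()) {
                continue; // tolerate stray commas and whitespace
            }
            result.put(className, priority++); // earlier entries get smaller numbers
        }
        return result;
    }

    public static void main(String[] args) {
        String regionClasses = "coprocessor.RegionObserverExample,coprocessor.AnotherCoprocessor";
        System.out.println(parse(regionClasses, 100));
    }
}
```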
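Our custom classes can be registered under three properties, hbase.coprocessor.region.classes, hbase.coprocessor.master.classes, and hbase.coprocessor.wal.classes, which cover regions, the master, and the WAL respectively. Be especially careful with classes registered for regions, because they apply to every table.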
<property>
    <name>hbase.coprocessor.region.classes</name>
    <value>coprocessor.RegionObserverExample</value>
</property>
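By registering on these three hooks you can observe almost every operation we perform, which is almost frightening in its reach: pretty much anything you want is available. I'll leave the detailed code for you to explore on your own.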

Endpoints can be used to define aggregate functions; we implement what we need through methods declared in a CoprocessorProtocol.

Calling coprocessorProxy() with a single row key operates on just the one region that contains that row.

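To operate on many regions, call coprocessorExec() with a start row key and an end row key; passing null for both, as the client example below does, covers every region of the table.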
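The difference between the two calls comes down to which regions are selected. In this plain-Java simulation a table is modeled as a sorted set of region start keys, the first region starting at the empty key. `regionFor` mimics coprocessorProxy() by picking the single region that holds a row, and `regionsFor` mimics coprocessorExec() by picking every region that overlaps a key range, with null bounds meaning the whole table. The class and method names are hypothetical, String comparison stands in for HBase's byte-wise key order, and treating the end key as inclusive is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Plain-Java simulation of region selection; RegionLocatorSketch is hypothetical.
public class RegionLocatorSketch {

    // Sorted region start keys; the first region always starts at "".
    private final NavigableSet<String> startKeys = new TreeSet<>();

    public RegionLocatorSketch(String... splitKeys) {
        startKeys.add("");
        Collections.addAll(startKeys, splitKeys);
    }

    // Like coprocessorProxy(): the one region whose key range holds this row.
    public String regionFor(String row) {
        return startKeys.floor(row); // never null, because "" sorts before every row
    }

    // Like coprocessorExec(): every region overlapping [startRow, endRow].
    // A null bound means "from the first region" / "through the last region".
    public List<String> regionsFor(String startRow, String endRow) {
        if (startRow != null && endRow != null && startRow.compareTo(endRow) > 0) {
            return new ArrayList<>(); // empty range
        }
        String first = (startRow == null) ? "" : regionFor(startRow);
        NavigableSet<String> range = (endRow == null)
                ? startKeys.tailSet(first, true)
                : startKeys.subSet(first, true, endRow, true); // end treated as inclusive
        return new ArrayList<>(range);
    }

    public static void main(String[] args) {
        RegionLocatorSketch table = new RegionLocatorSketch("row3");
        System.out.println(table.regionFor("row4"));      // the region starting at row3
        System.out.println(table.regionsFor(null, null)); // both regions
    }
}
```

With a single split at row3, matching the two regions in the sample output later in this post, row4 maps to the second region and a null/null range covers both.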

Demo

Enough talk; I'm a little embarrassed to keep rambling. Let's look at an example that counts rows.

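// RowCountProtocol.java
import java.io.IOException;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
public interface RowCountProtocol extends CoprocessorProtocol {
    long getRowCount() throws IOException;
    long getRowCount(Filter filter) throws IOException;
    long getKeyValueCount() throws IOException;
}

// RowCountEndpoint.java (server side, HBase 0.92/0.94 endpoint API)
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
public class RowCountEndpoint extends BaseEndpointCoprocessor implements RowCountProtocol {
    // Scans this region once, counting rows, or every KeyValue if countKeyValues is true.
    private long getCount(Filter filter, boolean countKeyValues) throws IOException {
        Scan scan = new Scan();
        scan.setMaxVersions(1);
        if (filter != null) {
            scan.setFilter(filter);
        }
        RegionCoprocessorEnvironment environment = (RegionCoprocessorEnvironment) getEnvironment();
        // use an internal scanner to perform scanning.
        InternalScanner scanner = environment.getRegion().getScanner(scan);
        long result = 0;
        try {
            List<KeyValue> curVals = new ArrayList<KeyValue>();
            boolean hasMore;
            do {
                curVals.clear();
                hasMore = scanner.next(curVals); // fills curVals with one row; true if more rows follow
                if (!curVals.isEmpty()) {        // an empty region must not count as one row
                    result += countKeyValues ? curVals.size() : 1;
                }
            } while (hasMore);
        } finally {
            scanner.close();
        }
        return result;
    }

    @Override
    public long getRowCount() throws IOException {
        return getRowCount(new FirstKeyOnlyFilter()); // one KeyValue per row is enough to count rows
    }

    @Override
    public long getRowCount(Filter filter) throws IOException {
        return getCount(filter, false);
    }

    @Override
    public long getKeyValueCount() throws IOException {
        return getCount(null, true);
    }
}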
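The endpoint can only run inside a region server, so here is a plain-Java simulation of the same counting loop that runs anywhere. A region is modeled as a list of rows, each row a list of cell strings, and the hypothetical `ToyScanner.next` mimics InternalScanner.next(List): it appends one row's cells and returns whether more rows follow. The simulation also shows why the loop skips empty results: without that check an empty region would be reported as holding one row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java simulation of RowCountEndpoint.getCount(); no HBase classes involved.
// ToyScanner is hypothetical and only mimics InternalScanner.next(List) semantics.
public class RowCountSketch {

    static class ToyScanner {
        private final Iterator<List<String>> rows;

        ToyScanner(List<List<String>> region) {
            this.rows = region.iterator();
        }

        // Appends the next row's cells to out; returns true if more rows follow.
        boolean next(List<String> out) {
            if (rows.hasNext()) {
                out.addAll(rows.next());
            }
            return rows.hasNext();
        }
    }

    // Counts rows, or every cell if countCells is true, using the endpoint's loop.
    public static long count(List<List<String>> region, boolean countCells) {
        ToyScanner scanner = new ToyScanner(region);
        List<String> curVals = new ArrayList<>();
        long result = 0;
        boolean hasMore;
        do {
            curVals.clear();
            hasMore = scanner.next(curVals);
            if (!curVals.isEmpty()) {
                result += countCells ? curVals.size() : 1;
            }
        } while (hasMore);
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> region = Arrays.asList(
                Arrays.asList("row1/cf:a", "row1/cf:b"),
                Arrays.asList("row2/cf:a"));
        System.out.println(count(region, false));            // rows
        System.out.println(count(region, true));             // cells
        System.out.println(count(new ArrayList<>(), false)); // empty region
    }
}
```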

Once it is written, register it:

<property>
    <name>hbase.coprocessor.region.classes</name>
    <value>coprocessor.RowCountEndpoint</value>
</property>

Calling It from a Java Client

With the endpoint defined on the server, how do we call it from Java code on the client? The example below makes it clear.

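import java.io.IOException;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.util.Bytes;
public class EndPointExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "testtable");
        try {
            // null start and end keys: call every region of the table
            Map<byte[], Long> results = table.coprocessorExec(
                    RowCountProtocol.class, null, null,
                    new Batch.Call<RowCountProtocol, Long>() {
                        @Override
                        public Long call(RowCountProtocol counter) throws IOException {
                            return counter.getRowCount();
                        }
                    });
            long total = 0;
            // one entry per region: the key is the region name, the value its count
            for (Map.Entry<byte[], Long> entry : results.entrySet()) {
                total += entry.getValue().longValue();
                System.out.println("Region: " + Bytes.toString(entry.getKey())
                        + ", Count: " + entry.getValue());
            }
            System.out.println("Total Count: " + total);
        } catch (Throwable throwable) {
            throwable.printStackTrace();
        } finally {
            table.close(); // release the table's resources
        }
    }
}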

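The client goes through the table's coprocessorExec() method, and the Batch.Call implementation then calls getRowCount() on the RowCountProtocol interface for each region.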

We then iterate over the result returned by each region and add them up to get the final total. The output looks like this:

Region: testtable,,1303417572005.51f9e2251c29ccb2...cbcb0c66858f., Count: 2
Region: testtable,row3,1303417572005.7f3df4dcba3f...dbc99fce5d87., Count: 3
Total Count: 5
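What coprocessorExec() does on the client can be pictured as a fan-out followed by a merge. This plain-Java simulation submits one task per region to an ExecutorService, collects the answers in a map keyed by region, and then sums them. `execOnAll`, `total`, and the region names `region1`/`region2` are hypothetical, `java.util.function.Function` stands in for Batch.Call, the counts 2 and 3 are borrowed from the output above, and running the per-region calls in parallel is an assumption of this sketch rather than a documented HBase guarantee.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Plain-Java simulation of the coprocessorExec() fan-out and client-side merge.
// Regions are hypothetical name -> row-count entries; Function stands in for Batch.Call.
public class FanOutSketch {

    // Runs `call` once per region on a thread pool and collects results by region name.
    public static <R> Map<String, R> execOnAll(Map<String, Long> regions, Function<Long, R> call) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            Map<String, Future<R>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, Long> region : regions.entrySet()) {
                Long regionRowCount = region.getValue();
                futures.put(region.getKey(), pool.submit(() -> call.apply(regionRowCount)));
            }
            Map<String, R> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<R>> f : futures.entrySet()) {
                try {
                    results.put(f.getKey(), f.getValue().get()); // blocks until that region answers
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted waiting for " + f.getKey(), e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("call failed on " + f.getKey(), e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    // Client-side merge: add up the per-region counts.
    public static long total(Map<String, Long> perRegion) {
        long sum = 0;
        for (long count : perRegion.values()) {
            sum += count;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Long> regions = new LinkedHashMap<>();
        regions.put("region1", 2L);
        regions.put("region2", 3L);
        Map<String, Long> results = execOnAll(regions, rowCount -> rowCount);
        results.forEach((name, count) -> System.out.println("Region: " + name + ", Count: " + count));
        System.out.println("Total Count: " + total(results));
    }
}
```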

In the example above we implemented Batch.Call as an anonymous class to invoke the interface method. Batch.forMethod() lets us shorten that code:
Batch.Call<RowCountProtocol, Long> call =
        Batch.forMethod(RowCountProtocol.class, "getKeyValueCount");
Map<byte[], Long> results = table.coprocessorExec(RowCountProtocol.class, null, null, call);
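Conceptually, forMethod() produces a call that looks the named method up on the protocol interface and invokes it reflectively on each region's endpoint. Here is a plain-Java sketch of that idea using java.lang.reflect; `Counter`, `FixedCounter`, and this `forMethod` are hypothetical stand-ins, HBase's actual implementation may differ, and this sketch only handles methods without arguments.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.function.Function;

// Plain-Java sketch of "call a protocol method by name", the idea behind Batch.forMethod().
// Counter, FixedCounter and forMethod are hypothetical; this is not HBase's implementation.
public class ForMethodSketch {

    public interface Counter {
        long getRowCount();
        long getKeyValueCount();
    }

    public static class FixedCounter implements Counter {
        private final long rows;
        private final long keyValues;

        public FixedCounter(long rows, long keyValues) {
            this.rows = rows;
            this.keyValues = keyValues;
        }

        @Override public long getRowCount() { return rows; }
        @Override public long getKeyValueCount() { return keyValues; }
    }

    // Resolves a no-argument method by name once, then invokes it on each instance it is given.
    public static <T, R> Function<T, R> forMethod(Class<T> protocol, String methodName, Class<R> resultType) {
        final Method method;
        try {
            method = protocol.getMethod(methodName);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no method " + methodName + " on " + protocol.getName(), e);
        }
        return instance -> {
            try {
                return resultType.cast(method.invoke(instance)); // a primitive long comes back boxed
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException(e);
            }
        };
    }

    public static void main(String[] args) {
        Function<Counter, Long> call = forMethod(Counter.class, "getKeyValueCount", Long.class);
        System.out.println(call.apply(new FixedCounter(2, 7))); // prints 7
    }
}
```

Looking the method up once keeps the per-region work to a single invoke, and a misspelled method name fails immediately instead of on the first region.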

Calling several methods in one Batch.Call:

Map<byte[], Pair<Long, Long>> results = table.coprocessorExec(
        RowCountProtocol.class, null, null,
        new Batch.Call<RowCountProtocol, Pair<Long, Long>>() {
            @Override
            public Pair<Long, Long> call(RowCountProtocol counter) throws IOException {
                // both methods are called on the same region's endpoint
                return new Pair<Long, Long>(counter.getRowCount(), counter.getKeyValueCount());
            }
        });
long totalRows = 0;
long totalKeyValues = 0;
for (Map.Entry<byte[], Pair<Long, Long>> entry : results.entrySet()) {
    totalRows += entry.getValue().getFirst().longValue();
    totalKeyValues += entry.getValue().getSecond().longValue();
    System.out.println("Region: " + Bytes.toString(entry.getKey())
            + ", Count: " + entry.getValue());
}
System.out.println("Total Row Count: " + totalRows);
System.out.println("Total KeyValue Count: " + totalKeyValues);

Calling coprocessorProxy() to run on a single region:
RowCountProtocol protocol = table.coprocessorProxy(RowCountProtocol.class, Bytes.toBytes("row4"));
long rowsInRegion = protocol.getRowCount();
System.out.println("Region Row Count: " + rowsInRegion);
This example counts the rows in the region that contains row4, which helps us see how data is distributed across regions.