[TOC]
As middleware between the application layer and HBase, Phoenix has features that give it a distinct advantage in simple-query scenarios over large volumes of data.
Phoenix can generally be accessed in the following three ways.
Ambari actually ships with Phoenix, so installation is simple: just check the box and click install.
However, in Ambari 2.4.2 (Phoenix 4.7.0 with HBase 1.1.2), connecting with sqlline.py raises an error:
Class org.apache.phoenix.coprocessor.MetaDataEndpointImpl cannot be loaded Set hbase.table.sanity.checks to false at conf or table descriptor if you want to bypass sanity checks
Add the configuration as the message suggests and restart HBase.
The newer Ambari 2.7.3 (Phoenix 5.0.0 with HBase 2.0.0) does not have this problem.
This article uses version 4.7.0 for its examples, demonstrated through `phoenix-sqlline ${zookeeper}`.
Only a subset of features is demonstrated below.
CREATE TABLE IF NOT EXISTS ljktest (ID VARCHAR PRIMARY KEY,NAME VARCHAR,AGE TINYINT)
CREATE TABLE IF NOT EXISTS "ljktest" (ID VARCHAR PRIMARY KEY,NAME VARCHAR,AGE TINYINT)
CREATE TABLE LJKTEST2 (ID INTEGER NOT NULL,AGE TINYINT NOT NULL,NAME VARCHAR,CONSTRAINT PK PRIMARY KEY(ID, AGE)) TTL = 86400;
Check in hbase-shell whether the table was created. By default, Phoenix upper-cases table names on the HBase side; to preserve case, wrap the name in double quotes. Phoenix also attaches a number of coprocessors to the table. If no column family is specified, the default column family is named 0.
hbase(main):008:0> desc 'LJKTEST'
Table LJKTEST is ENABLED
LJKTEST, {TABLE_ATTRIBUTES => {coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|', coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', coprocessor$5 => '|org.apache.phoenix.hbase.index.Indexer|805306366|index.builder=org.apache.phoenix.index.PhoenixIndexBuilder,org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec'}
COLUMN FAMILIES DESCRIPTION
{NAME => '0', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
1 row(s)
Took 0.1066 seconds
CREATE VIEW LJK_TEST(ROWKEY VARCHAR PRIMARY KEY,"mycf"."name" VARCHAR);
`CREATE TABLE LJK_TEST (ROWKEY VARCHAR PRIMARY KEY, "mycf"."name" VARCHAR) COLUMN_ENCODED_BYTES=0;`
Simple insert
UPSERT INTO LJKTEST VALUES('0001','LINJIKAI',18);
The corresponding result seen in hbase-shell is:
hbase(main):010:0> scan 'LJKTEST'
ROW    COLUMN+CELL
 0001  column=0:\x00\x00\x00\x00, timestamp=1557719617275, value=x
 0001  column=0:\x80\x0B, timestamp=1557719617275, value=LINJIKAI
 0001  column=0:\x80\x0C, timestamp=1557719617275, value=\x92
1 row(s)
Took 0.0079 seconds
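The `value=\x92` stored for AGE 18 can be reproduced with a small sketch of Phoenix's sort-order-preserving number encoding. This is a hedged illustration, not Phoenix source code: the assumption is that fixed-width signed integers are serialized as big-endian two's complement with the sign bit flipped, so that byte-wise comparison matches numeric order.

```python
def phoenix_encode_tinyint(value: int) -> bytes:
    """Sketch of Phoenix-style TINYINT serialization: one byte of
    two's complement with the sign bit flipped, so raw byte order
    matches numeric order."""
    raw = value.to_bytes(1, "big", signed=True)
    return bytes([raw[0] ^ 0x80])

print(phoenix_encode_tinyint(18))  # b'\x92' -- matches the scan output above

# Flipping the sign bit keeps negatives sorting before positives:
assert phoenix_encode_tinyint(-1) < phoenix_encode_tinyint(0) < phoenix_encode_tinyint(18)
```

The other cells in the scan appear to be Phoenix internals as well: `\x80\x0B` and `\x80\x0C` look like encoded column qualifiers, and the `\x00\x00\x00\x00` cell with `value=x` is the empty key-value marker Phoenix writes so every row has at least one cell in the default column family `0`.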
UPSERT INTO LJKTEST(ID,NAME) VALUES('0002','张三');
Duplicate-key insert strategies
Updating a field
UPSERT INTO LJKTEST VALUES('0003','李四',19) ON DUPLICATE KEY UPDATE AGE = AGE + 1;
The effect: if a row with the same key already exists, the age is incremented by one. As shown below, 李四's final age becomes 20.
0: jdbc:phoenix:> UPSERT INTO LJKTEST VALUES('0003','李四',19) ON DUPLICATE KEY UPDATE AGE = AGE + 1;
1 row affected (0.011 seconds)
0: jdbc:phoenix:> SELECT * FROM LJKTEST;
+-------+-----------+-------+
|  ID   |   NAME    |  AGE  |
+-------+-----------+-------+
| 0001  | LINJIKAI  | 18    |
| 0002  | 张三      | null  |
| 0003  | 李四      | 20    |
+-------+-----------+-------+
3 rows selected (0.026 seconds)
No update
UPSERT INTO LJKTEST VALUES('0003','李四',19) ON DUPLICATE KEY IGNORE;
This form does not change 李四's age, because a row with that key already exists.
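The two duplicate-key behaviors can be sketched with a toy model over a dict keyed by the primary key (purely illustrative, not Phoenix code):

```python
# Toy model of UPSERT ... ON DUPLICATE KEY semantics.
table = {}

def upsert_update(key, name, age):
    """UPSERT ... ON DUPLICATE KEY UPDATE AGE = AGE + 1"""
    if key in table:
        table[key]["AGE"] += 1          # key exists: apply the update expression
    else:
        table[key] = {"NAME": name, "AGE": age}

def upsert_ignore(key, name, age):
    """UPSERT ... ON DUPLICATE KEY IGNORE"""
    if key not in table:                # key exists: leave the row untouched
        table[key] = {"NAME": name, "AGE": age}

upsert_update("0003", "李四", 19)   # first write: row created with AGE 19
upsert_update("0003", "李四", 19)   # duplicate key: AGE incremented to 20
upsert_ignore("0003", "李四", 19)   # duplicate key: ignored, AGE stays 20
print(table["0003"]["AGE"])  # 20
```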
Phoenix is particularly powerful because it provides covered indexes: once the index entry is found, there is no need to go back to the primary table. Instead, the data we care about is bundled into the index row, saving read-time overhead.
Create a table to test the secondary-index mechanism; the other index features below also build their secondary indexes on this table.
CREATE TABLE LJKTEST (ID CHAR(4) NOT NULL PRIMARY KEY,AGE UNSIGNED_TINYINT,NAME VARCHAR,COMPANY VARCHAR,SCHOOL VARCHAR)
and insert some data:
UPSERT INTO LJKTEST VALUES('0001',18,'张三','张三公司','张三学校');
UPSERT INTO LJKTEST VALUES('0002',19,'李四','李四公司','李四学校');
UPSERT INTO LJKTEST VALUES('0003',20,'王五','王五公司','王五学校');
UPSERT INTO LJKTEST VALUES('0004',21,'赵六','赵六公司','赵六学校');
Create a multi-column covered index:
CREATE INDEX COVER_LJK_INDEX ON LJKTEST(COMPANY,SCHOOL) INCLUDE (NAME);
Inspecting the query plans
You can see that querying by the SCHOOL column alone cannot use the index key prefix, so the plan degenerates into a full scan (over the covered index).
0: jdbc:phoenix:> EXPLAIN SELECT NAME FROM LJKTEST WHERE SCHOOL ='张三学校';
+---------------------------------------------------------------------------+------------------+
|                                   PLAN                                    |  EST_BYTES_READ  |
+---------------------------------------------------------------------------+------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER COVER_LJK_INDEX  | null             |
|     SERVER FILTER BY "SCHOOL" = '张三学校'                                 | null             |
+---------------------------------------------------------------------------+------------------+
But querying by the COMPANY column uses the index key prefix, giving a range scan:
0: jdbc:phoenix:> EXPLAIN SELECT NAME FROM LJKTEST WHERE COMPANY ='张三公司';
+-------------------------------------------------------------------------------------+--------+
|                                        PLAN                                         | EST_BY |
+-------------------------------------------------------------------------------------+--------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER COVER_LJK_INDEX ['张三公司'] | null   |
+-------------------------------------------------------------------------------------+--------+
Next, create a single-column covered index and take a look:
CREATE INDEX COVER_LJK_INDEX_COMPANY ON LJKTEST(COMPANY) INCLUDE (NAME);
0: jdbc:phoenix:> EXPLAIN SELECT NAME FROM LJKTEST WHERE COMPANY ='张三公司';
+----------------------------------------------------------------------------------------------+
|                                              PLAN                                             |
+----------------------------------------------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER COVER_LJK_INDEX_COMPANY ['张三公司'] |
+----------------------------------------------------------------------------------------------+
1 row selected (0.028 seconds)
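Why a covered index avoids the trip back to the data table can be sketched with a toy model, where the index is an ordered list keyed by (indexed value, rowkey) and each entry carries the INCLUDEd NAME column (illustrative only, not how Phoenix stores bytes):

```python
from bisect import bisect_left

# Index rows sorted by (COMPANY, rowkey); the INCLUDEd NAME column
# travels with each index entry, so the data table is never read.
index_rows = sorted([
    (("张三公司", "0001"), {"NAME": "张三"}),
    (("李四公司", "0002"), {"NAME": "李四"}),
    (("王五公司", "0003"), {"NAME": "王五"}),
    (("赵六公司", "0004"), {"NAME": "赵六"}),
])

def select_name_by_company(company):
    """Range scan over the index key prefix; NAME comes straight
    from the index entries."""
    keys = [k for k, _ in index_rows]
    i = bisect_left(keys, (company, ""))
    out = []
    while i < len(index_rows) and index_rows[i][0][0] == company:
        out.append(index_rows[i][1]["NAME"])
        i += 1
    return out

print(select_name_by_company("张三公司"))  # ['张三']
```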
When * is used as the select list, the index no longer covers all requested columns, so the plan is a FULL SCAN.
0: jdbc:phoenix:> EXPLAIN SELECT /*+ INDEX(LJKTEST COVER_LJK_INDEX_COMPANY)*/ * FROM LJKTEST WHERE COMPANY ='张三公司';
+----------------------------------------------------------------------------------------------+
|                                              PLAN                                             |
+----------------------------------------------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER LJKTEST                              |
|     SKIP-SCAN-JOIN TABLE 0                                                                    |
|         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER COVER_LJK_INDEX_COMPANY_ON  |
|             SERVER FILTER BY FIRST KEY ONLY                                                   |
|     DYNAMIC SERVER FILTER BY "LJKTEST.ID" IN ($34.$36)                                        |
+----------------------------------------------------------------------------------------------+
At this point you need a hint to specify the index:
0: jdbc:phoenix:> EXPLAIN SELECT /*+INDEX(LJKTEST COVER_LJK_INDEX_COMPANY)*/* FROM LJKTEST WHERE COMPANY='张三公司';
+-----------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                                 PLAN                                                 | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-----------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER LJKTEST                                     | null            | null           | null         |
|     SKIP-SCAN-JOIN TABLE 0                                                                           | null            | null           | null         |
|         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER COVER_LJK_INDEX_COMPANY ['张三公司'] | null            | null           | null         |
|             SERVER FILTER BY FIRST KEY ONLY                                                          | null            | null           | null         |
|     DYNAMIC SERVER FILTER BY "LJKTEST.ID" IN ($10.$12)                                               | null            | null           | null         |
+-----------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
5 rows selected (0.046 seconds)
CREATE INDEX SCHOOL_WITH_COMPANY ON LJKTEST(COMPANY||' '||SCHOOL)
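This is a functional index: the index key is the evaluated expression `COMPANY||' '||SCHOOL`, so a WHERE clause on the same expression can hit the index directly. A toy sketch of the idea (illustrative only):

```python
# Toy model of a functional index on COMPANY || ' ' || SCHOOL:
# the index stores the *computed expression* as its key.
rows = {
    "0001": {"COMPANY": "张三公司", "SCHOOL": "张三学校"},
    "0002": {"COMPANY": "李四公司", "SCHOOL": "李四学校"},
}

expr = lambda r: r["COMPANY"] + " " + r["SCHOOL"]   # COMPANY||' '||SCHOOL
functional_index = {expr(r): rowkey for rowkey, r in rows.items()}

# WHERE COMPANY||' '||SCHOOL = '张三公司 张三学校' becomes a point lookup:
print(functional_index["张三公司 张三学校"])  # 0001
```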
Local indexes require the LOCAL keyword; all the indexes above are global indexes.
CREATE LOCAL INDEX COVER_LJK_INDEX_COMPANY ON LJKTEST(COMPANY) INCLUDE (NAME);
Local indexes suit write-heavy, read-light workloads; the index data is stored in the original table, which is more intrusive.
Global indexes suit read-heavy, write-light workloads; the index data is stored in a separate table.
By default, the index is populated synchronously during the CREATE INDEX call. Depending on the current size of the data table, this may not be feasible. Since 4.5, index population can be done asynchronously by including the ASYNC keyword in the index-creation DDL statement:
CREATE INDEX INDEX_LJKTEST_AGE_ASYNC ON LJKTEST(AGE) INCLUDE(SCHOOL) ASYNC;
This is only the first step; you must also separately launch the MapReduce job that populates the index table, from the HBase command line:
hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table LJKTEST --index-table INDEX_LJKTEST_AGE_ASYNC --output-path ASYNC_IDX_HFILES
Only when the MapReduce job completes is the index activated and used in queries.
The --output-path option specifies the HDFS directory where the HFiles are written.
For tables whose data is written once and never updated in place, certain optimizations can reduce the write-time overhead of incremental index maintenance. This is common for time-series data such as logs or events, where a row, once written, is never updated. To take advantage of these optimizations, declare the table immutable by adding the IMMUTABLE_ROWS=true property to the DDL statement:
CREATE TABLE LJKTEST_IMMU (ROWKEY VARCHAR PRIMARY KEY, NAME VARCHAR,AGE VARCHAR) IMMUTABLE_ROWS=true;
CREATE INDEX INDEX_LJKTEST_IMMU ON LJKTEST_IMMU(NAME) INCLUDE(AGE);
Testing the behavior of immutable-table indexes:
Even for rows with the same rowkey, an update in Phoenix behaves like an append:
UPSERT INTO LJKTEST_IMMU VALUES('1','LILEI','18');
UPSERT INTO LJKTEST_IMMU VALUES('1','HANGMEIMEI','18');
0: jdbc:phoenix:dn1> select * from LJKTEST_IMMU;
+---------+-------------+------+
| ROWKEY  |    NAME     | AGE  |
+---------+-------------+------+
| 1       | HANGMEIMEI  | 18   |
| 1       | LILEI       | 18   |
+---------+-------------+------+
2 rows selected (0.078 seconds)
0: jdbc:phoenix:dn1> SELECT * FROM INDEX_LJKTEST_IMMU;
+-------------+----------+--------+
|   0:NAME    | :ROWKEY  | 0:AGE  |
+-------------+----------+--------+
| HANGMEIMEI  | 1        | 18     |
| LILEI       | 1        | 18     |
+-------------+----------+--------+
2 rows selected (0.021 seconds)
0: jdbc:phoenix:dn1> SELECT * FROM LJKTEST_IMMU WHERE ROWKEY='1';
+---------+-------------+------+
| ROWKEY  |    NAME     | AGE  |
+---------+-------------+------+
| 1       | HANGMEIMEI  | 18   |
+---------+-------------+------+
1 row selected (0.017 seconds)
0: jdbc:phoenix:dn1> SELECT * FROM LJKTEST_IMMU WHERE NAME='LILEI';
+---------+--------+------+
| ROWKEY  |  NAME  | AGE  |
+---------+--------+------+
| 1       | LILEI  | 18   |
+---------+--------+------+
1 row selected (0.024 seconds)
0: jdbc:phoenix:dn1> SELECT * FROM LJKTEST_IMMU WHERE AGE='18';
+---------+-------------+------+
| ROWKEY  |    NAME     | AGE  |
+---------+-------------+------+
| 1       | HANGMEIMEI  | 18   |
| 1       | LILEI       | 18   |
+---------+-------------+------+
2 rows selected (0.027 seconds)
In the HBase table itself, the usual rule applies: cells with the same rowkey replace the corresponding fields.
hbase(main):002:0> scan 'LJKTEST_IMMU'
ROW  COLUMN+CELL
 1   column=0:AGE, timestamp=1563241706845, value=18
 1   column=0:NAME, timestamp=1563241706845, value=HANGMEIMEI
 1   column=0:_0, timestamp=1563241706845, value=x
1 row(s) in 0.0590 seconds
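The duplicated index rows can be explained by how index maintenance differs for immutable tables: for a mutable table, Phoenix must delete the stale index row when an indexed value changes, while with IMMUTABLE_ROWS=true it skips that delete because rows are assumed never to change. A toy model (illustrative, not Phoenix code):

```python
# Toy model of index maintenance. Index entries are keyed by
# (indexed value, rowkey). Mutable tables clean up the stale index
# entry on update; immutable tables skip that step.
def upsert(data, index, rowkey, name, immutable):
    if rowkey in data and not immutable:
        old = data[rowkey]
        index.discard((old, rowkey))   # mutable path: delete stale entry
    data[rowkey] = name                # HBase data row is overwritten either way
    index.add((name, rowkey))

data, idx = {}, set()
upsert(data, idx, "1", "LILEI", immutable=True)
upsert(data, idx, "1", "HANGMEIMEI", immutable=True)
print(sorted(idx))  # both index rows survive, matching the transcripts above
# [('HANGMEIMEI', '1'), ('LILEI', '1')]
print(data)         # the data table keeps only the latest value
# {'1': 'HANGMEIMEI'}
```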
<font color="red">As of Phoenix 4.12</font>, there is now a tool that runs a MapReduce job to verify that an index table is valid against its data table. The only way to find orphan rows in either table is to scan all rows in one table and look up the corresponding row in the other; therefore the tool can be run with either the data table or the index table as the "source" table and the other as the "target". The tool writes all invalid rows it finds to a file or to the output table PHOENIX_INDEX_SCRUTINY. An invalid row is a source row that either has no corresponding row in the target table, or has an incorrect value there (i.e. in a covered column).
hbase org.apache.phoenix.mapreduce.index.IndexScrutinyTool -dt my_table -it my_index -o
or
HADOOP_CLASSPATH=$(hbase mapredcp) hadoop jar phoenix-<version>-server.jar org.apache.phoenix.mapreduce.index.IndexScrutinyTool -dt my_table -it my_index -o
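The core check the scrutiny tool performs can be sketched as a source-to-target comparison (a simplified model, not the actual MapReduce job):

```python
# Simplified model of index scrutiny: every source row must have a
# target row with matching covered values; anything else is invalid.
def scrutinize(source, target):
    """source/target: dicts mapping a join key to covered values."""
    invalid = []
    for key, covered in source.items():
        if key not in target:
            invalid.append((key, "orphan"))             # no corresponding row
        elif target[key] != covered:
            invalid.append((key, "bad covered value"))  # values disagree
    return invalid

data_table = {"0001": "张三", "0002": "李四"}
index_table = {"0001": "张三", "0003": "王五"}   # 0002 missing, 0003 orphaned

print(scrutinize(data_table, index_table))  # [('0002', 'orphan')]
print(scrutinize(index_table, data_table))  # [('0003', 'orphan')]
```

Running the comparison in both directions, as the tool supports, is what catches orphans on either side.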
Out of the box, indexing is very fast. However, several properties can be tuned to optimize it for a specific environment and workload.
Property | Description | Default |
---|---|---|
index.builder.threads.max | Number of threads used to build index updates from primary-table updates | 10 |
index.builder.threads.keepalivetime | Time in seconds after which idle threads in the builder thread pool expire | 60 |
index.writer.threads.max | Number of threads used when writing to the target index tables | 10 |
index.writer.threads.keepalivetime | Time in seconds after which idle threads in the writer thread pool expire | 60 |
hbase.htable.threads.max | Number of threads each index HTable may use for writes | 2,147,483,647 |
hbase.htable.threads.keepalivetime | Time in seconds after which idle threads in an HTable's thread pool expire | 60 |
index.tablefactory.cache.size | Number of index HTables to keep in the cache | 10 |
org.apache.phoenix.regionserver.index.priority.min | Value specifying the bottom (inclusive) of the range in which index priorities lie | 1000 |
org.apache.phoenix.regionserver.index.priority.max | Value specifying the top (exclusive) of the range in which index priorities lie | 1050 |
org.apache.phoenix.regionserver.index.handler.count | Number of threads used to serve index write requests for global index maintenance | 30 |
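These are server-side settings; as a hedged example, a couple of them would be set in hbase-site.xml roughly like this (the values here are arbitrary illustrations, not recommendations):

```xml
<!-- hbase-site.xml: illustrative values only -->
<property>
  <name>index.builder.threads.max</name>
  <value>16</value>
</property>
<property>
  <name>index.writer.threads.max</name>
  <value>16</value>
</property>
```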
19/05/14 11:23:40 WARN iterate.BaseResultIterators: Unable to find parent table "LJKTEST" of table "COVER_LJK_INDEX_COMPANY_ONLY" to determine USE_STATS_FOR_PARALLELIZATION org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table undefined. tableName=LJKTEST
If creating the index table throws an error, modify hbase-site.xml according to the error message and restart HBase.
Error: ERROR 1029 (42Y88): Mutable secondary indexes must have the hbase.regionserver.wal.codec property set to org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the hbase-sites.xml of every region server. tableName=COVER_LJK_INDEX (state=42Y88,code=1029)
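The fix this error asks for is, per its own text, an hbase-site.xml property on every region server (restart HBase afterwards):

```xml
<!-- hbase-site.xml on every region server -->
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
```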