Phoenix (Part 1): Introduction and Common Commands

Date: 2019-12-08

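1. Introduction

Apache Phoenix is a high-performance relational database layer that runs on top of HBase. Through Phoenix, you can access HBase over JDBC in the same way you would access a relational database.

The tables and data that Phoenix operates on are stored in HBase; Phoenix only needs to be mapped to the HBase tables. After that, you can use tools to read and write the data.

Phoenix can be regarded simply as a tool that replaces HBase's native syntax. Although Java can connect to Phoenix through JDBC, in a production environment it should not be used for OLTP. Phoenix applies some optimizations when querying HBase, but latency is still considerable. It is therefore still used for OLAP, with the results written back to storage afterwards.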

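Note:

　　Data processing today falls broadly into two categories: online transaction processing (OLTP) and online analytical processing (OLAP). OLTP is the main application of traditional relational databases and covers basic, day-to-day transaction processing, such as bank transactions. OLAP is the main application of data warehouse systems; it supports complex analytical operations, focuses on decision support, and provides intuitive, easy-to-understand query results.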

2. Common commands

(1) Log in to the Phoenix shell

python sqlline.py master:2181

(2) Basic commands

List tables:  !tables
Show a table's column information:  !describe tablename
Show the command history:  !history
Show a table's indexes:  !index tablename
Other commands:  !help

(3) Insert data

Phoenix has no INSERT statement; UPSERT is used instead.
UPSERT has two forms:
1. upsert into, intended for single-row inserts
upsert into tb values('ak','hhh',222);
upsert into tb(stat,city,num) values('ak','hhh',222);
2. upsert select, intended for bulk inserts
upsert into tb1 (state,city,population) select state,city,population from tb2 where population < 40000;
upsert into tb1 select state,city,population from tb2 where population > 40000;
upsert into tb1 select * from tb2 where population > 40000;

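Note:

　　　　Unlike in a traditional database, inserting in Phoenix never produces duplicate rows. Phoenix is built on HBase, so the primary key (the rowkey) is unique. A later write overwrites an earlier write with the same key, although the two writes carry different timestamps.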
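
The overwrite behaviour described in this note can be illustrated with a small Python model. This is only a sketch of the semantics, not Phoenix or HBase code: the ToyTable class and its upsert and get methods are hypothetical names, and a simple counter stands in for the HBase write timestamp.

```python
class ToyTable:
    """Toy model of upsert semantics: one visible row per primary key."""

    def __init__(self):
        self.rows = {}    # primary key -> (timestamp, values)
        self.clock = 0    # stand-in for an HBase write timestamp

    def upsert(self, key, values):
        # Insert the row if the key is new, otherwise overwrite it.
        self.clock += 1
        self.rows[key] = (self.clock, values)

    def get(self, key):
        return self.rows[key]


tb = ToyTable()
tb.upsert("ak", ("hhh", 222))
tb.upsert("ak", ("hhh", 333))   # same key: overwrites, no duplicate row
print(len(tb.rows), tb.get("ak"))
```

After the second upsert the model still holds a single row for the key, with the newer values and a newer timestamp.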

(4) Delete data

delete from tb; -- deletes all rows in the table; Phoenix does not support truncate table tb
delete from tb where city = 'kenai';
drop table tb; -- drops the table
delete from system.catalog where table_name = 'int_s6a';
drop table if exists tb;
drop table my_schema.tb;
drop table my_schema.tb cascade; -- drops the table together with all views based on it

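(5) Update data

Because of how HBase rowkeys work, writing a row with an existing rowkey simply overwrites the old content, which in effect updates the data.
Updates in Phoenix are therefore still done with upsert into and upsert select: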
upsert into us_population (state,city,population) values('ak','juneau',40711);

(6) Query data

union all, group by, order by, and limit are all supported:
select * from test limit 1000;
select * from test limit 1000 offset 100;
select full_name from sales_person where ranking >= 5.0
union all select reviewer_name from customer_review where score >= 8.0;
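
To see how limit, offset and union all behave, the same kind of query can be tried against any SQL engine. The sketch below uses Python's built-in sqlite3 module as a stand-in, not Phoenix itself; the table contents are made up for illustration, and Phoenix may differ from SQLite in other details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table test (id integer primary key, name text)")
conn.executemany("insert into test values (?, ?)",
                 [(i, f"row{i}") for i in range(1, 11)])

# Skip the first 2 rows (offset) and return the next 3 (limit).
page = conn.execute(
    "select id from test order by id limit 3 offset 2").fetchall()

# union all concatenates both result sets and keeps duplicates.
both = conn.execute(
    "select id from test where id <= 2 "
    "union all select id from test where id <= 2").fetchall()

print(page)
print(sorted(both))
```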

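(7) Create tables

a. Salting (SALT_BUCKETS)

Salting can significantly improve read and write performance by pre-splitting data across multiple regions. In essence, a system-generated byte is placed in the first position of the HBase rowkey's byte array. This byte is computed by applying a hash function to the rowkey's bytes.

SALT_BUCKETS accepts values from 1 to 256: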

create table TEST1(host varchar not null primary key, description  varchar)salt_buckets=16;
upsert into TEST1 (host,description) values ('192.168.0.1','s1');
upsert into TEST1 (host,description) values ('192.168.0.2','s2');
upsert into TEST1 (host,description) values ('192.168.0.3','s3');

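A salted table automatically prepends one byte to every rowkey, so that a run of consecutive rowkeys is spread across different regions when it is actually stored. Reads and writes over that key range are then no longer concentrated on a single region. In short, if we create a salted table with Phoenix, one byte is automatically added in front of the original rowkey whenever data is written to the table. Because the byte is derived from a hash of the rowkey, neighbouring rowkeys generally receive different bytes, so even consecutive rowkeys are distributed evenly across multiple regions.

Result: as shown in the figure.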
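
The idea behind the salt byte can be sketched in Python. This is a simplified illustration, not Phoenix's actual implementation: the hash function below (a 31-based polynomial hash) and the function names salt_byte and salted_key are assumptions chosen for the example.

```python
def salt_byte(rowkey: bytes, buckets: int) -> int:
    """Return a bucket number in [0, buckets) derived from the rowkey hash."""
    h = 0
    for b in rowkey:
        h = (31 * h + b) & 0xFFFFFFFF   # keep the hash within 32 bits
    return h % buckets


def salted_key(rowkey: bytes, buckets: int) -> bytes:
    """Prefix the rowkey with its one-byte salt."""
    return bytes([salt_byte(rowkey, buckets)]) + rowkey


# The three hosts inserted into TEST1, with salt_buckets=16.
for host in (b"192.168.0.1", b"192.168.0.2", b"192.168.0.3"):
    key = salted_key(host, 16)
    print(key[0], key[1:].decode())
```

With this toy hash the three consecutive hosts land in three different buckets, which is the effect salting aims for. Two unrelated rowkeys can still share a bucket, since there are at most 256 of them.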

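b. Pre-split

Salting sets up pre-splitting automatically, but sometimes we need to control how the table is split ourselves. When creating a table with Phoenix, you can specify exactly which values to split on, as in the following example: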

create table TEST2 (host varchar not null primary key, description varchar) split on ('cs','eu','na');
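
How split points map rowkeys to regions can be modelled with a sorted list and a binary search. The sketch below is an illustration under simplifying assumptions, not HBase's region-location code: region_for is a hypothetical helper, the host values are invented, and regions are simply numbered 0 to 3.

```python
from bisect import bisect_right

SPLIT_POINTS = ["cs", "eu", "na"]   # same values as in split on (...)


def region_for(rowkey: str) -> int:
    """Return the index of the region whose key range contains rowkey.

    Region 0 holds keys below 'cs', region 1 holds ['cs', 'eu'),
    region 2 holds ['eu', 'na'), and region 3 holds keys from 'na' up.
    """
    return bisect_right(SPLIT_POINTS, rowkey)


for host in ["au-01", "cs-17", "eu-03", "na-42", "zz-99"]:
    print(host, "-> region", region_for(host))
```

Three split points produce four regions, and each rowkey falls into the region whose start key is the largest split point not greater than the rowkey.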

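c. Using multiple column families

Each column family keeps its related data in separate files, so defining multiple column families in Phoenix can improve query performance.

The following CREATE TABLE statement creates two column families, a and b: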

create table TEST3 ( mykey varchar not null primary key, a.col1 varchar, a.col2 varchar,  b.col3 varchar);
upsert into TEST3 values ('key1','a1','b1','c1');
upsert into TEST3 values ('key2','a2','b2','c2');

d. Using compression

create table test (host varchar not null primary key, description varchar) compression='snappy';

(8) Create and drop views

create view "my_hbase_table"( k varchar primary key, "v" unsigned_long) default_column_family='a';
create view my_view ( new_col smallint ) as select * from my_table where k = 100;
create view my_view_on_view as select * from my_view where new_col > 70;
create view v1 as select *  from test where description in ('s1','s2','s3');
drop view my_view;
drop view if exists my_schema.my_view;
drop view if exists my_schema.my_view cascade;

(9) Create indexes

Secondary indexes can be created on both mutable data and immutable data (data that is never updated after it is inserted).

create index my_idx on opportunity(last_updated_date desc); -- global index
create index my_idx on event(created_date desc) include (name, payload) salt_buckets=10; -- covered index with salting
create index my_idx on sales.opportunity(upper(contact_name)); -- functional index
create index test_index on test (host) include (description); -- covered index

(10) Drop an index

drop index index_name on table_name;

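(11) Tables are mutable by default; create an immutable table explicitly

create table hao2 (k varchar primary key, v varchar) immutable_rows=true;
alter table HAO2 set IMMUTABLE_ROWS = false; -- change the table back to mutable
alter index index1 on tb rebuild; -- rebuilding empties the index table and then repopulates it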

(12) Global indexing: for read-heavy, write-light workloads; suited to queries with few conditions

CREATE INDEX IPINDEX ON TEST(IP);
Usage: force the index with a hint
SELECT /*+ INDEX(TEST IPINDEX) */ * FROM TEST WHERE IP='139.204.122.144';

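(13) Covered indexes: use include to add the columns that the query needs to return

create index index1_c on hao1 (age) include(name); -- name is now stored in the index table as well
For select name from hao1 where age=2, every column comes from the index table, so the query is fastest.
For select * from hao1 where age =2, the other columns are not in the index table, so a full table scan is performed.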
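
The rule for when a covered index can answer a query on its own can be written as a simple set check. This is a sketch of the reasoning only, not Phoenix's query optimizer: is_covered is a hypothetical helper, and the extra column address is invented to stand for the other columns that select * would return.

```python
def is_covered(indexed, included, referenced):
    """True if every column the query references is stored in the index."""
    return set(referenced) <= set(indexed) | set(included)


indexed, included = ["age"], ["name"]   # create index ... (age) include(name)

# select name from hao1 where age=2
print(is_covered(indexed, included, ["name", "age"]))

# select * from hao1 where age=2, with a hypothetical third column
print(is_covered(indexed, included, ["name", "age", "address"]))
```

The first query references only columns held in the index, so the check passes; the second references a column outside it, so the check fails.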

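(14) Local indexing: for write-heavy, read-light workloads; the index table is used even when the query refers to columns that are not in the index; index data and table data are stored on the same server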

CREATE LOCAL INDEX IPINDEX ON TEST(IP);