clickhouse基础知识

时间 2019-11-21

标签 clickhouse 基础知识繁體版

原文原文链接

Clickhouse是一个用于联机分析处理（OLAP）的列式数据库管理系统（columnar DBMS）。
传统数据库在数据大小比较小，索引大小适合内存，数据缓存命中率足够高的情形下能正常提供服务。但残酷的是，这种理想情形最终会随着业务的增加走到尽头，查询会变得愈来愈慢。你可能经过增长更多的内存，订购更快的磁盘等等来解决问题（纵向扩展），但这只是拖延解决本质问题。若是你的需求是解决怎样快速查询出结果，那么ClickHouse也许能够解决你的问题。正则表达式

应用场景：
1.绝大多数请求都是用于读访问的
2.数据须要以大批次（大于1000行）进行更新，而不是单行更新；或者根本没有更新操做
3.数据只是添加到数据库，没有必要修改
4.读取数据时，会从数据库中提取出大量的行，但只用到一小部分列
5.表很“宽”，即表中包含大量的列
6.查询频率相对较低（一般每台服务器每秒查询数百次或更少）
7.对于简单查询，容许大约50毫秒的延迟
8.列的值是比较小的数值和短字符串（例如，每一个URL只有60个字节）
9.在处理单个查询时须要高吞吐量（每台服务器每秒高达数十亿行）
10.不须要事务
11.数据一致性要求较低
12.每次查询中只会查询一个大表。除了一个大表，其他都是小表
13.查询结果显著小于数据源。即数据有过滤或聚合。返回结果不超过单个服务器内存大小数据库

相应地，使用ClickHouse也有其自己的限制：json

1.不支持真正的删除/更新支持不支持事务（期待后续版本支持）
2.不支持二级索引
3.有限的SQL支持，join实现不同凡响
4.不支持窗口功能
5.元数据管理须要人工干预维护数组

经常使用SQL语法

-- 列出数据库列表
show databases;

-- 列出数据库中表列表
show tables;

-- 建立数据库
create database test;

-- 删除一个表
drop table if exists test.t1;

-- 建立第一个表
create /*temporary*/ table /*if not exists*/ test.m1 (
 id UInt16
,name String
) ENGINE = Memory
;
-- 插入测试数据
insert into test.m1 (id, name) values (1, 'abc'), (2, 'bbbb');

-- 查询
select * from test.m1;

默认值缓存

默认值的处理方面， ClickHouse 中，默认值老是有的，若是没有显示式指定的话，会按字段类型处理：服务器

数字类型， 0
字符串，空字符串
数组，空数组
日期， 0000-00-00
时间， 0000-00-00 00:00:00
注：NULLs 是不支持的并发

数据类型异步

1.整型：UInt8,UInt16,UInt32,UInt64,Int8,Int16,Int32,Int64
范围U开头-2N/2~2N-1;非U开头0～2^N-1
2.枚举类型：Enum8,Enum16
Enum('hello'=1,'test'=-1),Enum是有符号的整型映射的，所以负数也是能够的
3.字符串型：FixedString(N),String
N是最大字节数，不是字符长度，若是是UTF8字符串，那么就会占3个字节，GBK会占2字节;String能够用来替换VARCHAR,BLOB,CLOB等数据类型
4.时间类型：Date
5.数组类型：Array(T)
T是一个基本类型，包括arry在内，官方不建议使用多维数组
6.元组：Tuple
7.结构：Nested(name1 Type1,name2 Type2,...)
相似一种map的结函数

物化列工具

指定 MATERIALIZED 表达式，即将一个列做为物化列处理了，这意味着这个列的值不能从insert 语句获取，只能是本身计算出来的。同时，
物化列也不会出如今 select * 的结果中：

drop table if exists test.m2;
create table test.m2 (
 a MATERIALIZED (b+1)
,b UInt16
) ENGINE = Memory;
insert into test.m2 (b) values (1);
select * from test.m2;
select a, b from test.m2;

表达式列

ALIAS 表达式列某方面跟物化列相同，就是它的值不能从 insert 语句获取。不一样的是，物化列是会真正保存数据（这样查询时不须要再计算），
而表达式列不会保存数据（这样查询时老是须要计算），只是在查询时返回表达式的结果。

create table test.m3 (a ALIAS (b+1), b UInt16) ENGINE = Memory;
insert into test.m3(b) values (1);
select * from test.m3;
select a, b from test.m3;

引擎/engine

引擎是clickhouse设计的精华部分

TinyLog

最简单的一种引擎，每一列保存为一个文件，里面的内容是压缩过的，不支持索引
这种引擎没有并发控制，因此，当你须要在读，又在写时，读会出错。并发写，内容都会坏掉。

应用场景:
a. 基本上就是那种只写一次
b. 而后就是只读的场景。
c. 不适用于处理量大的数据，官方推荐，使用这种引擎的表最多 100 万行的数据

drop table if exists test.tinylog;
create table test.tinylog (a UInt16, b UInt16) ENGINE = TinyLog;
insert into test.tinylog(a,b) values (7,13);

此时/var/lib/clickhouse/data/test/tinylog保存数据的目录结构：

├── a.bin
├── b.bin
└── sizes.json

a.bin 和 b.bin 是压缩过的对应的列的数据， sizes.json 中记录了每一个 *.bin 文件的大小

Log

这种引擎跟 TinyLog 基本一致
它的改进点，是加了一个 __marks.mrk 文件，里面记录了每一个数据块的偏移
这样作的一个用处，就是能够准确地切分读的范围，从而使用并发读取成为可能
可是，它是不能支持并发写的，一个写操做会阻塞其它读写操做
Log 不支持索引，同时由于有一个 __marks.mrk 的冗余数据，因此在写入数据时，一旦出现问题，这个表就废了

应用场景:
同 TinyLog 差很少，它适用的场景也是那种写一次以后，后面就是只读的场景，临时数据用它保存也能够

drop table if exists test.log;
create table test.log (a UInt16, b UInt16) ENGINE = Log;
insert into test.log(a,b) values (7,13);

此时/var/lib/clickhouse/data/test/log保存数据的目录结构：

├── __marks.mrk
├── a.bin
├── b.bin
└── sizes.json

Memory

内存引擎，数据以未压缩的原始形式直接保存在内存当中，服务器重启数据就会消失
能够并行读，读写互斥锁的时间也很是短
不支持索引，简单查询下有很是很是高的性能表现

应用场景:
a. 进行测试
b. 在须要很是高的性能，同时数据量又不太大（上限大概 1 亿行）的场景

Merge

一个工具引擎，自己不保存数据，只用于把指定库中的指定多个表链在一块儿。
这样，读取操做能够并发执行，同时也能够利用原表的索引，可是，此引擎不支持写操做
指定引擎的同时，须要指定要连接的库及表，库名可使用一个表达式，表名可使用正则表达式指定

create table test.tinylog1 (id UInt16, name String) ENGINE=TinyLog;
create table test.tinylog2 (id UInt16, name String) ENGINE=TinyLog;
create table test.tinylog3 (id UInt16, name String) ENGINE=TinyLog;

insert into test.tinylog1(id, name) values (1, 'tinylog1');
insert into test.tinylog2(id, name) values (2, 'tinylog2');
insert into test.tinylog3(id, name) values (3, 'tinylog3');

use test;
create table test.merge (id UInt16, name String) ENGINE=Merge(currentDatabase(), '^tinylog[0-9]+');
select _table,* from test.merge order by id desc

┌─_table───┬─id─┬─name─────┐
│ tinylog3 │ 3 │ tinylog3 │
│ tinylog2 │ 2 │ tinylog2 │
│ tinylog1 │ 1 │ tinylog1 │
└──────────┴────┴──────────┘

注：_table 这个列，是由于使用了 Merge 多出来的一个的一个虚拟列

a. 它表示原始数据的来源表，它不会出如今 show table 的结果当中
b. select * 不会包含它

Distributed

与 Merge 相似， Distributed 也是经过一个逻辑表，去访问各个物理表，设置引擎时的样子是：

Distributed(remote_group, database, table [, sharding_key])

其中：

remote_group /etc/clickhouse-server/config.xml中remote_servers参数
database 是各服务器中的库名
table 是表名
sharding_key 是一个寻址表达式，能够是一个列名，也能够是像 rand() 之类的函数调用，它与 remote_servers 中的 weight 共同做用，决定在写时往哪一个 shard 写

配置文件中的 remote_servers

<remote_servers>
   <log>
       <shard>
           <weight>1</weight>
           <internal_replication>false</internal_replication>
           <replica>
               <host>172.17.0.3</host>
               <port>9000</port>
           </replica>
       </shard>
       <shard>
           <weight>2</weight>
           <internal_replication>false</internal_replication>
           <replica>
               <host>172.17.0.4</host>
               <port>9000</port>
           </replica>
       </shard>
   </log>
</remote_servers>

log 是某个 shard 组的名字，就是上面的 remote_group 的值
shard 是固定标签
weight 是权重，前面说的 sharding_key 与这个有关。
简单来讲，上面的配置，理论上来看:
第一个 shard “被选中”的几率是 1 / (1 + 2) ，第二个是 2 / (1 + 2) ，这很容易理解。可是， sharding_key 的工做状况，是按实际数字的“命中区间”算的，即第一个的区间是 [0, 1) 的周期，第二个区间是 [1, 1+2) 的周期。好比把 sharding_key 设置成 id ，当 id=0 或 id=3 时，必定是写入到第一个 shard 中，若是把 sharding_key 设置成 rand() ，那系统会对应地本身做通常化转换吧，这种时候就是一种几率场景了。
internal_replication 是定义针对多个 replica 时的写入行为的。
若是为 false ，则会往全部的 replica 中写入数据，可是并不保证数据写入的一致性，因此这种状况时间一长，各 replica 的数据极可能出现差别。若是为 true ，则只会往第一个可写的 replica 中写入数据（剩下的事“物理表”本身处理）。
replica 就是定义各个冗余副本的，选项有 host ， port ， user ， password 等

看一个实际的例子，咱们先在两台机器上建立好物理表并插入一些测试数据：

create table test.tinylog_d1(id UInt16, name String) ENGINE=TinyLog;
insert into test.tinylog_d1(id, name) values (1, 'Distributed record 1');
insert into test.tinylog_d1(id, name) values (2, 'Distributed record 2');

在其中一台建立逻辑表：

create table test.tinylog_d (id UInt16, name String) ENGINE=Distributed(log, test,tinylog_d1 , id);

-- 插入数据到逻辑表，观察数据分发状况
insert into test.tinylog_d(id, name) values (0, 'main');
insert into test.tinylog_d(id, name) values (1, 'main');
insert into test.tinylog_d(id, name) values (2, 'main');

select name,sum(id),count(id) from test.tinylog_d group by name;

注：逻辑表中的写入操做是异步的，会先缓存在本机的文件系统上，而且，对于物理表的不可访问状态，并无严格控制，因此写入失败丢数据的状况是可能发生的

Null

空引擎，写入的任何数据都会被忽略，读取的结果必定是空。

可是注意，虽然数据自己不会被存储，可是结构上的和数据格式上的约束仍是跟普通表同样是存在的，同时，你也能够在这个引擎上建立视图

Buffer

1.Buffer 引擎，像是Memory 存储的一个上层应用似的（磁盘上也是没有相应目录的）
2.它的行为是一个缓冲区，写入的数据先被放在缓冲区，达到一个阈值后，这些数据会自动被写到指定的另外一个表中
3.和Memory 同样，有不少的限制，好比没有索引
4.Buffer 是接在其它表前面的一层，对它的读操做，也会自动应用到后面表，可是由于前面说到的限制的缘由，通常咱们读数据，就直接从源表读就行了，缓冲区的这点数据延迟，只要配置得当，影响不大的
5.Buffer 后面也能够不接任何表，这样的话，当数据达到阈值，就会被丢弃掉

一些特色：

若是一次写入的数据太大或太多，超过了 max 条件，则会直接写入源表。
删源表或改源表的时候，建议 Buffer 表删了重建。
“友好重启”时， Buffer 数据会先落到源表，“暴力重启”， Buffer 表中的数据会丢失。
即便使用了 Buffer ，屡次的小数据写入，对比一次大数据写入，也慢得多（几千行与百万行的差距）

-- 建立源表
create table test.mergetree (sdt  Date, id UInt16, name String, point UInt16) ENGINE=MergeTree(sdt, (id, name), 10);
-- 建立 Buffer表
-- Buffer(database, table, num_layers, min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)
create table test.mergetree_buffer as test.mergetree ENGINE=Buffer(test, mergetree, 16, 3, 20, 2, 10, 1, 10000);

insert into test.mergetree (sdt, id, name, point) values ('2017-07-10', 1, 'a', 20);
insert into test.mergetree_buffer (sdt, id, name, point) values ('2017-07-10', 1, 'b', 10);
select * from test.mergetree;
select '------';
select * from test.mergetree_buffer;

database 数据库
table 源表，这里除了字符串常量，也可使用变量的。
num_layers 是相似“分区”的概念，每一个分区的后面的 min / max 是独立计算的，官方推荐的值是 16 。
min / max 这组配置荐，就是设置阈值的，分别是时间（秒），行数，空间（字节）。

阈值的规则: 是“全部的 min 条件都知足，或至少一个 max 条件知足”。

若是按上面咱们的建表来讲，全部的 min 条件就是：过了 3秒，2条数据，1 Byte。一个 max 条件是：20秒，或 10 条数据，或有 10K

Set

Set 这个引擎有点特殊，由于它只用在 IN 操做符右侧，你不能对它 select

create table test.set(id UInt16, name String) ENGINE=Set;
insert into test.set(id, name) values (1, 'hello');
-- select 1 where (1, 'hello') in test.set; -- 默认UInt8 须要手动进行类型转换
select 1 where (toUInt16(1), 'hello') in test.set;

注: Set 引擎表，是全内存运行的，可是相关数据会落到磁盘上保存，启动时会加载到内存中。因此，意外中断或暴力重启，是可能产生数据丢失问题的

Join

TODO

MergeTree

这个引擎是 ClickHouse 的重头戏，它支持一个日期和一组主键的两层式索引，还能够实时更新数据。同时，索引的粒度能够自定义，外加直接支持采样功能

MergeTree(EventDate, (CounterID, EventDate), 8192)
MergeTree(EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID)), 8192)

EventDate 一个日期的列名
intHash32(UserID) 采样表达式
(CounterID, EventDate) 主键组（里面除了列名，也支持表达式），也能够是一个表达式
8192 主键索引的粒度

drop table if exists test.mergetree1;
create table test.mergetree1 (sdt  Date, id UInt16, name String, cnt UInt16) ENGINE=MergeTree(sdt, (id, name), 10);

-- 日期的格式，好像必须是 yyyy-mm-dd
insert into test.mergetree1(sdt, id, name, cnt) values ('2018-06-01', 1, 'aaa', 10);
insert into test.mergetree1(sdt, id, name, cnt) values ('2018-06-02', 4, 'bbb', 10);
insert into test.mergetree1(sdt, id, name, cnt) values ('2018-06-03', 5, 'ccc', 11);

此时/var/lib/clickhouse/data/test/mergetree1的目录结构：

├── 20180601_20180601_1_1_0
│   ├── checksums.txt
│   ├── columns.txt
│   ├── id.bin
│   ├── id.mrk
│   ├── name.bin
│   ├── name.mrk
│   ├── cnt.bin
│   ├── cnt.mrk 
│   ├── cnt.idx
│   ├── primary.idx
│   ├── sdt.bin
│   └── sdt.mrk -- 保存一下块偏移量
├── 20180602_20180602_2_2_0
│   └── ...
├── 20180603_20180603_3_3_0
│   └── ...
├── format_version.txt
└── detached

ReplacingMergeTree

1.在 MergeTree 的基础上，添加了“处理重复数据”的功能=>实时数据场景
2.相比 MergeTree ,ReplacingMergeTree 在最后加一个"版本列",它跟时间列配合一块儿，用以区分哪条数据是"新的"，并把旧的丢掉(这个过程是在 merge 时处理，不是数据写入时就处理了的，平时重复的数据仍是保存着的，而且查也是跟日常同样会查出来的)
3.主键列组用于区分重复的行

-- 版本列 容许的类型是， UInt 一族的整数，或 Date 或 DateTime
create table test.replacingmergetree (sdt  Date, id UInt16, name String, cnt UInt16) ENGINE=ReplacingMergeTree(sdt, (name), 10, cnt);

insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-10', 1, 'a', 20);
insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-10', 1, 'a', 30);
insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-11', 1, 'a', 20);
insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-11', 1, 'a', 30);
insert into test.replacingmergetree (sdt, id, name, cnt) values ('2018-06-11', 1, 'a', 10);

select * from test.replacingmergetree;

-- 若是记录未执行merge，能够手动触发一下 merge 行为
optimize table test.replacingmergetree;

┌────────sdt─┬─id─┬─name─┬─cnt─┐
│ 2018-06-11 │ 1 │ a │ 30 │
└────────────┴────┴──────┴─────┘

SummingMergeTree

1.SummingMergeTree 就是在 merge 阶段把数据sum求和
2.sum求和的列能够指定，不可加的未指定列，会取一个最早出现的值

create table test.summingmergetree (sdt Date, name String, a UInt16, b UInt16) ENGINE=SummingMergeTree(sdt, (sdt, name), 8192, (a));

insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-10', 'a', 1, 20);
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-10', 'b', 2, 11);
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-11', 'b', 3, 18);
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-11', 'b', 3, 82);
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-11', 'a', 3, 11);
insert into test.summingmergetree (sdt, name, a, b) values ('2018-06-12', 'c', 1, 35);

-- 手动触发一下 merge 行为
optimize table test.summingmergetree;

select * from test.summingmergetree;

┌────────sdt─┬─name─┬─a─┬──b─┐
│ 2018-06-10 │ a │ 1 │ 20 │
│ 2018-06-10 │ b │ 2 │ 11 │
│ 2018-06-11 │ a │ 3 │ 11 │
│ 2018-06-11 │ b │ 6 │ 18 │
│ 2018-06-12 │ c │ 1 │ 35 │
└────────────┴──────┴───┴────┘
注: 可加列不能是主键中的列，而且若是某行数据可加列都是 null ，则这行会被删除

AggregatingMergeTree

AggregatingMergeTree 是在 MergeTree 基础之上，针对聚合函数结果，做增量计算优化的一个设计，它会在 merge 时，针对主键预处理聚合的数据
应用于AggregatingMergeTree 上的聚合函数除了普通的 sum, uniq等，还有 sumState , uniqState ，及 sumMerge ， uniqMerge 这两组

1.聚合数据的预计算
是一种“空间换时间”的权衡，而且是以减小维度为代价的

dim1	dim2	dim3	measure1
aaaa	a	1	1
aaaa	b	2	1
bbbb	b	3	1
cccc	b	2	1
cccc	c	1	1
dddd	c	2	1
dddd	a	1	1

假设原始有三个维度，一个须要 count 的指标

dim1	dim2	dim3	measure1
aaaa	a	1	1
aaaa	b	2	1
bbbb	b	3	1
cccc	b	2	1
cccc	c	1	1
dddd	c	2	1
dddd	a	1	1

经过减小一个维度的方式，来以 count 函数聚合一次 M

dim2	dim3	count(measure1)
a	1	3
b	2	2
b	3	1
c	1	1
c	2	1

2.聚合数据的增量计算

对于 AggregatingMergeTree 引擎的表，不能使用普通的 INSERT 去添加数据，能够用：
a. INSERT SELECT 来插入数据
b. 更经常使用的，是能够建立一个物化视图

drop table if exists test.aggregatingmergetree;
create table test.aggregatingmergetree(
sdt Date
, dim1 String
, dim2 String
, dim3 String
, measure1 UInt64
) ENGINE=MergeTree(sdt, (sdt, dim1, dim2, dim3), 8192);

-- 建立一个物化视图，使用 AggregatingMergeTree
drop table if exists test.aggregatingmergetree_view;
create materialized view test.aggregatingmergetree_view
ENGINE = AggregatingMergeTree(sdt,(dim2, dim3), 8192)
as
select sdt,dim2, dim3, uniqState(dim1) as uv
from test.aggregatingmergetree
group by sdt,dim2, dim3;

insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'aaaa', 'a', '10', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'aaaa', 'a', '10', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'aaaa', 'b', '20', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'bbbb', 'b', '30', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'cccc', 'b', '20', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'cccc', 'c', '10', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'dddd', 'c', '20', 1);
insert into test.aggregatingmergetree (sdt, dim1, dim2, dim3, measure1) values ('2018-06-10', 'dddd', 'a', '10', 1);

-- 按 dim2 和 dim3 聚合 count(measure1)
select dim2, dim3, count(measure1) from test.aggregatingmergetree group by dim2, dim3;

-- 按 dim2 聚合 UV
select dim2, uniq(dim1) from test.aggregatingmergetree group by dim2;

-- 手动触发merge
OPTIMIZE TABLE test.aggregatingmergetree_view;
select * from test.aggregatingmergetree_view;

-- 查 dim2 的 uv
select dim2, uniqMerge(uv) from test.aggregatingmergetree_view group by dim2 order by dim2;

CollapsingMergeTree

是专门为 OLAP 场景下，一种“变通”存数作法而设计的，在数据是不能改，更不能删的前提下，经过“运算”的方式，去抹掉旧数据的影响，把旧数据“减”去便可，从而解决"最终状态"类的问题，好比 当前有多少人在线？

“以加代删”的增量存储方式，带来了聚合计算方便的好处，代价倒是存储空间的翻倍，而且，对于只关心最新状态的场景，中间数据都是无用的

CollapsingMergeTree 在建立时与 MergeTree 基本同样，除了最后多了一个参数，须要指定 Sign 位（必须是 Int8 类型）

create table test.collapsingmergetree(sign Int8, sdt Date, name String, cnt UInt16) ENGINE=CollapsingMergeTree(sdt, (sdt, name), 8192, sign);

做者：darebeat 连接：https://www.jianshu.com/p/a5bf490247ea 来源：简书简书著做权归做者全部，任何形式的转载都请联系做者得到受权并注明出处。