Getting Started with ClickHouse

Date: 2020-11-30

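Introduction to ClickHouse
ClickHouse (CK for short) is a column-oriented database designed for OLAP workloads. Typical characteristics of OLAP are:

Data is written infrequently, and when it is written, it is written in batches, unlike OLTP, where rows are written one at a time
Most requests are reads
Query concurrency is low, so CK is not suited to high-concurrency business scenarios. CK itself recommends at most 100 concurrent queries per second.
Transactions are not required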

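Advantages of ClickHouse

To improve the compression ratio, CK stores the values of a column at a fixed length, so it does not need to store length information alongside the values.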
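As a rough illustration of why fixed-length values need no per-value length information, the sketch below packs a column of integers into a flat byte buffer and reads the i-th value by offset arithmetic alone. It is not CK's actual on-disk format; the 4-byte width and the helper names pack_column and read_value are assumptions made for this example.

```python
import struct

WIDTH = 4  # assumed fixed width per value, in bytes (unsigned 32-bit integer)

def pack_column(values):
    """Pack integers back to back; no length prefix is stored per value."""
    return b"".join(struct.pack("<I", v) for v in values)

def read_value(buf, i):
    """Read the i-th value using only offset arithmetic (i * WIDTH)."""
    return struct.unpack_from("<I", buf, i * WIDTH)[0]

column = pack_column([10, 20, 30, 40])
print(len(column))            # total bytes used by the column
print(read_value(column, 2))  # value at index 2
```

With variable-length values, each value would need a length or an offset table before it could be located, which is the overhead the fixed-length layout avoids.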

It uses a vectorized execution engine (vector engine). For an explanation of what a vector engine is, see:
https://www.infoq.cn/article/columnar-databases-and-vectorization/?itm_source=infoq_en&itm_medium=link_on_en_item&itm_campaign=item_in_other_langs

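Disadvantages of ClickHouse

No full transaction support
Data cannot be modified or deleted at high throughput
Because the index is sparse, it is not suited to retrieving single rows by key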

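Performance optimization

To improve insert performance, insert in batches, with at least 1000 rows per batch. Concurrent inserts can also significantly increase insert speed.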
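A minimal sketch of client-side batching, assuming the rows arrive as a Python iterable. The helper name batches and the row fields are made up for this example; only the 1000-row minimum comes from the text above.

```python
from itertools import islice

MIN_BATCH = 1000  # minimum batch size suggested in the text

def batches(rows, size=MIN_BATCH):
    """Yield lists of up to `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# 2500 hypothetical rows, produced lazily
rows = ({"UserID": str(i), "PageViews": i % 7} for i in range(2500))
sizes = [len(b) for b in batches(rows)]
print(sizes)
```

Each yielded list would then be sent as one INSERT, so the server sees a few large inserts instead of thousands of single-row ones.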

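Access interfaces

Like Elasticsearch (ES), CK exposes two ports, one TCP and one HTTP. The default TCP port is 9000 and the default HTTP port is 8123. We usually do not talk to these ports directly but go through a client, which can be:

Command-line client: connects to CK to run basic CRUD operations and to import data into CK. It connects over the TCP port
HTTP interface: as with ES, statements in CK's own syntax are submitted REST-style
JDBC driver
ODBC driver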
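As a small sketch of the HTTP interface, the code below builds a request that carries a query in the POST body to the default HTTP port 8123. It only constructs the request and does not send it; the host name and the helper name build_query_request are assumptions for illustration.

```python
from urllib.request import Request

def build_query_request(sql, host="localhost", port=8123):
    """Build (but do not send) a POST request for CK's HTTP interface."""
    url = f"http://{host}:{port}/"
    return Request(url, data=sql.encode("utf-8"), method="POST")

req = build_query_request("SELECT 1")
print(req.full_url, req.get_method())

# To send it against a running server (not executed here):
# from urllib.request import urlopen
# print(urlopen(req).read())
```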

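Input and output formats

CK can read many formats as input (for INSERT) and can emit a specified format as output (for SELECT).

For example, when inserting data, specify the source format as JSONEachRow: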

INSERT INTO UserActivity FORMAT JSONEachRow {"PageViews":5, "UserID":"4324182021466249494", "Duration":146,"Sign":-1} {"UserID":"4324182021466249494","PageViews":6,"Duration":185,"Sign":1}

When reading data, specify the output format as JSONEachRow:

SELECT * FROM UserActivity FORMAT JSONEachRow

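Note that these are the formats CK parses or generates, not its final storage format; CK presumably still stores the data in its own columnar format. CK supports many formats; see the documentation:
https://clickhouse.yandex/docs/en/interfaces/formats/#native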
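To make the JSONEachRow idea concrete, here is a hedged sketch that serializes Python dicts as one JSON object per line and parses them back. The helper names are made up for this example; the table and column names are the ones used in the INSERT above.

```python
import json

def to_json_each_row(rows):
    """Serialize dicts as one compact JSON object per line."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in rows)

def from_json_each_row(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

rows = [
    {"UserID": "4324182021466249494", "PageViews": 5, "Duration": 146, "Sign": -1},
    {"UserID": "4324182021466249494", "PageViews": 6, "Duration": 185, "Sign": 1},
]
payload = to_json_each_row(rows)
statement = "INSERT INTO UserActivity FORMAT JSONEachRow\n" + payload
print(statement)
```

The format only describes how rows travel between client and server; it says nothing about how CK lays the data out on disk.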

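Database engine

CK supports creating a database inside CK whose actual storage is MySQL, so that the tables in that database can be queried and modified through CK. This is a bit like external tables in Hive, except that here the whole database is attached.

Suppose MySQL holds the following data: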

mysql> USE test;
Database changed

mysql> CREATE TABLE `mysql_table` (
    ->   `int_id` INT NOT NULL AUTO_INCREMENT,
    ->   `float` FLOAT NOT NULL,
    ->   PRIMARY KEY (`int_id`));
Query OK, 0 rows affected (0,09 sec)

mysql> insert into mysql_table (`int_id`, `float`) VALUES (1,2);
Query OK, 1 row affected (0,00 sec)

mysql> select * from mysql_table;
+--------+-------+
| int_id | float |
+--------+-------+
|      1 |     2 |
+--------+-------+
1 row in set (0,00 sec)

Create a database in CK that links to the MySQL database above:

CREATE DATABASE mysql_db ENGINE = MySQL('localhost:3306', 'test', 'my_user', 'user_password')

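After that, the MySQL database can be operated on from within CK.

Table engines: the MergeTree family

The table engine determines how a table is stored when it is created. It controls the following:

How data is read and written, and where it is stored
Which queries are supported
Concurrent data access
Data replication

MergeTree engine

The table engine and its settings are specified when the table is created: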

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
    ...
    INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
    INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
) ENGINE = MergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...]

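This engine stores data in partitions.
On insert, data for different partitions goes into different data parts. CK merges these data parts in the background; data parts from different partitions are never merged together.
A data part consists of many granules, the smallest indivisible unit.

Example settings: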

ENGINE MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity=8192

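Granules

A granule is a set of rows that are adjacent after sorting by primary key and that is not split any further. The primary key of the first row of each granule serves as the mark for that granule. For example, with the primary key (CounterID, Date), the first row of the first granule has the key (a, 1). Several rows within one granule can share the same primary key.

To make indexing easier, CK also assigns each granule a mark number, since a number is easier to work with than the actual primary key value.

This index structure closely resembles a skip list. It is also called a sparse index, because it indexes ranges of sorted data rather than every row.

Query example: for CounterID in ('a', 'h'), the CK server uses the structure above and actually reads the mark ranges [0, 3) and [6, 8).

The number of rows stored between two marks, that is, the granule size (two adjacent marks delimit one granule), can be set at table creation with index_granularity.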
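The sparse index can be modeled in a few lines. The sketch below is a simplified model and not CK's implementation: it uses a toy single-column key and a granularity of 3 instead of 8192, and the function names are invented for this example. It keeps one mark per granule and uses binary search over the marks to find the granules that may contain a key.

```python
from bisect import bisect_left, bisect_right

def build_marks(sorted_keys, granularity):
    """Keep the key of the first row of every granule."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granularity)]

def granule_range(marks, key):
    """Return [start, end) of the granules that may contain `key`."""
    # The granule just before the first mark >= key may still end with `key`.
    start = max(0, bisect_left(marks, key) - 1)
    # Granules whose mark is greater than `key` cannot contain it.
    end = bisect_right(marks, key)
    return start, end

def lookup(sorted_keys, marks, granularity, key):
    """Scan only the selected granules and filter the matching rows."""
    start, end = granule_range(marks, key)
    rows = sorted_keys[start * granularity:end * granularity]
    return [k for k in rows if k == key]

keys = sorted("aaabbcddeeffghhhiz")  # toy primary-key column, already sorted
marks = build_marks(keys, granularity=3)
print(marks)
print(granule_range(marks, "h"))
print(lookup(keys, marks, 3, "h"))
```

Even a lookup of a single key has to scan at least one whole granule, which is why the sparse index suits range scans better than point lookups.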

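TTL

Expiration can be set on both tables and columns.

MergeTree summary

MergeTree acts as the superclass of the MergeTree family of table engines. It defines the storage characteristics shared by the whole family:

Data is merged in the background
A sparse index, a skip-list-like structure, is used to locate data
Data partitioning can be specified

On this basis, several derived MergeTree engines target different use cases:

ReplacingMergeTree: automatically removes rows with the same primary key
SummingMergeTree: sums the numeric columns of rows with the same primary key and stores the result as a single row. This amounts to pre-aggregation; it reduces storage and improves query performance
AggregatingMergeTree: aggregates rows with the same primary key according to specified rules, which reduces storage and speeds up aggregate queries; also a form of pre-aggregation
CollapsingMergeTree: collapses pairs of rows whose columns are mostly identical and that differ in a sign column, for example with the values 1 and -1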
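The effect of two of these engines on rows that share a key can be sketched as plain functions. This is a simplified model of what a merge produces, not CK code; the function names and the sample rows are assumptions for illustration.

```python
from collections import defaultdict

def summing_merge(rows, key_cols, sum_cols):
    """Model of a SummingMergeTree merge: one row per key, numeric columns summed."""
    totals = defaultdict(lambda: [0] * len(sum_cols))
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        for i, col in enumerate(sum_cols):
            totals[key][i] += row[col]
    return [
        {**dict(zip(key_cols, key)), **dict(zip(sum_cols, sums))}
        for key, sums in sorted(totals.items())
    ]

def replacing_merge(rows, key_cols):
    """Model of a ReplacingMergeTree merge: keep the last inserted row per key."""
    latest = {}
    for row in rows:
        latest[tuple(row[c] for c in key_cols)] = row
    return list(latest.values())

rows = [
    {"CounterID": 1, "Hits": 2},
    {"CounterID": 2, "Hits": 5},
    {"CounterID": 1, "Hits": 3},
]
print(summing_merge(rows, ["CounterID"], ["Hits"]))
print(replacing_merge(rows, ["CounterID"]))
```

In both cases three input rows shrink to two, which is where the storage and query savings of pre-aggregation come from.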

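ReplicatedMergeTree engine

Tables created in CK are not replicated by default. To improve availability, replication has to be introduced; CK implements it by integrating with ZooKeeper.

Each of the pre-aggregation engines above has a corresponding Replicated engine: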

ReplicatedMergeTree
ReplicatedSummingMergeTree
ReplicatedReplacingMergeTree
ReplicatedAggregatingMergeTree
ReplicatedCollapsingMergeTree
ReplicatedVersionedCollapsingMergeTree
ReplicatedGraphiteMergeTree

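Table engines: the Log engine family

This family targets logging scenarios that keep producing many small tables, each holding little data. Their characteristics are:

Data is stored on disk
New data is added by appending
Writes take a lock and reads have to wait, so query performance is low

Table engines: external data sources

When creating tables, CK also supports many external data source engines. They presumably work like Hive external tables: only a table-shaped link is created, and the data stays in the source system. (This still needs to be confirmed.)

These external data source table engines include: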

Kafka
MySQL
JDBC
ODBC
HDFS

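SQL syntax

SAMPLE clause

When creating a table, you can specify sampling based on the hash of a column (hashing keeps the sample uniform and random). A query then does not have to process the whole table; it can compute the result on a sampled subset instead. For example, if the table holds 100 people and you want their total score, you can use SAMPLE to take 10 people, sum their scores, and multiply by 10. SAMPLE suits workloads that do not need exact results and are very sensitive to query latency.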
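The estimate described above can be sketched as follows. The sketch assumes a 10% sample and uses zlib.crc32 as a stand-in hash; CK's own hash functions, such as intHash32, are not reproduced here, and the function names and score data are invented for this example.

```python
import zlib

def in_sample(user_id, fraction):
    """Deterministically keep roughly `fraction` of ids, based on a hash of the id."""
    h = zlib.crc32(str(user_id).encode("utf-8"))  # stand-in hash, range 0 .. 2**32 - 1
    return h < fraction * 2**32

def estimate_total(scores, fraction):
    """Sum the sampled scores and scale the result up by 1 / fraction."""
    sampled = [s for uid, s in scores.items() if in_sample(uid, fraction)]
    return sum(sampled) / fraction

scores = {uid: 50 + uid % 50 for uid in range(10_000)}  # hypothetical data
exact = sum(scores.values())
approx = estimate_total(scores, 0.1)
print(exact, round(approx))
```

Because membership depends only on the hash of the id, the same ids land in the sample on every run, so repeated queries see a consistent subset.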

Installation notes

Some tips

Turn off the swap file in production

Disable the swap file for production environments.

Tables that record how the cluster is running

system.metrics, system.events, and system.asynchronous_metrics tables.

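Environment configuration for installation

CPU frequency control

Linux raises or lowers the CPU frequency according to load, and this scaling affects CK performance. Use the following setting to run the CPU at maximum frequency: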

echo 'performance' | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

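For the possible CPU frequency governor settings on Linux, see the CPU frequency scaling link in the references.

Allowing memory overcommit

With swap, Linux lets applications request more memory than is physically available. The basic idea is to swap unused data out to disk to make room for data in use, so that applications appear to have a large amount of memory. Allowing such oversized requests is called overcommitting memory.

Overcommit behavior is controlled by one of three values: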

0: The Linux kernel is free to overcommit memory (this is the default), a heuristic algorithm is applied to figure out if enough memory is available.
1: The Linux kernel will always overcommit memory, and never check if enough memory is available. This increases the risk of out-of-memory situations, but also improves memory-intensive workloads.
2: The Linux kernel will not overcommit memory, and only allocate as much memory as defined in overcommit_ratio.

CK needs as much memory as possible, so overcommit has to be allowed. Set it as follows:

echo 0 | sudo tee /proc/sys/vm/overcommit_memory

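Disable transparent huge pages

To speed up memory access, the operating system can back application memory with larger pages, which makes address translation in the processor cheaper. These are called huge pages. To make this transparent to applications, Linux runs the khugepaged kernel thread to manage them; this automatic mode is called transparent hugepages. The automatic mode carries a risk of memory leaks; see the reference links for the details.

So CK installations are expected to turn this option off: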

echo 'never' | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
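A hedged sketch of a pre-flight check for the three kernel settings discussed in this section. The file paths are the ones used in the commands above (with cpu0 standing in for all CPUs); the function names are made up, and on a machine where a file is missing the corresponding check simply reports False.

```python
from pathlib import Path

def active_thp_mode(text):
    """Return the bracketed (active) value, e.g. 'never' from 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None

def read_setting(path):
    """Return the stripped file content, or None if the file cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None

def check_host():
    governor = read_setting("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")
    overcommit = read_setting("/proc/sys/vm/overcommit_memory")
    thp = read_setting("/sys/kernel/mm/transparent_hugepage/enabled")
    return {
        "governor_ok": governor == "performance",
        # values 0 and 1 both allow overcommit, per the list above
        "overcommit_ok": overcommit in ("0", "1"),
        "thp_ok": thp is not None and active_thp_mode(thp) == "never",
    }

print(check_host())
```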

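Use as much network bandwidth as possible

With IPv6, the route cache needs to be enlarged.

Do not install ZooKeeper (ZK) and CK together

CK uses as many resources as it can to maintain performance, so if it shares a machine with ZK, it affects ZK, lowering its throughput and raising its latency.

Enable ZK log cleanup

By default ZK does not delete expired snapshot and log files, which becomes a time bomb as they accumulate. Change the ZK configuration to enable autopurge. Yandex's configuration is as follows:

ZK configuration, zoo.cfg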

# http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=30000
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=10

maxClientCnxns=2000

maxSessionTimeout=60000000
# the directory where the snapshot is stored.
dataDir=/opt/zookeeper/{{ cluster['name'] }}/data
# Place the dataLogDir to a separate physical disc for better performance
dataLogDir=/opt/zookeeper/{{ cluster['name'] }}/logs

autopurge.snapRetainCount=10
autopurge.purgeInterval=1


# To avoid seeks ZooKeeper allocates space in the transaction log file in
# blocks of preAllocSize kilobytes. The default block size is 64M. One reason
# for changing the size of the blocks is to reduce the block size if snapshots
# are taken more often. (Also, see snapCount).
preAllocSize=131072

# Clients can submit requests faster than ZooKeeper can process them,
# especially if there are a lot of clients. To prevent ZooKeeper from running
# out of memory due to queued requests, ZooKeeper will throttle clients so that
# there is no more than globalOutstandingLimit outstanding requests in the
# system. The default limit is 1,000. ZooKeeper logs transactions to a
# transaction log. After snapCount transactions are written to a log file a
# snapshot is started and a new transaction log file is started. The default
# snapCount is 10,000.
snapCount=3000000

# If this option is defined, requests will be logged to a trace file named
# traceFile.year.month.day.
#traceFile=

# Leader accepts client connections. Default value is "yes". The leader machine
# coordinates updates. For higher update throughput at the slight expense of
# read throughput the leader can be configured to not accept clients and focus
# on coordination.
leaderServes=yes

standaloneEnabled=false
dynamicConfigFile=/etc/zookeeper-{{ cluster['name'] }}/conf/zoo.cfg.dynamic

Corresponding JVM parameters

NAME=zookeeper-{{ cluster['name'] }}
ZOOCFGDIR=/etc/$NAME/conf

# TODO this is really ugly
# How to find out, which jars are needed?
# seems, that log4j requires the log4j.properties file to be in the classpath
CLASSPATH="$ZOOCFGDIR:/usr/build/classes:/usr/build/lib/*.jar:/usr/share/zookeeper/zookeeper-3.5.1-metrika.jar:/usr/share/zookeeper/slf4j-log4j12-1.7.5.jar:/usr/share/zookeeper/slf4j-api-1.7.5.jar:/usr/share/zookeeper/servlet-api-2.5-20081211.jar:/usr/share/zookeeper/netty-3.7.0.Final.jar:/usr/share/zookeeper/log4j-1.2.16.jar:/usr/share/zookeeper/jline-2.11.jar:/usr/share/zookeeper/jetty-util-6.1.26.jar:/usr/share/zookeeper/jetty-6.1.26.jar:/usr/share/zookeeper/javacc.jar:/usr/share/zookeeper/jackson-mapper-asl-1.9.11.jar:/usr/share/zookeeper/jackson-core-asl-1.9.11.jar:/usr/share/zookeeper/commons-cli-1.2.jar:/usr/src/java/lib/*.jar:/usr/etc/zookeeper"

ZOOCFG="$ZOOCFGDIR/zoo.cfg"
ZOO_LOG_DIR=/var/log/$NAME
USER=zookeeper
GROUP=zookeeper
PIDDIR=/var/run/$NAME
PIDFILE=$PIDDIR/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
JAVA=/usr/bin/java
ZOOMAIN="org.apache.zookeeper.server.quorum.QuorumPeerMain"
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
JMXLOCALONLY=false
JAVA_OPTS="-Xms{{ cluster.get('xms','128M') }} \
    -Xmx{{ cluster.get('xmx','1G') }} \
    -Xloggc:/var/log/$NAME/zookeeper-gc.log \
    -XX:+UseGCLogFileRotation \
    -XX:NumberOfGCLogFiles=16 \
    -XX:GCLogFileSize=16M \
    -verbose:gc \
    -XX:+PrintGCTimeStamps \
    -XX:+PrintGCDateStamps \
    -XX:+PrintGCDetails \
    -XX:+PrintTenuringDistribution \
    -XX:+PrintGCApplicationStoppedTime \
    -XX:+PrintGCApplicationConcurrentTime \
    -XX:+PrintSafepointStatistics \
    -XX:+UseParNewGC \
    -XX:+UseConcMarkSweepGC \
-XX:+CMSParallelRemarkEnabled"

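Data backup

Besides storing data in CK, you can keep a copy in HDFS, so that data lost from CK can still be recovered.

Configuration files

CK's default configuration file is /etc/clickhouse-server/config.xml, where all server settings can be specified.

You can of course split the configuration, for example putting the user settings and the quota settings in separate files. The additional files go in: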

/etc/clickhouse-server/config.d

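CK finally merges all of the configuration into one complete file, file-preprocessed.xml.

Each separate file can override or remove the matching setting in the main configuration by using the replace or remove attribute. A fragment placed in such a file looks like this (here <replace> is an element of the masking rule itself, not the replace attribute):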

<query_masking_rules>
    <rule>
        <name>hide SSN</name>
        <regexp>\b\d{3}-\d{2}-\d{4}\b</regexp>
        <replace>000-00-0000</replace>
    </rule>
</query_masking_rules>
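How the replace and remove attributes act during merging can be modeled roughly as below. This is a simplified model written for illustration, not CK's actual merge logic: the merge function is invented here, it matches elements by tag name only, and the sample values are arbitrary.

```python
import copy
import xml.etree.ElementTree as ET

def merge(base, override):
    """Merge `override` into `base` in place (simplified model)."""
    for child in override:
        existing = base.find(child.tag)
        if "remove" in child.attrib:
            if existing is not None:
                base.remove(existing)
            continue
        new = copy.deepcopy(child)
        replace_flag = new.attrib.pop("replace", None) is not None
        if existing is None:
            base.append(new)
        elif replace_flag or len(new) == 0:
            idx = list(base).index(existing)
            base.remove(existing)
            base.insert(idx, new)     # whole element is swapped
        else:
            merge(existing, child)    # recurse into nested sections
    return base

main = ET.fromstring(
    "<yandex><max_connections>4096</max_connections>"
    "<logger><level>trace</level><size>1000M</size></logger>"
    "<interserver_http_port>9009</interserver_http_port></yandex>"
)
extra = ET.fromstring(
    "<yandex><max_connections replace='replace'>1024</max_connections>"
    "<logger><level>warning</level></logger>"
    "<interserver_http_port remove='remove'/></yandex>"
)
merged = merge(main, extra)
print(ET.tostring(merged, encoding="unicode"))
```

In this model a plain nested section is merged key by key, replace swaps the whole element, and remove deletes it from the result.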

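CK can also use ZK as a configuration source, in which case the final configuration file is generated using the values stored in ZK.

By default:
users, access rights, settings profiles, and quotas are configured in users.xml.

Some best practices

Some configuration best practices:
1. When writing, do not write through Distributed tables, to avoid data inconsistency
2. Set background_pool_size to speed up merges, since merge threads come from this thread pool
3. Set max_memory_usage and max_memory_usage_for_all_queries to limit the physical memory CK uses; if CK uses too much memory, the operating system kills the CK process
4. Set max_bytes_before_external_sort and max_bytes_before_external_group_by so that sorts and GROUP BY aggregations that need a lot of memory and exceed the limits above fall back to disk instead of failing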

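Some pitfalls and fixes:
1. Too many parts(304). Merges are processing significantly slower than inserts: inserts are too frequent and outpace the background merges. The fix is to increase background_pool_size and lower the insert rate. The official advice is "no more than one insert request per second"; in practice this means that each second of writes should not touch more than one file. If the written data spans several partition files, the problem is still likely to occur, so partitioning must be set sensibly
2. DB::NetException: Connection reset by peer, while reading from socket xxx: most likely max_memory_usage and max_memory_usage_for_all_queries are not configured, so memory use ran over the limit and the operating system killed the CK server
3. Memory limit (for query) exceeded:would use 9.37 GiB (attempt to allocate chunk of 301989888 bytes), maximum: 9.31 GiB: this happens because we set an upper limit on the CK server's memory use. Requests that exceed the limit are killed by CK, but CK itself stays up. In this case, add the max_bytes_before_external_sort and max_bytes_before_external_group_by settings to make use of disk
4. CK replicas and shards depend on ZK, so ZK is a major performance bottleneck. You need to understand and configure ZK well, or even run several ZK clusters to support the CK cluster
5. Use SSDs for both ZK and CK to improve performance
Source article: https://mp.weixin.qq.com/s/egzFxUOAGen_yrKclZGVag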
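For the "Too many parts" pitfall, it can help to check how many partitions a single insert batch touches before sending it. The sketch below assumes monthly partitioning as in the toYYYYMM(EventDate) example earlier; the helper names and the sample batch are made up for illustration.

```python
from collections import Counter
from datetime import date

def to_yyyymm(d):
    """Python equivalent of a monthly partition key such as toYYYYMM(EventDate)."""
    return d.year * 100 + d.month

def partitions_touched(rows, date_col="EventDate"):
    """Count the rows per partition for one insert batch."""
    return Counter(to_yyyymm(r[date_col]) for r in rows)

batch = [
    {"EventDate": date(2020, 11, 30), "CounterID": 1},
    {"EventDate": date(2020, 11, 29), "CounterID": 2},
    {"EventDate": date(2020, 12, 1), "CounterID": 1},
]
touched = partitions_touched(batch)
print(len(touched), dict(touched))
```

A batch that touches many partitions creates one data part per partition, so grouping rows by partition before inserting keeps the number of new parts down.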

References

https://clickhouse.yandex/docs/en/operations/tips/

http://engineering.pivotal.io/post/virtual_memory_settings_in_linux_-_the_problem_with_overcommit/

https://blog.nelhage.com/post/transparent-hugepages/

https://wiki.archlinux.org/index.php/CPU_frequency_scaling
