Layered tear stains seal the brocade letter; in this life, only love refuses to die.
In the previous chapter we finished installing the `ClickHouse` distributed cluster and created local and distributed tables for testing. But what happens if one node is stopped?

Kill the `clickhouse-server` process on `node03`:

```shell
[root@node03 ~]# ps -ef | grep clickhouse
clickho+  2233     1 73 13:07 ?        00:00:02 clickhouse-server --daemon --pid-file=/var/run/clickhouse-server/clickhouse-server.pid --config-file=/etc/clickhouse-server/config.xml
root      2306  1751  0 13:07 pts/0    00:00:00 grep --color=auto clickhouse
[root@node03 ~]# service clickhouse-server stop
Stop clickhouse-server service: DONE
[root@node03 ~]# ps -ef | grep clickhouse
root      2337  1751  0 13:07 pts/0    00:00:00 grep --color=auto clickhouse
```
Query the distributed table on `node01`:

```sql
node01 :) select * from cluster3s1r_all; # before node03 is stopped

SELECT * FROM cluster3s1r_all

┌─id─┬─website─────────────────┬─wechat─────┬─FlightDate─┬─Year─┐
│ 2 │ http://www.merryyou.cn/ │ javaganhuo │ 2020-11-28 │ 2020 │
└────┴─────────────────────────┴────────────┴────────────┴──────┘
┌─id─┬─website───────────────┬─wechat───┬─FlightDate─┬─Year─┐
│ 1 │ https://niocoder.com/ │ java干货 │ 2020-11-28 │ 2020 │
└────┴───────────────────────┴──────────┴────────────┴──────┘
┌─id─┬─website──────────────┬─wechat─┬─FlightDate─┬─Year─┐
│ 3 │ http://www.xxxxx.cn/ │ xxxxx │ 2020-11-28 │ 2020 │
└────┴──────────────────────┴────────┴────────────┴──────┘

3 rows in set. Elapsed: 0.037 sec.

node01 :) select * from cluster3s1r_all; # after node03 is stopped

SELECT * FROM cluster3s1r_all

┌─id─┬─website─────────────────┬─wechat─────┬─FlightDate─┬─Year─┐
│ 2 │ http://www.merryyou.cn/ │ javaganhuo │ 2020-11-28 │ 2020 │
└────┴─────────────────────────┴────────────┴────────────┴──────┘
↘ Progress: 1.00 rows, 59.00 B (8.87 rows/s., 523.62 B/s.)  0%
Received exception from server (version 20.8.3):
Code: 279. DB::Exception: Received from localhost:9000. DB::Exception: All connection tries failed. Log:
Code: 32, e.displayText() = DB::Exception: Attempt to read after eof (version 20.8.3.18)
Code: 210, e.displayText() = DB::NetException: Connection refused (node03:9000) (version 20.8.3.18)
Code: 210, e.displayText() = DB::NetException: Connection refused (node03:9000) (version 20.8.3.18)
: While executing Remote.

1 rows in set. Elapsed: 0.114 sec.
```
Only the data on `node01` comes back; the two rows on `node03` are gone.

In `ClickHouse`, replicas hang off shards, so to use multiple replicas you must first define the shards. The simplest case: one shard with multiple replicas.
On `node01`, edit the cluster configuration file `/etc/clickhouse-server/metrika.xml`:

```xml
<yandex>
    <!-- cluster configuration -->
    <clickhouse_remote_servers>
        <!-- 1 shard, 2 replicas -->
        <cluster_1shards_2replicas>
            <!-- shard 1 -->
            <shard>
                <!-- false: write to all replicas at once; true: write to one replica and let ZooKeeper-based replication copy the data -->
                <internal_replication>false</internal_replication>
                <replica>
                    <host>node01</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>node02</host>
                    <port>9000</port>
                </replica>
            </shard>
        </cluster_1shards_2replicas>
    </clickhouse_remote_servers>
</yandex>
```
Distribute the modified configuration to `node02`:

```shell
[root@node01 clickhouse-server]# scp /etc/clickhouse-server/metrika.xml node02:$PWD
metrika.xml                100%  674   618.9KB/s   00:00
```
If the configuration file is valid, there is no need to restart `clickhouse-server`; the configuration is reloaded automatically. Check the cluster information on `node01`:

```shell
[root@node01 clickhouse-server]# clickhouse-client -m
ClickHouse client version 20.8.3.18.
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.8.3 revision 54438.

node01 :) select * from system.clusters;

SELECT * FROM system.clusters

┌─cluster───────────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address───┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ cluster_1shards_2replicas │ 1 │ 1 │ 1 │ node01 │ 192.168.10.100 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ cluster_1shards_2replicas │ 1 │ 1 │ 2 │ node02 │ 192.168.10.110 │ 9000 │ 0 │ default │ │ 0 │ 0 │
│ test_cluster_two_shards │ 1 │ 1 │ 1 │ 127.0.0.1 │ 127.0.0.1 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ test_cluster_two_shards │ 2 │ 1 │ 1 │ 127.0.0.2 │ 127.0.0.2 │ 9000 │ 0 │ default │ │ 0 │ 0 │
│ test_cluster_two_shards_localhost │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ test_cluster_two_shards_localhost │ 2 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ test_shard_localhost │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ test_shard_localhost_secure │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9440 │ 0 │ default │ │ 0 │ 0 │
│ test_unavailable_shard │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ test_unavailable_shard │ 2 │ 1 │ 1 │ localhost │ ::1 │ 1 │ 0 │ default │ │ 0 │ 0 │
└───────────────────────────────────┴───────────┴──────────────┴─────────────┴───────────┴────────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

10 rows in set. Elapsed: 0.018 sec.
```
Create the local table `cluster1s2r_local` on both `node01` and `node02`:

```sql
CREATE TABLE default.cluster1s2r_local
(
    `id` Int32,
    `website` String,
    `wechat` String,
    `FlightDate` Date,
    Year UInt16
)
ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);
```
Create the distributed table on `node01`; note the cluster name:

```sql
CREATE TABLE default.cluster1s2r_all AS cluster1s2r_local
ENGINE = Distributed(cluster_1shards_2replicas, default, cluster1s2r_local, rand());
```
Insert data through the distributed table `cluster1s2r_all`; every row is written to `cluster1s2r_local` on both `node01` and `node02`:

```sql
INSERT INTO default.cluster1s2r_all (id,website,wechat,FlightDate,Year) VALUES (1,'https://niocoder.com/','java干货','2020-11-28',2020);
INSERT INTO default.cluster1s2r_all (id,website,wechat,FlightDate,Year) VALUES (2,'http://www.merryyou.cn/','javaganhuo','2020-11-28',2020);
INSERT INTO default.cluster1s2r_all (id,website,wechat,FlightDate,Year) VALUES (3,'http://www.xxxxx.cn/','xxxxx','2020-11-28',2020);
```
Query the distributed table and the local tables:

```sql
node01 :) select * from cluster1s2r_all; # query the distributed table

SELECT * FROM cluster1s2r_all

┌─id─┬─website──────────────┬─wechat─┬─FlightDate─┬─Year─┐
│ 3 │ http://www.xxxxx.cn/ │ xxxxx │ 2020-11-28 │ 2020 │
└────┴──────────────────────┴────────┴────────────┴──────┘
┌─id─┬─website─────────────────┬─wechat─────┬─FlightDate─┬─Year─┐
│ 2 │ http://www.merryyou.cn/ │ javaganhuo │ 2020-11-28 │ 2020 │
└────┴─────────────────────────┴────────────┴────────────┴──────┘
┌─id─┬─website───────────────┬─wechat───┬─FlightDate─┬─Year─┐
│ 1 │ https://niocoder.com/ │ java干货 │ 2020-11-28 │ 2020 │
└────┴───────────────────────┴──────────┴────────────┴──────┘

3 rows in set. Elapsed: 0.018 sec.

node01 :) select * from cluster1s2r_local; # local table on node01

SELECT * FROM cluster1s2r_local

┌─id─┬─website─────────────────┬─wechat─────┬─FlightDate─┬─Year─┐
│ 2 │ http://www.merryyou.cn/ │ javaganhuo │ 2020-11-28 │ 2020 │
└────┴─────────────────────────┴────────────┴────────────┴──────┘
┌─id─┬─website───────────────┬─wechat───┬─FlightDate─┬─Year─┐
│ 1 │ https://niocoder.com/ │ java干货 │ 2020-11-28 │ 2020 │
└────┴───────────────────────┴──────────┴────────────┴──────┘
┌─id─┬─website──────────────┬─wechat─┬─FlightDate─┬─Year─┐
│ 3 │ http://www.xxxxx.cn/ │ xxxxx │ 2020-11-28 │ 2020 │
└────┴──────────────────────┴────────┴────────────┴──────┘

3 rows in set. Elapsed: 0.015 sec.

node02 :) select * from cluster1s2r_local; # local table on node02

SELECT * FROM cluster1s2r_local

┌─id─┬─website─────────────────┬─wechat─────┬─FlightDate─┬─Year─┐
│ 2 │ http://www.merryyou.cn/ │ javaganhuo │ 2020-11-28 │ 2020 │
└────┴─────────────────────────┴────────────┴────────────┴──────┘
┌─id─┬─website───────────────┬─wechat───┬─FlightDate─┬─Year─┐
│ 1 │ https://niocoder.com/ │ java干货 │ 2020-11-28 │ 2020 │
└────┴───────────────────────┴──────────┴────────────┴──────┘
┌─id─┬─website──────────────┬─wechat─┬─FlightDate─┬─Year─┐
│ 3 │ http://www.xxxxx.cn/ │ xxxxx │ 2020-11-28 │ 2020 │
└────┴──────────────────────┴────────┴────────────┴──────┘

3 rows in set. Elapsed: 0.007 sec.
```
Both `node01` and `node02` hold the full data set in `cluster1s2r_local`, so even if one of the nodes is stopped no data is lost: the replica is working.
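As a quick cross-check (a sketch, assuming the `node01`/`node02` hostnames resolve from the server and the `default` user can connect between nodes), the `remote()` table function can count the local table's rows on both replicas in one query:

```sql
-- Count rows of the local table on each replica in a single query.
-- remote('node01,node02', ...) fans the query out to both hosts;
-- hostName() is evaluated on the host that executes each sub-query.
SELECT
    hostName() AS host,
    count()    AS rows
FROM remote('node01,node02', default.cluster1s2r_local)
GROUP BY host
ORDER BY host;
```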
With multiple replicas comes a consistency question: what happens if a machine goes down while data is being written?

Simulate a node going down during a write to the distributed table. Stop the `node02` service with `service clickhouse-server stop`, then insert a row into the distributed table `cluster1s2r_all` on `node01`:

```sql
INSERT INTO default.cluster1s2r_all (id,website,wechat,FlightDate,Year) VALUES (4,'http://www.yyyyyy.cn/','yyyyy','2020-11-29',2020);
```
Start the `node02` service again and query `cluster1s2r_local` and `cluster1s2r_all` on both `node01` and `node02`: the total row count has increased by one everywhere, so in this scenario the cluster nodes synchronize automatically.

The write above went through the distributed table `cluster1s2r_all`. If we write through the local table `cluster1s2r_local` instead, does the data still synchronize?
Insert one row into `cluster1s2r_local` on `node01`, then check `cluster1s2r_local` on `node02`: the data is not synchronized.

In summary, writing through the distributed table synchronizes the data automatically, while writing through the local table does not. In most situations this is not a big problem.
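A minimal way to reproduce that check (a sketch; the row with id `5` and its values are made-up placeholders):

```sql
-- On node01: write directly into the local table, bypassing the Distributed table.
INSERT INTO default.cluster1s2r_local (id, website, wechat, FlightDate, Year)
VALUES (5, 'http://example.com/', 'test', '2020-11-29', 2020);

-- Compare row counts on both replicas from a single session.
-- With internal_replication = false and a plain MergeTree local table,
-- only node01 should show the new row.
SELECT hostName() AS host, count() AS rows
FROM remote('node01,node02', default.cluster1s2r_local)
GROUP BY host;
```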
But production is always messier than theory, and the configuration above can lead to inconsistent data.
The official documentation describes it as follows:
Each shard can have the internal_replication parameter defined in the config file. If this parameter is set to true, the write operation selects the first healthy replica and writes data to it. Use this alternative if the Distributed table “looks at” replicated tables. In other words, if the table where data will be written is going to replicate them itself. If it is set to false (the default), data is written to all replicas. In essence, this means that the Distributed table replicates data itself. This is worse than using replicated tables, because the consistency of replicas is not checked, and over time they will contain slightly different data.
In short: when `internal_replication` is `true`, the write picks one healthy replica, and the other replicas catch up in the background through the replicated table engine (coordinated via ZooKeeper). When it is `false`, the write is sent to every replica at once, i.e. the Distributed table does the "replication" itself by writing the same data several times.
Automatic data replication is a property of the table: tables whose engine is `Replicated*` synchronize automatically. The `Replicated` prefix applies only to the `MergeTree` family (`MergeTree` being the most commonly used engine).

Important: the automatic synchronization of `Replicated*` tables is different from the cluster-level synchronization above. It is table-level behavior and has nothing to do with the `<clickhouse_remote_servers>` section of `metrika.xml`; all it needs is the `zookeeper` configuration.
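A quick way to confirm that a server actually sees ZooKeeper is to query the `system.zookeeper` table (a sketch; any path works, but a `path` condition is required):

```sql
-- Lists the znodes directly under '/' as seen by this ClickHouse server.
-- If ZooKeeper is not configured or unreachable, this query fails instead.
SELECT name, ctime, numChildren
FROM system.zookeeper
WHERE path = '/';
```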
On `node01`, edit the `metrika.xml` configuration:

```xml
<yandex>
    <zookeeper-servers>
        <node index="1">
            <host>node01</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>node02</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>node03</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
</yandex>
```
Distribute the modified configuration to `node02`:

```shell
[root@node01 clickhouse-server]# scp /etc/clickhouse-server/metrika.xml node02:$PWD
metrika.xml
```

Restart `clickhouse-server`. The startup fails because the tables created earlier still reference the old cluster. The `error` log:

```shell
[root@node01 clickhouse-server]# tail -f /var/log/clickhouse-server/clickhouse-server.err.log
7. DB::StorageDistributed::startup() @ 0x10f1bd40 in /usr/bin/clickhouse
8. ? @ 0x1151d922 in /usr/bin/clickhouse
9. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0xa43d6ad in /usr/bin/clickhouse
10. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()&&...)::'lambda'()::operator()() const @ 0xa43dd93 in /usr/bin/clickhouse
11. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xa43cc4d in /usr/bin/clickhouse
12. ? @ 0xa43b3ff in /usr/bin/clickhouse
13. start_thread @ 0x7ea5 in /usr/lib64/libpthread-2.17.so
14. clone @ 0xfe8dd in /usr/lib64/libc-2.17.so
 (version 20.8.3.18)
2020.11.29 14:43:01.163530 [ 3643 ] {} <Error> Application: DB::Exception: Requested cluster 'cluster_1shards_2replicas' not found: while loading database `default` from path /var/lib/clickhouse/metadata/default
```
Delete the old table definitions:

```shell
[root@node01 default]# rm -rf /var/lib/clickhouse/metadata/default/*.sql
```

Then start `clickhouse-server`.
Create the table on both `node01` and `node02`:

```sql
-- node01
CREATE TABLE `cluster_zk`
(
    `id` Int32,
    `website` String,
    `wechat` String,
    `FlightDate` Date,
    Year UInt16
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/cluster_zk', 'replica01', FlightDate, (Year, FlightDate), 8192);

-- node02
CREATE TABLE `cluster_zk`
(
    `id` Int32,
    `website` String,
    `wechat` String,
    `FlightDate` Date,
    Year UInt16
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/cluster_zk', 'replica02', FlightDate, (Year, FlightDate), 8192);
```
Insert a row on `node01`:

```sql
INSERT INTO default.cluster_zk (id,website,wechat,FlightDate,Year) VALUES (1,'https://niocoder.com/','java干货','2020-11-28',2020);
```
Query the data on `node01` and `node02`:

```sql
node01 :) select * from cluster_zk; # node01

SELECT * FROM cluster_zk

┌─id─┬─website───────────────┬─wechat───┬─FlightDate─┬─Year─┐
│ 1 │ https://niocoder.com/ │ java干货 │ 2020-11-28 │ 2020 │
└────┴───────────────────────┴──────────┴────────────┴──────┘

1 rows in set. Elapsed: 0.004 sec.

node02 :) select * from cluster_zk; # node02

SELECT * FROM cluster_zk

┌─id─┬─website───────────────┬─wechat───┬─FlightDate─┬─Year─┐
│ 1 │ https://niocoder.com/ │ java干货 │ 2020-11-28 │ 2020 │
└────┴───────────────────────┴──────────┴────────────┴──────┘

1 rows in set. Elapsed: 0.004 sec.
```
Check the information in `zk`:

```shell
[zk: localhost:2181(CONNECTED) 2] ls /clickhouse/tables/cluster_zk/replicas
[replica02, replica01]
[zk: localhost:2181(CONNECTED) 3]
```
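The same information can be read from inside ClickHouse through the `system.replicas` table (a sketch):

```sql
-- One row per Replicated* table on this server; active_replicas should equal
-- total_replicas when both replicas are up and in sync.
SELECT
    database,
    table,
    zookeeper_path,
    replica_name,
    total_replicas,
    active_replicas
FROM system.replicas
WHERE table = 'cluster_zk';
```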
On `node01`, edit the `metrika.xml` configuration again; note that `internal_replication` is now `true`:

```xml
<yandex>
    <clickhouse_remote_servers>
        <perftest_1shards_2replicas>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>node01</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>node02</host>
                    <port>9000</port>
                </replica>
            </shard>
        </perftest_1shards_2replicas>
    </clickhouse_remote_servers>
    <zookeeper-servers>
        <node index="1">
            <host>node01</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>node02</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>node03</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
</yandex>
```
Distribute the modified configuration to `node02`:

```shell
[root@node01 clickhouse-server]# scp /etc/clickhouse-server/metrika.xml node02:$PWD
metrika.xml
```
Check the cluster information:

```sql
node01 :) select * from system.clusters;

SELECT * FROM system.clusters

┌─cluster───────────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address───┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ perftest_1shards_2replicas │ 1 │ 1 │ 1 │ node01 │ 192.168.10.100 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ perftest_1shards_2replicas │ 1 │ 1 │ 2 │ node02 │ 192.168.10.110 │ 9000 │ 0 │ default │ │ 0 │ 0 │
│ test_cluster_two_shards │ 1 │ 1 │ 1 │ 127.0.0.1 │ 127.0.0.1 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ test_cluster_two_shards │ 2 │ 1 │ 1 │ 127.0.0.2 │ 127.0.0.2 │ 9000 │ 0 │ default │ │ 0 │ 0 │
│ test_cluster_two_shards_localhost │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ test_cluster_two_shards_localhost │ 2 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ test_shard_localhost │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ test_shard_localhost_secure │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9440 │ 0 │ default │ │ 0 │ 0 │
│ test_unavailable_shard │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ test_unavailable_shard │ 2 │ 1 │ 1 │ localhost │ ::1 │ 1 │ 0 │ default │ │ 0 │ 0 │
└───────────────────────────────────┴───────────┴──────────────┴─────────────┴───────────┴────────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

10 rows in set. Elapsed: 0.018 sec.
```
Create the distributed table:

```sql
CREATE TABLE default.clusterzk_all AS cluster_zk
ENGINE = Distributed(perftest_1shards_2replicas, default, cluster_zk, rand());
```
Query the data through the distributed table:

```sql
node01 :) select * from clusterzk_all;

SELECT * FROM clusterzk_all

┌─id─┬─website───────────────┬─wechat───┬─FlightDate─┬─Year─┐
│ 1 │ https://niocoder.com/ │ java干货 │ 2020-11-28 │ 2020 │
└────┴───────────────────────┴──────────┴────────────┴──────┘

1 rows in set. Elapsed: 0.020 sec.
```
Writing through the distributed table: as mentioned above, with `internal_replication` set to `true`, a write through the distributed table goes to the "healthiest" replica, and the other replicas then synchronize through the table's own replication, eventually reaching a consistent state.
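To see that in action, one option (a sketch; the inserted values are placeholders) is to write through `clusterzk_all` and then compare the row counts on both replicas:

```sql
-- The write lands on one replica; ReplicatedMergeTree fans it out to the other.
INSERT INTO default.clusterzk_all (id, website, wechat, FlightDate, Year)
VALUES (2, 'http://www.merryyou.cn/', 'javaganhuo', '2020-11-28', 2020);

-- Both replicas should converge to the same count shortly after the insert.
SELECT hostName() AS host, count() AS rows
FROM remote('node01,node02', default.cluster_zk)
GROUP BY host;
```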
Next, set up a cluster with 3 shards and 2 replicas, laid out as follows:

ip | host | clickhouse port | shard/replica
---|---|---|---
192.168.10.100 | node01 | 9000 | 01/01
192.168.10.100 | node01 | 9001 | 03/02
192.168.10.110 | node02 | 9000 | 02/01
192.168.10.110 | node02 | 9001 | 01/02
192.168.10.120 | node03 | 9000 | 03/01
192.168.10.120 | node03 | 9001 | 02/02
Three shards, two replicas each: start a second `clickhouse-server` instance on port `9001` on each of `node01`, `node02`, and `node03`. That is, the two replicas of `shard1` live on `node01 9000` and `node02 9001`, the two replicas of `shard2` on `node02 9000` and `node03 9001`, and the two replicas of `shard3` on `node03 9000` and `node01 9001`.
On `node01`, create and edit `config1.xml`:

```shell
[root@node01 clickhouse-server]# cp /etc/clickhouse-server/config.xml /etc/clickhouse-server/config1.xml
[root@node01 clickhouse-server]# vim /etc/clickhouse-server/config1.xml
```
Modify the following settings:

```xml
<?xml version="1.0"?>
<yandex>
    <!-- other settings omitted -->
    <http_port>8124</http_port>
    <tcp_port>9001</tcp_port>
    <mysql_port>9005</mysql_port>
    <interserver_http_port>9010</interserver_http_port>
    <log>/var/log/clickhouse-server/clickhouse-server-1.log</log>
    <errorlog>/var/log/clickhouse-server/clickhouse-server.err-1.log</errorlog>
    <!-- Path to data directory, with trailing slash. -->
    <path>/var/lib/clickhouse1/</path>
    <!-- Path to temporary data for processing hard queries. -->
    <tmp_path>/var/lib/clickhouse1/tmp/</tmp_path>
    <user_files_path>/var/lib/clickhouse1/user_files/</user_files_path>
    <format_schema_path>/var/lib/clickhouse1/format_schemas/</format_schema_path>
    <include_from>/etc/clickhouse-server/metrika1.xml</include_from>
    <!-- other settings omitted -->
</yandex>
```
On `node01`, create and edit `metrika.xml`:

```xml
<yandex>
    <!-- cluster nodes -->
    <clickhouse_remote_servers>
        <!-- cluster name -->
        <perftest_3shards_2replicas>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>node01</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>node02</host>
                    <port>9001</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>node02</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>node03</host>
                    <port>9001</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>node03</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>node01</host>
                    <port>9001</port>
                </replica>
            </shard>
        </perftest_3shards_2replicas>
    </clickhouse_remote_servers>
    <!-- zookeeper configuration -->
    <zookeeper-servers>
        <node index="1">
            <host>node01</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>node02</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>node03</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <macros>
        <shard>01</shard>          <!-- shard ID; replicas within the same shard use the same ID -->
        <replica>node01</replica>  <!-- hostname of the current node -->
    </macros>
    <networks>
        <ip>::/0</ip>
    </networks>
    <!-- compression settings -->
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>  <!-- lz4 compresses faster than zstd but uses more disk -->
        </case>
    </clickhouse_compression>
</yandex>
```
Copy the `metrika.xml` file to `metrika1.xml` and adjust the `macros` section on every instance (a quick way to verify each instance's values is sketched after the per-node snippets below).

`macros` in `metrika.xml` on `node01`:

```xml
<macros>
    <shard>01</shard>          <!-- shard ID; replicas within the same shard use the same ID -->
    <replica>node01</replica>  <!-- hostname of the current node -->
</macros>
```
`macros` in `metrika1.xml` on `node01`:

```xml
<macros>
    <shard>03</shard>          <!-- shard ID; replicas within the same shard use the same ID -->
    <replica>node01</replica>  <!-- hostname of the current node -->
</macros>
```
`macros` in `metrika.xml` on `node02`:

```xml
<macros>
    <shard>02</shard>          <!-- shard ID; replicas within the same shard use the same ID -->
    <replica>node02</replica>  <!-- hostname of the current node -->
</macros>
```
`macros` in `metrika1.xml` on `node02`:

```xml
<macros>
    <shard>01</shard>          <!-- shard ID; replicas within the same shard use the same ID -->
    <replica>node02</replica>  <!-- hostname of the current node -->
</macros>
```
`macros` in `metrika.xml` on `node03`:

```xml
<macros>
    <shard>03</shard>          <!-- shard ID; replicas within the same shard use the same ID -->
    <replica>node03</replica>  <!-- hostname of the current node -->
</macros>
```
`macros` in `metrika1.xml` on `node03`:

```xml
<macros>
    <shard>02</shard>          <!-- shard ID; replicas within the same shard use the same ID -->
    <replica>node03</replica>  <!-- hostname of the current node -->
</macros>
```
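Once the instances are running (they are started further below), you can confirm on each one that it picked up the intended substitutions by querying `system.macros` (a sketch):

```sql
-- Run on each instance (e.g. clickhouse-client --port 9000 / 9001):
-- should return the shard and replica values configured in its metrika*.xml.
SELECT macro, substitution
FROM system.macros;
```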
Create the `clickhouse-server-1` init script:

```shell
[root@node01 clickhouse-server]# cp /etc/rc.d/init.d/clickhouse-server /etc/rc.d/init.d/clickhouse-server-1
[root@node01 clickhouse-server]# vim /etc/rc.d/init.d/clickhouse-server-1
```
Modify the following lines:

```shell
CLICKHOUSE_CONFIG=$CLICKHOUSE_CONFDIR/config1.xml
CLICKHOUSE_PIDFILE="$CLICKHOUSE_PIDDIR/$PROGRAM-1.pid"
```
Distribute the files to the `node02` and `node03` nodes:

```shell
[root@node01 clickhouse-server]# scp /etc/rc.d/init.d/clickhouse-server-1 node02:/etc/rc.d/init.d/
clickhouse-server-1    100%   11KB   4.0MB/s   00:00
[root@node01 clickhouse-server]# scp /etc/rc.d/init.d/clickhouse-server-1 node03:/etc/rc.d/init.d/
clickhouse-server-1    100%   11KB   4.0MB/s   00:00
[root@node01 clickhouse-server]# scp /etc/clickhouse-server/config1.xml node02:$PWD
config1.xml            100%   33KB  10.2MB/s   00:00
[root@node01 clickhouse-server]# scp /etc/clickhouse-server/config1.xml node03:$PWD
config1.xml            100%   33KB   9.7MB/s   00:00
[root@node01 clickhouse-server]# scp /etc/clickhouse-server/metrika.xml node02:$PWD
metrika.xml            100% 2008    1.0MB/s   00:00
[root@node01 clickhouse-server]# scp /etc/clickhouse-server/metrika.xml node03:$PWD
metrika.xml            100% 2008    1.1MB/s   00:00
[root@node01 clickhouse-server]# scp /etc/clickhouse-server/metrika1.xml node02:$PWD
metrika1.xml           100% 2008    1.0MB/s   00:00
[root@node01 clickhouse-server]# scp /etc/clickhouse-server/metrika1.xml node03:$PWD
metrika1.xml
```
Modify the `macros` configuration on `node02` and `node03` according to the snippets above.
Restart all `ClickHouse` instances, that is, the `clickhouse-server` and `clickhouse-server-1` instances on `node01`, `node02`, and `node03`:

```shell
service clickhouse-server restart
service clickhouse-server-1 restart
```
Check the cluster information again:

```sql
node01 :) select * from system.clusters;

SELECT * FROM system.clusters

┌─cluster───────────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address───┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ perftest_3shards_2replicas │ 1 │ 1 │ 1 │ node01 │ 192.168.10.100 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ perftest_3shards_2replicas │ 1 │ 1 │ 2 │ node02 │ 192.168.10.110 │ 9001 │ 0 │ default │ │ 0 │ 0 │
│ perftest_3shards_2replicas │ 2 │ 1 │ 1 │ node02 │ 192.168.10.110 │ 9000 │ 0 │ default │ │ 0 │ 0 │
│ perftest_3shards_2replicas │ 2 │ 1 │ 2 │ node03 │ 192.168.10.120 │ 9001 │ 0 │ default │ │ 0 │ 0 │
│ perftest_3shards_2replicas │ 3 │ 1 │ 1 │ node03 │ 192.168.10.120 │ 9000 │ 0 │ default │ │ 0 │ 0 │
│ perftest_3shards_2replicas │ 3 │ 1 │ 2 │ node01 │ 192.168.10.100 │ 9001 │ 0 │ default │ │ 0 │ 0 │
│ test_cluster_two_shards │ 1 │ 1 │ 1 │ 127.0.0.1 │ 127.0.0.1 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ test_cluster_two_shards │ 2 │ 1 │ 1 │ 127.0.0.2 │ 127.0.0.2 │ 9000 │ 0 │ default │ │ 0 │ 0 │
│ test_cluster_two_shards_localhost │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ test_cluster_two_shards_localhost │ 2 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ test_shard_localhost │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ test_shard_localhost_secure │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9440 │ 0 │ default │ │ 0 │ 0 │
│ test_unavailable_shard │ 1 │ 1 │ 1 │ localhost │ ::1 │ 9000 │ 1 │ default │ │ 0 │ 0 │
│ test_unavailable_shard │ 2 │ 1 │ 1 │ localhost │ ::1 │ 1 │ 0 │ default │ │ 0 │ 0 │
└───────────────────────────────────┴───────────┴──────────────┴─────────────┴───────────┴────────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘

14 rows in set. Elapsed: 0.019 sec.
```
Create the replicated table. Running this on `node01` is enough; the other nodes create it automatically:

```sql
CREATE TABLE `cluster32r_local` ON CLUSTER perftest_3shards_2replicas
(
    `id` Int32,
    `website` String,
    `wechat` String,
    `FlightDate` Date,
    Year UInt16
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/ontime', '{replica}', FlightDate, (Year, FlightDate), 8192);
```
```sql
node01 :) CREATE TABLE `cluster32r_local` ON cluster perftest_3shards_2replicas
:-] (
:-]     `id` Int32,
:-]     `website` String,
:-]     `wechat` String,
:-]     `FlightDate` Date,
:-]     Year UInt16
:-] )
:-] ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/ontime','{replica}', FlightDate, (Year, FlightDate), 8192);

CREATE TABLE cluster32r_local ON CLUSTER perftest_3shards_2replicas
(
    `id` Int32,
    `website` String,
    `wechat` String,
    `FlightDate` Date,
    `Year` UInt16
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/ontime', '{replica}', FlightDate, (Year, FlightDate), 8192)

┌─host───┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ node03 │ 9001 │ 0 │ │ 5 │ 0 │
│ node03 │ 9000 │ 0 │ │ 4 │ 0 │
│ node01 │ 9001 │ 0 │ │ 3 │ 0 │
│ node01 │ 9000 │ 0 │ │ 2 │ 0 │
│ node02 │ 9000 │ 0 │ │ 1 │ 0 │
└────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host───┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ node02 │ 9001 │ 0 │ │ 0 │ 0 │
└────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

6 rows in set. Elapsed: 46.994 sec.
```
Create the distributed table:

```sql
CREATE TABLE cluster32r_all AS cluster32r_local
ENGINE = Distributed(perftest_3shards_2replicas, default, cluster32r_local, rand());
```
Insert data into one replica of the first shard (`node01 9000`); it can then be read from the second replica (`node02 9001`):

```sql
INSERT INTO default.cluster32r_local (id,website,wechat,FlightDate,Year) VALUES (1,'https://niocoder.com/','java干货','2020-11-28',2020);
INSERT INTO default.cluster32r_local (id,website,wechat,FlightDate,Year) VALUES (2,'http://www.merryyou.cn/','javaganhuo','2020-11-28',2020);
```
Connect a client to the `node02 9001` instance and check:

```shell
[root@node02 ~]# clickhouse-client --port 9001 -m
ClickHouse client version 20.8.3.18.
Connecting to localhost:9001 as user default.
Connected to ClickHouse server version 20.8.3 revision 54438.

node02 :) show tables;

SHOW TABLES

┌─name─────────────┐
│ cluster32r_local │
└──────────────────┘

1 rows in set. Elapsed: 0.010 sec.

node02 :) select * from cluster32r_local;

SELECT * FROM cluster32r_local

┌─id─┬─website─────────────────┬─wechat─────┬─FlightDate─┬─Year─┐
│ 2 │ http://www.merryyou.cn/ │ javaganhuo │ 2020-11-28 │ 2020 │
└────┴─────────────────────────┴────────────┴────────────┴──────┘
┌─id─┬─website───────────────┬─wechat───┬─FlightDate─┬─Year─┐
│ 1 │ https://niocoder.com/ │ java干货 │ 2020-11-28 │ 2020 │
└────┴───────────────────────┴──────────┴────────────┴──────┘

2 rows in set. Elapsed: 0.018 sec.
```
Query through the distributed table:

```sql
node01 :) select * from cluster32r_all;

SELECT * FROM cluster32r_all

┌─id─┬─website─────────────────┬─wechat─────┬─FlightDate─┬─Year─┐
│ 2 │ http://www.merryyou.cn/ │ javaganhuo │ 2020-11-28 │ 2020 │
└────┴─────────────────────────┴────────────┴────────────┴──────┘
┌─id─┬─website───────────────┬─wechat───┬─FlightDate─┬─Year─┐
│ 1 │ https://niocoder.com/ │ java干货 │ 2020-11-28 │ 2020 │
└────┴───────────────────────┴──────────┴────────────┴──────┘

2 rows in set. Elapsed: 0.030 sec.
```
Every replica node can read and write data through both the local table and the distributed table.
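To see which instance actually serves each shard during a distributed read, one option (a sketch; the result varies from query to query because the Distributed engine picks one replica per shard) is:

```sql
-- hostName() is evaluated on the remote instance that executes each sub-query,
-- so grouping by it shows one serving replica per shard.
SELECT hostName() AS serving_host, count() AS rows
FROM cluster32r_all
GROUP BY serving_host;
```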
Follow the WeChat official account java干货 and reply 【clickhouse】.