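The previous posts gave a brief introduction to ClickHouse and ran some simple performance tests. This post covers setting up a cluster and replicating data; replication requires ZooKeeper.
Environment:
1. Three machines (three virtual machines in my case), each with ClickHouse installed.
2. Add hosts entries on all three machines, as shown below. This is optional; you can write the IP addresses directly in the configuration file instead.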
192.168.0.10 db_server_yayun_01
192.168.0.20 db_server_yayun_02
192.168.0.30 db_server_yayun_03
3. Create the configuration file. It does not exist by default; /etc/clickhouse-server/config.xml contains the following hint:
If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
Values for substitutions are specified in /yandex/name_of_substitution elements in that file.
The contents of /etc/metrika.xml are as follows:
<yandex>
    <clickhouse_remote_servers>
        <perftest_3shards_1replicas>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>db_server_yayun_01</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>db_server_yayun_02</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>db_server_yayun_03</host>
                    <port>9000</port>
                </replica>
            </shard>
        </perftest_3shards_1replicas>
    </clickhouse_remote_servers>
    <zookeeper-servers>
        <node index="1">
            <host>192.168.0.30</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <macros>
        <replica>192.168.0.10</replica>
    </macros>
    <networks>
        <ip>::/0</ip>
    </networks>
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>
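Because a misplaced element in this file is easy to miss, here is a minimal Python sketch that lists the shards and replicas of a cluster definition and flags an internal_replication element nested inside <replica> instead of sitting directly under <shard>, which is where the ClickHouse documentation places it. The SAMPLE string is a hypothetical, trimmed cluster section with one shard deliberately written the wrong way, and describe_cluster is a helper name invented for this illustration; it is not part of ClickHouse.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed cluster section; the second shard deliberately nests
# internal_replication inside <replica> to show what the check reports.
SAMPLE = """
<yandex>
  <clickhouse_remote_servers>
    <perftest_3shards_1replicas>
      <shard>
        <internal_replication>true</internal_replication>
        <replica><host>db_server_yayun_01</host><port>9000</port></replica>
      </shard>
      <shard>
        <replica>
          <internal_replication>true</internal_replication>
          <host>db_server_yayun_02</host><port>9000</port>
        </replica>
      </shard>
    </perftest_3shards_1replicas>
  </clickhouse_remote_servers>
</yandex>
"""


def describe_cluster(xml_text, cluster):
    """Return one dict per shard: replica endpoints plus layout warnings."""
    root = ET.fromstring(xml_text)
    node = root.find(f"clickhouse_remote_servers/{cluster}")
    if node is None:
        raise KeyError(f"cluster {cluster!r} not found")
    shards = []
    for shard in node.findall("shard"):
        replicas = [
            (replica.findtext("host"), int(replica.findtext("port")))
            for replica in shard.findall("replica")
        ]
        warnings = []
        # The setting is only looked up here as a direct child of <shard>.
        if shard.find("replica/internal_replication") is not None:
            warnings.append("internal_replication is nested inside <replica>")
        shards.append({
            "internal_replication": shard.findtext("internal_replication") == "true",
            "replicas": replicas,
            "warnings": warnings,
        })
    return shards


if __name__ == "__main__":
    for number, shard in enumerate(
        describe_cluster(SAMPLE, "perftest_3shards_1replicas"), start=1
    ):
        print(number, shard)
```

Running the same function against the real /etc/metrika.xml (read into a string) on each server is a cheap sanity check before restarting.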
The configuration file is the same on all three machines; the only difference is:
<macros>
    <replica>192.168.0.10</replica>
</macros>
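Put each server's own IP here. It does not actually have to be an IP; any value works as long as it is different on each of the three machines. This setting is used by replication. The ZooKeeper configuration is as follows: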
<zookeeper-servers>
    <node index="1">
        <host>192.168.0.30</host>
        <port>2181</port>
    </node>
</zookeeper-servers>
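My ZooKeeper is installed on the .30 machine as a single instance. In production it should run on dedicated machines and be configured as a cluster. After editing the configuration file, restart all three servers.
The steps given in the official documentation are: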
ClickHouse deployment to cluster
ClickHouse cluster is a homogenous cluster. Steps to set up:
1. Install ClickHouse server on all machines of the cluster
2. Set up cluster configs in configuration file
3. Create local tables on each instance
4. Create a Distributed table
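The first two steps are done. Next, create the local table and then the Distributed table. (Create them on all three machines; DDL is not synchronized across nodes, which is a pain.)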
CREATE TABLE ontime_local (FlightDate Date, Year UInt16)
ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);

CREATE TABLE ontime_all AS ontime_local
ENGINE = Distributed(perftest_3shards_1replicas, default, ontime_local, rand());
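The last argument of the Distributed engine, rand(), is the sharding key. As I understand the ClickHouse documentation, the target shard for a row is chosen by taking the sharding expression modulo the total shard weight and finding the weight interval the remainder falls into. The Python sketch below models only that rule; pick_shard and simulate are names made up for this illustration, the weights default to 1 per shard as in the config above (no <weight> is set there), and a 32-bit random integer stands in for rand().

```python
import random
from collections import Counter


def pick_shard(sharding_value, weights):
    """Map a sharding-key value to a shard index using remainder intervals."""
    remainder = sharding_value % sum(weights)
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight
        if remainder < upper:
            return index
    raise AssertionError("unreachable: remainder is below the total weight")


def simulate(rows, weights, seed=0):
    """Count how many rows land on each shard for a random sharding key."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(rows):
        # Stand-in for rand(): a uniformly distributed 32-bit integer.
        counts[pick_shard(rng.getrandbits(32), weights)] += 1
    return counts


if __name__ == "__main__":
    # Three shards of equal weight, like perftest_3shards_1replicas.
    print(simulate(3000, [1, 1, 1]))
```

With a random key the rows spread roughly evenly over many inserts, but a handful of rows is not guaranteed to land one per shard.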
Insert data (any machine will do):
:) insert into ontime_all (FlightDate,Year)values('2001-10-12',2001);

INSERT INTO ontime_all (FlightDate, Year) VALUES

Ok.

1 rows in set. Elapsed: 0.013 sec.

:) insert into ontime_all (FlightDate,Year)values('2002-10-12',2002);

INSERT INTO ontime_all (FlightDate, Year) VALUES

Ok.

1 rows in set. Elapsed: 0.004 sec.

:) insert into ontime_all (FlightDate,Year)values('2003-10-12',2003);

INSERT INTO ontime_all (FlightDate, Year) VALUES

Ok.
I inserted 3 rows here. Now query them (from any machine):
:) select * from ontime_all;

SELECT * FROM ontime_all

┌─FlightDate─┬─Year─┐
│ 2001-10-12 │ 2001 │
└────────────┴──────┘
┌─FlightDate─┬─Year─┐
│ 2002-10-12 │ 2002 │
└────────────┴──────┘
┌─FlightDate─┬─Year─┐
│ 2003-10-12 │ 2003 │
└────────────┴──────┘
→ Progress: 3.00 rows, 12.00 B (48.27 rows/s., 193.08 B/s.)
3 rows in set. Elapsed: 0.063 sec.

:)
When the query runs on one machine, a packet capture on the other machines shows that they receive requests too.
tcpdump -i any -s 0 -l -w - dst port 9000
What happens if one of the machines is shut down?
:) select * from ontime_all;

SELECT * FROM ontime_all

┌─FlightDate─┬─Year─┐
│ 2001-10-12 │ 2001 │
└────────────┴──────┘
┌─FlightDate─┬─Year─┐
│ 2002-10-12 │ 2002 │
└────────────┴──────┘
┌─FlightDate─┬─Year─┐
│ 2003-10-12 │ 2003 │
└────────────┴──────┘
↓ Progress: 6.00 rows, 24.00 B (292.80 rows/s., 1.17 KB/s.)
Received exception from server:
Code: 279. DB::Exception: Received from localhost:9000, ::1. DB::NetException. DB::NetException: All connection tries failed. Log:

Code: 210, e.displayText() = DB::NetException: Connection refused: (db_server_yayun_02:9000, 192.168.0.20), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused: (db_server_yayun_02:9000, 192.168.0.20), e.what() = DB::NetException
Code: 210, e.displayText() = DB::NetException: Connection refused: (db_server_yayun_02:9000, 192.168.0.20), e.what() = DB::NetException
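As you can see, the query throws an error. So this setup is not highly available? I later found another configuration described in the documentation: 2 nodes with 2 replicas. In my test, high availability worked fine with that setup, and queries still ran in parallel across the cluster. Interested readers can try it themselves.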
https://clickhouse.yandex/reference_en.html#Distributed
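The difference between the two layouts can be reasoned about with a small model: a SELECT over the Distributed table has to reach at least one replica of every shard, so a shard with a single replica becomes a single point of failure. The Python sketch below encodes only that rule. It is a simplification under stated assumptions: the function name is invented for this illustration, the two-replica layout is my reading of the "2 nodes, 2 replicas" setup (one shard with two replicas), and settings that change the behavior, such as skip_unavailable_shards, are ignored.

```python
def query_can_succeed(cluster, down_hosts):
    """Return True if every shard still has at least one reachable replica.

    `cluster` is a list of shards; each shard is a list of replica host names.
    """
    down = set(down_hosts)
    return all(any(host not in down for host in shard) for shard in cluster)


# Layout used in this post: three shards, one replica each.
three_shards_one_replica = [
    ["db_server_yayun_01"],
    ["db_server_yayun_02"],
    ["db_server_yayun_03"],
]

# Assumed alternative layout: one shard served by two replicas.
one_shard_two_replicas = [["db_server_yayun_01", "db_server_yayun_02"]]

if __name__ == "__main__":
    down = ["db_server_yayun_02"]
    print(query_can_succeed(three_shards_one_replica, down))
    print(query_can_succeed(one_shard_two_replicas, down))
```

In this model, losing db_server_yayun_02 takes the whole second shard away in the first layout, which matches the exception above, while the second layout still has a live replica to answer.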
Next, test data replication. ZooKeeper is already configured, so create the table directly (on all three machines):
CREATE TABLE ontime_replica (FlightDate Date, Year UInt16)
ENGINE = ReplicatedMergeTree('/clickhouse_perftest/tables/ontime_replica', '{replica}', FlightDate, (Year, FlightDate), 8192);
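The '{replica}' placeholder is filled from the <macros> section of each server's config, which is why the same statement can be run unchanged on all three machines: every server ends up with the same ZooKeeper path but a different replica name. The Python sketch below imitates that substitution to make the result visible. expand_macros is a hypothetical helper for illustration, not ClickHouse code, and the per-host macro values assume each server uses its own IP as described earlier.

```python
def expand_macros(template, macros):
    """Replace each {name} placeholder with the value from the macros dict."""
    for name, value in macros.items():
        template = template.replace("{" + name + "}", value)
    return template


ZK_PATH = "/clickhouse_perftest/tables/ontime_replica"

# Assumed per-server <macros> values: each server's own IP.
MACROS_BY_HOST = {
    "db_server_yayun_01": {"replica": "192.168.0.10"},
    "db_server_yayun_02": {"replica": "192.168.0.20"},
    "db_server_yayun_03": {"replica": "192.168.0.30"},
}

if __name__ == "__main__":
    for host, macros in MACROS_BY_HOST.items():
        # Same ZooKeeper path everywhere, different replica name per server.
        print(host, ZK_PATH, expand_macros("{replica}", macros))
```

Tables that share a ZooKeeper path and differ only in replica name are replicas of the same data, so with this statement all three servers hold a full copy of ontime_replica.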
Insert test data:
insert into ontime_replica (FlightDate,Year)values('2018-10-12',2018);
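The row can be queried from any machine. To be honest, I still have not fully figured out clustering versus replication: the Distributed table appeared to replicate data as well, so I am a bit confused. If any experts are reading, feel free to get in touch and discuss.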
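One piece that may help with this confusion is the internal_replication flag in the cluster config. As far as I can tell from the ClickHouse documentation, when it is false (the default) a Distributed table writes each block to every replica of the target shard itself; when it is true, it writes to a single healthy replica and leaves the copying to a Replicated* table engine. The Python sketch below models just that choice. write_targets is a name invented for this illustration, and the two-replica shard is hypothetical, since every shard in this post's cluster has only one replica.

```python
def write_targets(shard_replicas, internal_replication, healthy=None):
    """Return the replicas a Distributed table would write a block to.

    `healthy` is the set of reachable replicas; by default all are reachable.
    """
    if not internal_replication:
        # The Distributed table itself sends the data to every replica.
        return list(shard_replicas)
    reachable = set(shard_replicas) if healthy is None else set(healthy)
    live = [replica for replica in shard_replicas if replica in reachable]
    if not live:
        raise RuntimeError("no healthy replica available for this shard")
    # Only one replica gets the write; the table engine replicates the rest.
    return live[:1]


if __name__ == "__main__":
    shard = ["db_server_yayun_01", "db_server_yayun_02"]  # hypothetical shard
    print(write_targets(shard, internal_replication=False))
    print(write_targets(shard, internal_replication=True))
    print(write_targets(shard, True, healthy={"db_server_yayun_02"}))
```

Under this reading, ontime_all in this post only splits rows across three single-replica shards and copies nothing, while the copies of ontime_replica on every machine come from ReplicatedMergeTree and ZooKeeper.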
References: