Quickly Setting Up a ClickHouse Cluster with Docker (3 Shards, 2 Replicas per Shard)
Background
The previous post briefly covered installing ClickHouse and using its client. In a real production environment, ClickHouse is usually deployed as a cluster, and because many systems lack the SSE 4.2 instruction set, this post uses Docker to set up a ClickHouse cluster.
1. Environment
1.1 Machine list
| Hostname | IP | Specs | OS | Services deployed | Notes |
|---|---|---|---|---|---|
| server01 | 192.168.21.21 | 8c8g | CentOS 7.3 | clickhouse-server (cs01-01) and clickhouse-server (cs01-02) | clickhouse01-01: tcp 9000, http 8123, replication 9009; shard 1, replica 1. clickhouse01-02: tcp 9001, http 8124, replication 9010; shard 2, replica 2 (replica of shard 2) |
| server02 | 192.168.21.69 | 8c8g | CentOS 7.3 | clickhouse-server (cs02-01) and clickhouse-server (cs02-02) | clickhouse02-01: tcp 9000, http 8123, replication 9009; shard 2, replica 1. clickhouse02-02: tcp 9001, http 8124, replication 9010; shard 3, replica 2 (replica of shard 3) |
| server03 | 192.168.21.6 | 8c8g | CentOS 7.3 | clickhouse-server (cs03-01) and clickhouse-server (cs03-02) | clickhouse03-01: tcp 9000, http 8123, replication 9009; shard 3, replica 1. clickhouse03-02: tcp 9001, http 8124, replication 9010; shard 1, replica 2 (replica of shard 1) |
1.2 Machine initialization
1.2.1 Configure hosts
Run vi /etc/hosts and add the following three lines:
192.168.21.21 server01
192.168.21.69 server02
192.168.21.6 server03
1.2.2 Install Docker
Install the same version of Docker on every machine.
See: Docker environment setup
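If the referenced article is not at hand, a minimal sketch for CentOS 7 using Docker's convenience script (assumes internet access; pin a specific Docker version for production use):
# Install Docker and start it on boot
curl -fsSL https://get.docker.com | sh
systemctl enable --now docker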
1.2.3 Install ZooKeeper
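Replication in the cluster below relies on an external ZooKeeper ensemble; the metrika.xml files point at 192.168.21.66, 192.168.21.57 and 192.168.21.17. A minimal sketch of starting one ensemble member with the official zookeeper image (the image tag, host networking and ZOO_MY_ID values are assumptions; adjust ZOO_MY_ID per node):
# Start one ZooKeeper node; run the equivalent on the other two hosts with ZOO_MY_ID=2 and ZOO_MY_ID=3
docker run -d --name zookeeper \
--network host \
-e ZOO_MY_ID=1 \
-e ZOO_SERVERS="server.1=192.168.21.66:2888:3888;2181 server.2=192.168.21.57:2888:3888;2181 server.3=192.168.21.17:2888:3888;2181" \
zookeeper:3.5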
1.3 Directory initialization
1.3.1 Create the local paths
Create the data directories on each of the three servers:
mkdir -p /data/clickhouse /data/clickhouse01 /data/clickhouse02
1.3.2 Obtain the clickhouse-server configuration
1) On server01, do the following
First start clickhouse-server with the Docker command from the official documentation
docker run -d --name clickhouse-server --ulimit nofile=262144:262144 --volume=/data/clickhouse/:/var/lib/clickhouse yandex/clickhouse-server
2) Once the container is up, copy its configuration files out to the host
# Copy the configuration from the container to /etc
docker cp clickhouse-server:/etc/clickhouse-server/ /etc/
# Make two copies of the directory on server01, one per instance
cp -rf /etc/clickhouse-server/ /etc/clickhouse-server01/
cp -rf /etc/clickhouse-server/ /etc/clickhouse-server02/
3) Then copy /etc/clickhouse-server/ to each of the other machines
# Copy the configuration to server02
scp -r /etc/clickhouse-server/ server02:/etc/clickhouse-server01/
scp -r /etc/clickhouse-server/ server02:/etc/clickhouse-server02/
# Copy the configuration to server03
scp -r /etc/clickhouse-server/ server03:/etc/clickhouse-server01/
scp -r /etc/clickhouse-server/ server03:/etc/clickhouse-server02/
2. Cluster setup
2.1 Cluster topology
Cluster layout:
clickhouse01-01: instance 1, ports: tcp 9000, http 8123, replication 9009; role: shard 1, replica 1
clickhouse01-02: instance 2, ports: tcp 9001, http 8124, replication 9010; role: shard 2, replica 2 (replica of shard 2)
clickhouse02-01: instance 1, ports: tcp 9000, http 8123, replication 9009; role: shard 2, replica 1
clickhouse02-02: instance 2, ports: tcp 9001, http 8124, replication 9010; role: shard 3, replica 2 (replica of shard 3)
clickhouse03-01: instance 1, ports: tcp 9000, http 8123, replication 9009; role: shard 3, replica 1
clickhouse03-02: instance 2, ports: tcp 9001, http 8124, replication 9010; role: shard 1, replica 2 (replica of shard 1)
2.2 Configure the cluster
2.2.1 Configuration files to modify
Two configuration files need to be changed (users.xml can also be adjusted if needed):
- /etc/clickhouse-server/config.xml
- /etc/clickhouse-server/metrika.xml(新增文件)
2.2.2 Configure the clickhouse-server instances on server01
2.2.2.1 Configuration for clickhouse01-01:
1) /etc/clickhouse-server01/config.xml (the other instances can use exactly the same content for this file)
Point include_from at the substitutions file that is actually used, and open up the listen addresses:
<!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
     By default, path to file with substitutions is /etc/metrika.xml.
     It could be changed in config in 'include_from' element.
     Values for substitutions are specified in /yandex/name_of_substitution elements in that file. -->
<include_from>/etc/clickhouse-server/metrika.xml</include_from>
<listen_host>0.0.0.0</listen_host>
<listen_host>127.0.0.1</listen_host>
2) /etc/clickhouse-server01/metrika.xml (the cluster and ZooKeeper sections are identical on every instance; only the <macros> block differs per instance)
<!-- The cluster and ZooKeeper sections below are identical on every instance; only <macros> is per-instance -->
<yandex>
    <!-- Cluster definition (clickhouse_remote_servers is the same on all instances) -->
    <clickhouse_remote_servers>
        <cluster_3s_1r>
            <!-- shard 1 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>server01</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>server03</host>
                    <port>9001</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
            <!-- shard 2 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>server02</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>server01</host>
                    <port>9001</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
            <!-- shard 3 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>server03</host>
                    <port>9000</port>
                    <user>default</user>
                    <password></password>
                </replica>
                <replica>
                    <host>server02</host>
                    <port>9001</port>
                    <user>default</user>
                    <password></password>
                </replica>
            </shard>
        </cluster_3s_1r>
    </clickhouse_remote_servers>
    <!-- ZooKeeper (zookeeper-servers is the same on all instances) -->
    <zookeeper-servers>
        <node index="1">
            <host>192.168.21.66</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>192.168.21.57</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>192.168.21.17</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <!-- macros differ per instance: shard 1, replica 1 -->
    <macros>
        <layer>01</layer>
        <shard>01</shard>
        <replica>cluster01-01-1</replica>
    </macros>
    <networks>
        <ip>::/0</ip>
    </networks>
    <!-- Data compression -->
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>
2.2.2.2 Configuration for clickhouse01-02:
/etc/clickhouse-server02/metrika.xml
Identical to the metrika.xml of clickhouse01-01 above except for the <macros> block (shard 2, replica 2):
<macros>
    <layer>01</layer>
    <shard>02</shard>
    <replica>cluster01-02-2</replica>
</macros>
2.2.3 Configure the clickhouse-server instances on server02
2.2.3.1 Configuration for clickhouse02-01:
/etc/clickhouse-server01/metrika.xml
Identical to the metrika.xml of clickhouse01-01 above except for the <macros> block (shard 2, replica 1):
<macros>
    <layer>01</layer>
    <shard>02</shard>
    <replica>cluster01-02-1</replica>
</macros>
2.2.3.2 Configuration for clickhouse02-02:
/etc/clickhouse-server02/metrika.xml
Identical to the metrika.xml of clickhouse01-01 above except for the <macros> block (shard 3, replica 2):
<macros>
    <layer>01</layer>
    <shard>03</shard>
    <replica>cluster01-03-2</replica>
</macros>
2.2.4 Configure the clickhouse-server instances on server03
2.2.4.1 Configuration for clickhouse03-01:
/etc/clickhouse-server01/metrika.xml
Identical to the metrika.xml of clickhouse01-01 above except for the <macros> block (shard 3, replica 1):
<macros>
    <layer>01</layer>
    <shard>03</shard>
    <replica>cluster01-03-1</replica>
</macros>
2.2.4.2 Configuration for clickhouse03-02:
/etc/clickhouse-server02/metrika.xml
Identical to the metrika.xml of clickhouse01-01 above except for the <macros> block (shard 1, replica 2):
<macros>
    <layer>01</layer>
    <shard>01</shard>
    <replica>cluster01-01-2</replica>
</macros>
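Note that the values in <macros> are also available in DDL as the {layer}, {shard} and {replica} substitutions, so a single ReplicatedMergeTree definition could be reused unchanged on every instance; the per-instance CREATE statements in section 2.4.3 spell the ZooKeeper path and replica name out explicitly instead. A small sketch using a hypothetical table named macro_demo:
CREATE TABLE test.macro_demo (
    id UInt64,
    action_date Date
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/macro_demo', '{replica}')
PARTITION BY toYYYYMM(action_date)
ORDER BY id;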
2.3 Run the ClickHouse cluster
2.3.1 Run the clickhouse01-01 instance
Log in to server01: ssh server01
# Remove any existing cs01-01 container first
docker rm -f cs01-01
# Run clickhouse-server
docker run -d \
--name cs01-01 \
--ulimit nofile=262144:262144 \
--volume=/data/clickhouse01/:/var/lib/clickhouse \
--volume=/etc/clickhouse-server01/:/etc/clickhouse-server/ \
--add-host server01:192.168.21.21 \
--add-host server02:192.168.21.69 \
--add-host server03:192.168.21.6 \
--hostname $(hostname) \
-p 9000:9000 \
-p 8123:8123 \
-p 9009:9009 \
yandex/clickhouse-server
Notes:
--add-host: the configuration files refer to the servers by hostname, so these mappings are needed for the container to resolve them; the entries are added to /etc/hosts inside the container.
--hostname: ClickHouse's system.clusters table shows the cluster layout, including an is_local column. If the hostname is not set, ClickHouse cannot tell which entry is the local instance, and if is_local is 0 everywhere, cluster operations are affected. You can check is_local by connecting to any clickhouse-server with clickhouse-client and running: SELECT * FROM system.clusters;
-p: publishes the container ports on the corresponding host ports.
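Before moving on, it is worth verifying that the instance came up correctly (a quick check; the ClickHouse HTTP interface answers "Ok." on /ping):
# Is the container running and free of startup errors?
docker ps | grep cs01-01
docker logs cs01-01
# Is the HTTP interface reachable on the published port?
curl http://127.0.0.1:8123/ping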
2.3.2 Run the clickhouse01-02 instance
Log in to server01: ssh server01
docker run -d \
--name cs01-02 \
--ulimit nofile=262144:262144 \
--volume=/data/clickhouse02/:/var/lib/clickhouse \
--volume=/etc/clickhouse-server02/:/etc/clickhouse-server/ \
--add-host server01:192.168.21.21 \
--add-host server02:192.168.21.69 \
--add-host server03:192.168.21.6 \
--hostname $(hostname) \
-p 9001:9000 \
-p 8124:8123 \
-p 9010:9009 \
yandex/clickhouse-server
2.3.3 Run the clickhouse02-01 instance
Log in to server02: ssh server02
docker run -d \
--name cs02-01 \
--ulimit nofile=262144:262144 \
--volume=/data/clickhouse01/:/var/lib/clickhouse \
--volume=/etc/clickhouse-server01/:/etc/clickhouse-server/ \
--add-host server01:192.168.21.21 \
--add-host server02:192.168.21.69 \
--add-host server03:192.168.21.6 \
--hostname $(hostname) \
-p 9000:9000 \
-p 8123:8123 \
-p 9009:9009 \
yandex/clickhouse-server
2.3.4 Run the clickhouse02-02 instance
Log in to server02: ssh server02
docker run -d \
--name cs02-02 \
--ulimit nofile=262144:262144 \
--volume=/data/clickhouse02/:/var/lib/clickhouse \
--volume=/etc/clickhouse-server02/:/etc/clickhouse-server/ \
--add-host server01:192.168.21.21 \
--add-host server02:192.168.21.69 \
--add-host server03:192.168.21.6 \
--hostname $(hostname) \
-p 9001:9000 \
-p 8124:8123 \
-p 9010:9009 \
yandex/clickhouse-server
2.3.5 Run the clickhouse03-01 instance
Log in to server03: ssh server03
docker run -d \
--name cs03-01 \
--ulimit nofile=262144:262144 \
--volume=/data/clickhouse01/:/var/lib/clickhouse \
--volume=/etc/clickhouse-server01/:/etc/clickhouse-server/ \
--add-host server01:192.168.21.21 \
--add-host server02:192.168.21.69 \
--add-host server03:192.168.21.6 \
--hostname $(hostname) \
-p 9000:9000 \
-p 8123:8123 \
-p 9009:9009 \
yandex/clickhouse-server
2.3.6 Run the clickhouse03-02 instance
Log in to server03: ssh server03
docker run -d \
--name cs03-02 \
--ulimit nofile=262144:262144 \
--volume=/data/clickhouse02/:/var/lib/clickhouse \
--volume=/etc/clickhouse-server02/:/etc/clickhouse-server/ \
--add-host server01:192.168.21.21 \
--add-host server02:192.168.21.69 \
--add-host server03:192.168.21.6 \
--hostname $(hostname) \
-p 9001:9000 \
-p 8124:8123 \
-p 9010:9009 \
yandex/clickhouse-server
2.4 Working with data on the ClickHouse cluster
2.4.1 Connect a client to a clickhouse-server instance
1) Run the following on any of the machines (to connect to a different ClickHouse instance, just change the host and port values):
docker run -it \
--rm \
--add-host server01:192.168.21.21 \
--add-host server02:192.168.21.69 \
--add-host server03:192.168.21.6 \
yandex/clickhouse-client \
--host server01 \
--port 9000
2) Run the following to inspect the cluster configuration:
# Check that the shard and replica layout matches what is configured in metrika.xml; if it does not, re-check the configuration of each clickhouse-server instance
# is_local should not be 0 on every row (the local instance should show 1), and the shard and replica information should be correct
SELECT * FROM system.clusters;
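# To focus on the relevant fields, you can also select just a few columns (standard system.clusters columns):
SELECT cluster, shard_num, replica_num, host_name, port, is_local FROM system.clusters WHERE cluster = 'cluster_3s_1r';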
2.4.2 Create the database
Once all instances are configured as above and started, run the following statement on every instance to create the database:
For example, on instance 01-02:
# Start clickhouse-client and connect to ClickHouse
docker run -it --rm --add-host server01:192.168.21.21 --add-host server02:192.168.21.69 --add-host server03:192.168.21.6 \
yandex/clickhouse-client --host server01 --port 9001
ClickHouse client version 19.17.5.18 (official build).
Connecting to server01:9001 as user default.
Connected to ClickHouse server version 19.17.5 revision 54428.
i-r9es2e0q :) CREATE DATABASE test;
CREATE DATABASE test
Ok.
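Since the database has to exist on all six instances, a small shell loop over every host/port combination avoids repeating the interactive session (a sketch, assuming the host/port mapping above):
for host in server01 server02 server03; do
  for port in 9000 9001; do
    docker run --rm \
      --add-host server01:192.168.21.21 --add-host server02:192.168.21.69 --add-host server03:192.168.21.6 \
      yandex/clickhouse-client --host $host --port $port \
      --query "CREATE DATABASE IF NOT EXISTS test"
  done
done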
2.4.3 Create the replicated tables
Next, create the corresponding replicated table on every instance; a simple test table is used here:
# On clickhouse01-01 (port 9000):
docker run -it --rm --add-host server01:192.168.21.21 --add-host server02:192.168.21.69 --add-host server03:192.168.21.6 \
yandex/clickhouse-client --host server01 --port 9000
# Then run
CREATE TABLE test.device_thing_data (
time UInt64,
user_id String,
device_id String,
source_id String,
thing_id String,
identifier String,
value_int32 Int32,
value_float Float32,
value_double Float64,
value_string String,
value_enum Enum8('0'=0,'1'=1,'2'=2,'3'=3,'4'=4,'5'=5,'6'=6,'7'=7,'8'=8),
value_string_ex String,
value_array_string Array(String),
value_array_int32 Array(Int32),
value_array_float Array(Float32),
value_array_double Array(Float64),
action_date Date,
action_time DateTime
) Engine= ReplicatedMergeTree('/clickhouse/tables/01-01/device_thing_data','cluster01-01-1') PARTITION BY toYYYYMM(action_date) ORDER BY (user_id,device_id,thing_id,identifier,time,intHash64(time)) SAMPLE BY intHash64(time) SETTINGS index_granularity=8192
# On clickhouse01-02 (port 9001):
docker run -it --rm --add-host server01:192.168.21.21 --add-host server02:192.168.21.69 --add-host server03:192.168.21.6 \
yandex/clickhouse-client --host server01 --port 9001
# Then run
CREATE TABLE test.device_thing_data (
time UInt64,
user_id String,
device_id String,
source_id String,
thing_id String,
identifier String,
value_int32 Int32,
value_float Float32,
value_double Float64,
value_string String,
value_enum Enum8('0'=0,'1'=1,'2'=2,'3'=3,'4'=4,'5'=5,'6'=6,'7'=7,'8'=8),
value_string_ex String,
value_array_string Array(String),
value_array_int32 Array(Int32),
value_array_float Array(Float32),
value_array_double Array(Float64),
action_date Date,
action_time DateTime
) Engine= ReplicatedMergeTree('/clickhouse/tables/01-02/device_thing_data','cluster01-02-2') PARTITION BY toYYYYMM(action_date) ORDER BY (user_id,device_id,thing_id,identifier,time,intHash64(time)) SAMPLE BY intHash64(time) SETTINGS index_granularity=8192
# On clickhouse02-01 (port 9000):
docker run -it --rm --add-host server01:192.168.21.21 --add-host server02:192.168.21.69 --add-host server03:192.168.21.6 \
yandex/clickhouse-client --host server02 --port 9000
# Then run
CREATE TABLE test.device_thing_data (
time UInt64,
user_id String,
device_id String,
source_id String,
thing_id String,
identifier String,
value_int32 Int32,
value_float Float32,
value_double Float64,
value_string String,
value_enum Enum8('0'=0,'1'=1,'2'=2,'3'=3,'4'=4,'5'=5,'6'=6,'7'=7,'8'=8),
value_string_ex String,
value_array_string Array(String),
value_array_int32 Array(Int32),
value_array_float Array(Float32),
value_array_double Array(Float64),
action_date Date,
action_time DateTime
) Engine= ReplicatedMergeTree('/clickhouse/tables/01-02/device_thing_data','cluster01-02-1') PARTITION BY toYYYYMM(action_date) ORDER BY (user_id,device_id,thing_id,identifier,time,intHash64(time)) SAMPLE BY intHash64(time) SETTINGS index_granularity=8192
# On clickhouse02-02 (port 9001):
docker run -it --rm --add-host server01:192.168.21.21 --add-host server02:192.168.21.69 --add-host server03:192.168.21.6 \
yandex/clickhouse-client --host server02 --port 9001
# Then run
CREATE TABLE test.device_thing_data (
time UInt64,
user_id String,
device_id String,
source_id String,
thing_id String,
identifier String,
value_int32 Int32,
value_float Float32,
value_double Float64,
value_string String,
value_enum Enum8('0'=0,'1'=1,'2'=2,'3'=3,'4'=4,'5'=5,'6'=6,'7'=7,'8'=8),
value_string_ex String,
value_array_string Array(String),
value_array_int32 Array(Int32),
value_array_float Array(Float32),
value_array_double Array(Float64),
action_date Date,
action_time DateTime
) Engine= ReplicatedMergeTree('/clickhouse/tables/01-03/device_thing_data','cluster01-03-2') PARTITION BY toYYYYMM(action_date) ORDER BY (user_id,device_id,thing_id,identifier,time,intHash64(time)) SAMPLE BY intHash64(time) SETTINGS index_granularity=8192
# On clickhouse03-01 (port 9000):
docker run -it --rm --add-host server01:192.168.21.21 --add-host server02:192.168.21.69 --add-host server03:192.168.21.6 \
yandex/clickhouse-client --host server03 --port 9000
# Then run
CREATE TABLE test.device_thing_data (
time UInt64,
user_id String,
device_id String,
source_id String,
thing_id String,
identifier String,
value_int32 Int32,
value_float Float32,
value_double Float64,
value_string String,
value_enum Enum8('0'=0,'1'=1,'2'=2,'3'=3,'4'=4,'5'=5,'6'=6,'7'=7,'8'=8),
value_string_ex String,
value_array_string Array(String),
value_array_int32 Array(Int32),
value_array_float Array(Float32),
value_array_double Array(Float64),
action_date Date,
action_time DateTime
) Engine= ReplicatedMergeTree('/clickhouse/tables/01-03/device_thing_data','cluster01-03-1') PARTITION BY toYYYYMM(action_date) ORDER BY (user_id,device_id,thing_id,identifier,time,intHash64(time)) SAMPLE BY intHash64(time) SETTINGS index_granularity=8192
# On clickhouse03-02 (port 9001):
docker run -it --rm --add-host server01:192.168.21.21 --add-host server02:192.168.21.69 --add-host server03:192.168.21.6 \
yandex/clickhouse-client --host server03 --port 9001
# Then run
CREATE TABLE test.device_thing_data (
time UInt64,
user_id String,
device_id String,
source_id String,
thing_id String,
identifier String,
value_int32 Int32,
value_float Float32,
value_double Float64,
value_string String,
value_enum Enum8('0'=0,'1'=1,'2'=2,'3'=3,'4'=4,'5'=5,'6'=6,'7'=7,'8'=8),
value_string_ex String,
value_array_string Array(String),
value_array_int32 Array(Int32),
value_array_float Array(Float32),
value_array_double Array(Float64),
action_date Date,
action_time DateTime
) Engine= ReplicatedMergeTree('/clickhouse/tables/01-01/device_thing_data','cluster01-01-2') PARTITION BY toYYYYMM(action_date) ORDER BY (user_id,device_id,thing_id,identifier,time,intHash64(time)) SAMPLE BY intHash64(time) SETTINGS index_granularity=8192
2.4.4 Create the distributed table (used for querying)
Once the replicated tables above have been created, you can create the distributed table. A distributed table is only a query engine and stores no data itself: a query against it is sent to every shard of the cluster, processed there, and the partial results are aggregated before being returned to the client. ClickHouse therefore requires the aggregated result to fit into the memory of the node holding the distributed table, which under normal conditions is not a problem. The distributed table can be created on all instances or only on some of them, matching the instances your application code actually queries; creating it on several nodes is recommended so that other nodes can still be queried if one goes down. The CREATE statement is as follows:
# Creating the distributed table on the whole clickhouse-server cluster in one go with ON CLUSTER hung; the cause is unknown
CREATE TABLE device_thing_data_all ON CLUSTER cluster_3s_1r AS test.device_thing_data ENGINE = Distributed(cluster_3s_1r, test, device_thing_data, rand())
# The following therefore has to be run on each instance individually
CREATE TABLE device_thing_data_all AS test.device_thing_data ENGINE = Distributed(cluster_3s_1r, test, device_thing_data, rand())
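# Writes can also go through the distributed table: each row is routed to one shard by the rand() sharding key and then replicated within that shard (a sketch; columns not listed take their default values)
INSERT INTO device_thing_data_all (user_id, action_date, action_time) VALUES ('0', today(), now())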
2.4.5 Test the setup
# Connect a client to one of the clickhouse-server instances (e.g. cs01-01)
# Query the distributed table; no data yet
select * from device_thing_data_all;
# On cs01-01, check whether there is any data
# Query the local replicated table; no data yet
select * from test.device_thing_data;
# Insert one row into the replicated table
INSERT INTO test.device_thing_data (user_id) VALUES ('1')
# Since a row was just inserted on cs01-01, the table should now contain data
select * from test.device_thing_data;
# The distributed table now returns data as well
select * from device_thing_data_all;
# On cs03-02, query the replicated table; since cs03-02 is the replica of cs01-01, the data has been replicated automatically and should be visible
select * from test.device_thing_data;
# Insert a row on cs03-01; it should be replicated to cs02-02
INSERT INTO test.device_thing_data (user_id) VALUES ('2')
# Query the distributed table again; it should now return the two rows inserted on cs03-01 and cs01-01
select * from device_thing_data_all;
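# The same result can also be checked as a count through the distributed table (a quick check; two rows are expected at this point)
select count() from device_thing_data_all;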