1. Install the ClickHouse environment

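ClickHouse is best supported on Debian/Ubuntu, but servers at work usually run CentOS, so this walkthrough installs ClickHouse on CentOS 7.
Operating system version: CentOS Linux release 7.5.1804 (Core)

Check whether the CPU supports the SSE 4.2 instruction set: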

grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
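
The same check can be sketched in Python, for example inside a provisioning script. This is a minimal sketch, not part of the original steps: the function names `has_sse42` and `check_local_cpu` are made up for illustration, and it assumes a Linux host where /proc/cpuinfo has the usual `flags : ...` lines.

```python
def has_sse42(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line of a /proc/cpuinfo dump lists sse4_2."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Line format: "flags\t\t: fpu vme ... sse4_2 ..."
            _, _, flags = line.partition(":")
            if "sse4_2" in flags.split():
                return True
    return False


def check_local_cpu(path: str = "/proc/cpuinfo") -> bool:
    """Read the local cpuinfo file (Linux only) and test it for SSE 4.2."""
    with open(path, encoding="utf-8") as f:
        return has_sse42(f.read())
```

Calling `check_local_cpu()` on the target server should agree with the grep command above.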

2. Download the packages

  1. Download location:

    https://repo.yandex.ru/clickhouse/rpm/stable/x86_64/
    Download the following four packages:
    clickhouse-client-18.12.17-2.noarch.rpm
    clickhouse-server-common-18.12.17-2.noarch.rpm
    clickhouse-server-18.12.17-2.noarch.rpm
    clickhouse-common-static-18.12.17-2.x86_64.rpm
    
  2. Install:

    rpm -ivh clickhouse-server-common-18.12.17-2.noarch.rpm
    rpm -ivh clickhouse-server-18.12.17-2.noarch.rpm 
    rpm -ivh clickhouse-common-static-18.12.17-2.x86_64.rpm
    rpm -ivh clickhouse-client-18.12.17-2.noarch.rpm 
    

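    Note: the second step (installing clickhouse-server) fails with a dependency error. Install the missing dependency with yum install *ODBC*,
    then install clickhouse-server again and it succeeds.
    Configuration directory after installation: cd /etc/clickhouse-server/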

  3. Start the server:

    clickhouse-server --config-file=/etc/clickhouse-server/config.xml
    or, as a service:
    systemctl stop clickhouse-server
    systemctl start clickhouse-server
    

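3. ClickHouse configuration

  1. Allow remote access:
    vi /etc/clickhouse-server/config.xml
    Edit the server configuration file /etc/clickhouse-server/config.xml and uncomment the listen_host entry at line 65. After the change it reads: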

    <listen_host>::</listen_host>
    <listen_host>127.0.0.1</listen_host>
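
To confirm which listen_host entries are active after editing, a small script can parse the file. This is a sketch under assumptions: the helper name `listen_hosts` is hypothetical, and it relies on the standard-library XML parser skipping commented-out elements, so only uncommented entries are returned.

```python
import xml.etree.ElementTree as ET
from typing import List


def listen_hosts(config_xml: str) -> List[str]:
    """Return the text of every uncommented <listen_host> element."""
    root = ET.fromstring(config_xml)
    return [el.text.strip() for el in root.iter("listen_host") if el.text]


def listen_hosts_from_file(path: str = "/etc/clickhouse-server/config.xml") -> List[str]:
    """Same check against the config file on disk."""
    with open(path, encoding="utf-8") as f:
        return listen_hosts(f.read())
```

With the configuration shown above, `listen_hosts_from_file()` would be expected to return `['::', '127.0.0.1']`.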
    
  2. Memory limit setting (in the default profile):
    vi /etc/clickhouse-server/users.xml

     <default>
        <!-- Maximum memory usage for processing single query, in bytes. -->
        <max_memory_usage>26800000000</max_memory_usage>
    
        <!-- Use cache of uncompressed blocks of data. Meaningfu
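
The max_memory_usage value is given in bytes, which is hard to read at a glance. The following sketch reads it from a users.xml document and converts it to GiB. The helper names are hypothetical, and it assumes the usual layout with a `<profiles>` element directly under the root.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def max_memory_by_profile(users_xml: str) -> Dict[str, int]:
    """Map each settings profile name to its max_memory_usage in bytes."""
    root = ET.fromstring(users_xml)
    profiles = root.find("profiles")
    result = {}
    if profiles is None:
        return result
    for profile in profiles:
        value = profile.findtext("max_memory_usage")
        if value is not None:
            result[profile.tag] = int(value)
    return result


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30
```

For the value above, `to_gib(26800000000)` is a little under 25 GiB.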
    

4. Using the local client, clickhouse-client

  1. Start the client:
    #clickhouse-client

  2. Create a database (the syntax follows MySQL's CREATE statement):

    CREATE  DATABASE  [ IF  NOT  EXISTS ]  db_name
    
  3. Create a table (the table engine must be given at the end):

    CREATE TABLE F_SZ_RYXX (PERSON_ID String,NBXH String,NAME String,CERTYPE String,BLICTYPE String,CERNO String,HJSZD String,SEX String,AGE Decimal (18,0),RZQX Decimal (18,0),NATDATE Date,DOM String,TEL String,LITDEG String,OFFSIGN String,ACCDSIDE String,COUNTRY String,STUFFTYPE String,POSITION String,POSBRFORM String,APPOUNIT String,SJC Date,RJZB Decimal (18,0),TZE Decimal (18,0),CZFS String,SJZB Decimal (18,0),SJCZFS String,CZRQ Date,CZBL Decimal (18,0),CZF String,TZRLX String,SFBD Decimal (18,0),SFLDRKHYZM Decimal (18,0),ZXHHSWBZ String,RYLX String,FZJG String,SJQK String,BFB Decimal (18,0),UNISCID String,ZCH String,QYMC String,DJJG String,FDDBR String,CLRQ Date,QYLX String,QYSX String,JYZT String,ZCZB Decimal (18,0)) ENGINE = MergeTree(CLRQ,(PERSON_ID),10);
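
A statement this long is easier to maintain when generated from a column list. The sketch below builds a CREATE TABLE statement with the same engine clause form as above, MergeTree(date column, (key columns), index granularity). The function name `create_table_sql` and the two-column demo table are illustrative assumptions, not part of the original.

```python
from typing import List, Sequence, Tuple


def create_table_sql(
    table: str,
    columns: Sequence[Tuple[str, str]],
    date_column: str,
    key_columns: List[str],
    granularity: int,
) -> str:
    """Build a CREATE TABLE statement using the MergeTree(date, (keys), n) form."""
    names = [name for name, _ in columns]
    if date_column not in names:
        raise ValueError(f"date column {date_column!r} is not in the column list")
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    keys = ", ".join(key_columns)
    return (
        f"CREATE TABLE {table} ({cols}) "
        f"ENGINE = MergeTree({date_column}, ({keys}), {granularity})"
    )


# Hypothetical two-column table, only to show the shape of the output.
print(create_table_sql("demo", [("PERSON_ID", "String"), ("CLRQ", "Date")], "CLRQ", ["PERSON_ID"], 10))
```

Newer ClickHouse releases prefer the PARTITION BY / ORDER BY form of the engine clause, so the template string may need adjusting for them.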
    
  4. Insert data:

    1. With an INSERT INTO statement (string values must be wrapped in single quotes):

       insert into f_sz_ryxx (PERSON_ID,NBXH,NAME,CERTYPE,BLICTYPE,CERNO,HJSZD,SEX,AGE,RZQX,NATDATE,DOM,TEL,LITDEG,OFFSIGN,ACCDSIDE,COUNTRY,STUFFTYPE,POSITION,POSBRFORM,APPOUNIT,SJC,RJZB,TZE,CZFS,SJZB,SJCZFS,CZRQ,CZBL,CZF,TZRLX,SFBD,SFLDRKHYZM,ZXHHSWBZ,RYLX,FZJG,SJQK,BFB,UNISCID,ZCH,QYMC,DJJG,FDDBR,CLRQ,QYLX,QYSX,JYZT,ZCZB) values('2140000000171339','2140000000014007','王民','10','','142701570501001','','1','61','','1/5/1957 00:00:00','太原市迎泽区东安路7-3-17','13835178783','','0','','156','02','','03','','19/2/2001 00:00:00','','','','','','','','','','','','','1','','','','','1400002002070','山西皮尔复临床医药开发有限公司','1400000000','王民','1/9/1992 00:00:00','1130','03','11','300');
      
    2. Import data from a CSV file:

       cat F_SZ_RYXX.csv | clickhouse-client --query="INSERT INTO f_sz_ryxx FORMAT CSV"
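
Before piping a large file into clickhouse-client, it can help to confirm that every field converts to the target column type. This is a minimal sketch with a hypothetical three-column layout (String, Float32, Date) and hypothetical helper names. It assumes dates are written as YYYY-MM-DD, the text form ClickHouse uses for Date values in CSV.

```python
import csv
import io
from datetime import date, datetime
from typing import Callable, List, Sequence


def to_date(text: str) -> date:
    """Parse a YYYY-MM-DD string into a date."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def parse_csv_rows(text: str, converters: Sequence[Callable]) -> List[list]:
    """Convert each CSV record field by field; raises ValueError on bad data."""
    rows = []
    for line_no, record in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(record) != len(converters):
            raise ValueError(f"line {line_no}: expected {len(converters)} fields, got {len(record)}")
        rows.append([conv(field) for conv, field in zip(converters, record)])
    return rows


rows = parse_csv_rows("a,1.5,2001-02-19\n", [str, float, to_date])
print(rows)
```

A file that passes this check without a ValueError has the right field count and parseable values on every line.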
      
    3. Import MySQL data into ClickHouse:

       # Table size reported by du:    5.5G    article_clientuser_sum.ibd
       # ClickHouse statement
       CREATE TABLE article_clientuser_sum
       ENGINE = MergeTree
       ORDER BY id AS
       SELECT *
       FROM mysql('host:port', 'db', 'article_clientuser_sum', 'user', 'password')
       # Elapsed time and average throughput
       0 rows in set. Elapsed: 137.251 sec. Processed 18.59 million rows, 7.34 GB (135.43 thousand rows/s., 53.48 MB/s.)
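
The throughput figures in that summary line follow directly from the row count, byte count, and elapsed time. A short worked calculation, using the rounded numbers printed by the client and decimal units (1 MB = 10^6 bytes):

```python
rows = 18.59e6            # "18.59 million rows"
bytes_processed = 7.34e9  # "7.34 GB", decimal gigabytes
elapsed = 137.251         # seconds

rows_per_sec = rows / elapsed
mb_per_sec = bytes_processed / elapsed / 1e6

print(f"{rows_per_sec / 1e3:.2f} thousand rows/s, {mb_per_sec:.2f} MB/s")
# prints "135.45 thousand rows/s, 53.48 MB/s"
```

The small difference from the reported 135.43 thousand rows/s comes from the client computing its rate from unrounded counts.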
      

5. Working with ClickHouse from Python

  1. Install clickhouse-driver:

    pip install clickhouse-driver
    
  2. Connect to the ClickHouse server:

    from clickhouse_driver import Client
    client = Client(host='192.168.3.194',database='default',user='default',password='')
    
  3. Query data:

    client.execute('select * from F_SZ_RYXX')
    
  4. Insert data:

    client.execute("INSERT INTO test2 VALUES",a,types_check=True)
    Note: a is a list of rows, for example [['a','b',1,3],['a','b',2,4]]
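
With types_check=True the driver validates Python value types before sending the data. A local pre-check can point at the offending row and column first. This is a minimal sketch: the function name `mismatches` is hypothetical, and the expected types assume a table with two String and two integer columns, which matches the sample rows but is not stated in the original.

```python
from typing import List, Optional, Sequence, Tuple


def mismatches(rows: Sequence[Sequence], expected_types: Sequence[type]) -> List[Tuple[int, Optional[int], str]]:
    """Return (row index, column index, found type name) for every bad value."""
    problems = []
    for r, row in enumerate(rows):
        if len(row) != len(expected_types):
            problems.append((r, None, "wrong column count"))
            continue
        for c, (value, expected) in enumerate(zip(row, expected_types)):
            if not isinstance(value, expected):
                problems.append((r, c, type(value).__name__))
    return problems


a = [['a', 'b', 1, 3], ['a', 'b', 2, 4]]
print(mismatches(a, [str, str, int, int]))  # an empty list means every value has the expected type
```

A None value, which is common in rows read from another database, shows up here as `NoneType` together with its position.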
    
  5. Worked example (read rows from Oracle and insert them into ClickHouse):

    # -*- coding: utf-8 -*-
    import cx_Oracle
    from clickhouse_driver import Client
    import re
    import time,datetime
    import types
    
    class Clickhouse():
        # Connect to the Oracle server
        dsn = cx_Oracle.makedsn("192.168.3.195","1521","topicis")
        conn = cx_Oracle.connect("topicis","topicis",dsn)
        cur = conn.cursor()
        # Connect to the ClickHouse server
        client = Client(host='192.168.3.194',database='default',user='default',password='')
        # print(client.execute('select count(*) from DJ_ZT_HIST'))
    
        # Query data
        def select_data(self):
            start = time.time()
            print(self.client.execute('select * from F_SZ_RYXX'))
            end = time.time()
            print("Query time:",end-start)
            
        # Create the ClickHouse table
        def create_table(self):
            self.client.execute('DROP TABLE IF EXISTS test2')
            creattable = """CREATE TABLE test2 (PERSON_ID String,NBXH String,NAME String,CERTYPE String,BLICTYPE String,CERNO String,HJSZD String,SEX String,AGE Float32,RZQX Float32,NATDATE Date,DOM String,TEL String,LITDEG String,OFFSIGN String,ACCDSIDE String,COUNTRY String,STUFFTYPE String,POSITION String,POSBRFORM String,APPOUNIT String,SJC Date,RJZB Float32,TZE Float32,CZFS String,SJZB Float32,SJCZFS String,CZRQ Date,CZBL Float32,CZF String,TZRLX String,SFBD Float32,SFLDRKHYZM Float32,ZXHHSWBZ String,RYLX String,FZJG String,SJQK String,BFB Float32,UNISCID String,ZCH String,QYMC String,DJJG String,FDDBR String,CLRQ Date,QYLX String,QYSX String,JYZT String,ZCZB Float32) ENGINE = MergeTree(CLRQ,(PERSON_ID),10);"""
            creattable1 = """CREATE TABLE test1 (ID Float32,NAME String,AGE Float32, date_time Date ) ENGINE = MergeTree(date_time,(ID),10);"""
            self.client.execute(creattable)
    
        # Insert data
        def insert(self):
            sql = ("select * from F_SZ_RYXX")
            sql1 = ("select * from yuangong")
            run = self.cur.execute(sql)
            start = time.time()
            con1 = self.cur.fetchone()
            # print(con1)
            # con = self.cur.fetchall()
            end = time.time()
            print("Oracle fetch time:",end-start)
            start1 = time.time()
    
            # for i in con:
            i = list(con1)
            i[0] = '' if i[0]==None else i[0]
            i[1] = '' if i[1]==None else i[1]
            i[2] = '' if i[2]==None else i[2]
            i[3] = '' if i[3]==None else i[3]
            # PERSON_ID = i[0]
            # NBXH = i[1]
            # NAME = i[2]
            # CERTYPE = i[3]
            i[4] = '' if i[4]==None else i[4]
            i[5] = '' if i[5]==None else i[5]
            # CERNO = i[5]
            i[6] = '' if i[6]==None else i[6]# HJSZD = i[6]
            i[7] = '' if i[7]==None else i[7]
            # SEX = i[7]
            i[8] = 0.0 if i[8]==None else i[8]
            i[9] = 0.0 if i[9]==None else i[9]
            # i[10] = datetime.datetime.now().date()if i[10]==None else i[10].date()
            i[10] = datetime.datetime.now().date()
            i[11] = '' if i[11]==None else i[11]
            i[12] = '' if i[12]==None else i[12]
            i[13] = '' if i[13]==None else i[13]
            i[14] = '' if i[14]==None else i[14]
            # DOM = i[11]
            # TEL = i[12]
            # LITDEG = i[13]
            # OFFSIGN = i[14]
            i[15] = '' if i[15]==None else i[15]
            i[16] = '' if i[16]==None else i[16]
            i[17] = '' if i[17]==None else i[17]
            i[18] = '' if i[18]==None else i[18]
            i[19] = '' if i[19]==None else i[19]
            i[20] = '' if i[20]==None else i[20]
            # POSITION = i[18]
            # POSBRFORM = i[19]
            # APPOUNIT = i[20]
            i[21] = datetime.datetime.now().date()
            i[22] = 0.0 if i[22]==None else i[22]
            i[23] = 0.0 if i[23]==None else i[23]
            # CZFS = i[24]
            i[24] = '' if i[24]==None else i[24]
            i[25] = 0.0 if i[25]==None else i[25]
            i[26] = '' if i[26]==None else i[26]
            # SJCZFS = i[26]
            i[27] = datetime.datetime.now().date()
            i[28] = 0.0 if i[28]==None else i[28]
            i[29] = '' if i[29]==None else i[29]
            i[30] = '' if i[30]==None else i[30]
            # CZF = i[29]
            # TZRLX = i[30]
            i[31] = 0.0 if i[31]==None else i[31]
            i[32] = 0.0 if i[32]==None else i[32]
            i[33] = '' if i[33]==None else i[33]
            i[34] = '' if i[34]==None else i[34]
            i[35] = '' if i[35]==None else i[35]
            i[36] = '' if i[36]==None else i[36]
            # ZXHHSWBZ = i[33]
            # RYLX = i[34]
            # FZJG = i[35]
            # SJQK = i[36]
            i[37] = 0.0 if i[37]==None else i[37]
            i[38] = '' if i[38]==None else i[38]
            i[39] = '' if i[39]==None else i[39]
            i[40] = '' if i[40]==None else i[40]
            i[41] = '' if i[41]==None else i[41]
            i[42] = '' if i[42]==None else i[42]
            # UNISCID = i[38]
            # ZCH = i[39]
            # QYMC = i[40]
            # DJJG = i[41]
            # FDDBR = i[42]
            i[43] = datetime.datetime.now().date()
            i[44] = '' if i[44]==None else i[44]
            i[45] = '' if i[45]==None else i[45]
            i[46] = '' if i[46]==None else i[46]
            # QYLX = i[44]
            # QYSX = i[45]
            # JYZT = i[46]
            i[47] = 0.0 if i[47]==None else i[47]
            # id = float(i[0])
            # name = i[1]
            # i[2] = 0.0 if i[2]==None else i[2]
            # # Adjust the value when the date is empty
            # i[3] = datetime.datetime.now().date()if i[3]==None else i[3].date()
            a = [i]
            print(a)
            try:
                self.client.execute("INSERT INTO test2  VALUES",a,types_check=True)
            except Exception as e:
                print(e)
            end1 = time.time()
            print('ClickHouse insert time:',end1-start1)
            self.cur.close()
            self.conn.close()
    
    if __name__ == '__main__':
        c = Clickhouse()
        # c.create_table()
        # c.insert()
        c.select_data()
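
The per-column None handling in insert() can be written more compactly by grouping column positions by type. This is a sketch of one possible refactoring, not the author's code: the names `NUMERIC_COLUMNS`, `DATE_COLUMNS`, and `clean_row` are hypothetical. The index sets were read off the 48-column table definition above, and the behavior mirrors the original, including overwriting every Date column with today's date.

```python
import datetime

# Positions of the Float32 columns (AGE, RZQX, RJZB, TZE, SJZB, CZBL, SFBD, SFLDRKHYZM, BFB, ZCZB)
NUMERIC_COLUMNS = {8, 9, 22, 23, 25, 28, 31, 32, 37, 47}
# Positions of the Date columns (NATDATE, SJC, CZRQ, CLRQ)
DATE_COLUMNS = {10, 21, 27, 43}


def clean_row(row, today=None):
    """Replace None values so the row can be inserted with types_check=True."""
    today = today or datetime.date.today()
    cleaned = []
    for idx, value in enumerate(row):
        if idx in DATE_COLUMNS:
            # As in the original code, the source date is discarded
            cleaned.append(today)
        elif idx in NUMERIC_COLUMNS:
            cleaned.append(0.0 if value is None else value)
        else:
            cleaned.append('' if value is None else value)
    return cleaned
```

Inside insert(), the long run of assignments would then reduce to `a = [clean_row(con1)]`, and the same helper could be mapped over the result of fetchall() to insert many rows in one call.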
    

6. Installing clickhouse-mysql and inserting data in real time

Tested on CentOS 7.

1. Add the Packagecloud repo from packagecloud.io. For more installation details, see https://github.com/Altinity/clickhouse-rpm-install

curl -s https://packagecloud.io/install/repositories/altinity/clickhouse/script.rpm.sh | sudo bash

Install the EPEL (for python3) and MySQL (for libmysqlclient) repos:

sudo yum install -y epel-release 
sudo yum install -y https://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm

If EPEL is not available in your repos, install it directly from the EPEL site:

sudo yum install -y https://download.fedoraproject.org/pub/epel/7/x86_64/Packages/e/epel-release-7-11.noarch.rpm

Install the data reader from packagecloud.io:

sudo yum install -y clickhouse-mysql

The clickhouse packages are installed as dependencies.

Prepare the configuration file: copy the example file to the production location and edit it.

sudo cp /etc/clickhouse-mysql/clickhouse-mysql-example.conf /etc/clickhouse-mysql/clickhouse-mysql.conf 
sudo vim /etc/clickhouse-mysql/clickhouse-mysql.conf

2. Replicate data from MySQL into ClickHouse

In the command below, --dst-schema names the target ClickHouse database and --dst-table names the target ClickHouse table.

clickhouse-mysql \
--src-server-id=1 \
--src-resume \
--src-wait \
--nice-pause=1 \
--src-host=192.168.3.191 \
--src-user=root \
--src-password=abcd@1234 \
--src-tables=ZHIXIAOQIYE.zhixiao_data \
--dst-host=192.168.3.194 \
--dst-schema=default \
--dst-table=zhixiao \
--csvpool \
--csvpool-file-path-prefix=qwe_ \
--mempool-max-flush-interval=60 \
--mempool-max-events-num=10000
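
When the command is launched from a script, building the argument list in Python lets each option carry a comment without affecting how the shell parses the line. This is a sketch: `build_command` is a hypothetical helper, the option names are copied from the command above, and the password is replaced by a placeholder.

```python
import shlex


def build_command(flags, options):
    """Return the argv list for clickhouse-mysql from bare flags and key=value options."""
    argv = ["clickhouse-mysql"]
    argv.extend(f"--{flag}" for flag in flags)
    argv.extend(f"--{key}={value}" for key, value in options.items())
    return argv


FLAGS = ["src-resume", "src-wait", "csvpool"]
OPTIONS = {
    "src-server-id": 1,
    "nice-pause": 1,
    "src-host": "192.168.3.191",
    "src-user": "root",
    "src-password": "CHANGE_ME",  # placeholder, not a real credential
    "src-tables": "ZHIXIAOQIYE.zhixiao_data",
    "dst-host": "192.168.3.194",
    "dst-schema": "default",  # target ClickHouse database
    "dst-table": "zhixiao",   # target ClickHouse table
    "csvpool-file-path-prefix": "qwe_",
    "mempool-max-flush-interval": 60,
    "mempool-max-events-num": 10000,
}

argv = build_command(FLAGS, OPTIONS)
print(" ".join(shlex.quote(arg) for arg in argv))
```

Passing `argv` to `subprocess.run(argv, check=True)` would start the process without going through a shell.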

7. Deploying ClickHouse with Docker

1. Create a clickhouse-server directory and add a custom configuration file, config.xml:

<?xml version="1.0"?>
<yandex>
<logger>
    <!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
    <level>trace</level>
    <log>/var/log/clickhouse-server/clickhouse-server.log</log>
    <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
    <size>1000M</size>
    <count>10</count>
    <!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
</logger>
<!--display_name>production</display_name--> <!-- It is the name that will be shown in the client -->
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>

<!-- For HTTPS and SSL over native protocol. -->
<!--
<https_port>8443</https_port>
<tcp_port_secure>9440</tcp_port_secure>
-->

<!-- Used with https_port and tcp_port_secure. Full ssl options list: https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/SSLManager.h#L71 -->
<openSSL>
    <server> <!-- Used for https server AND secure tcp port -->
        <!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
        <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
        <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
        <!-- openssl dhparam -out /etc/clickhouse-server/dhparam.pem 4096 -->
        <dhParamsFile>/etc/clickhouse-server/dhparam.pem</dhParamsFile>
        <verificationMode>none</verificationMode>
        <loadDefaultCAFile>true</loadDefaultCAFile>
        <cacheSessions>true</cacheSessions>
        <disableProtocols>sslv2,sslv3</disableProtocols>
        <preferServerCiphers>true</preferServerCiphers>
    </server>

    <client> <!-- Used for connecting to https dictionary source -->
        <loadDefaultCAFile>true</loadDefaultCAFile>
        <cacheSessions>true</cacheSessions>
        <disableProtocols>sslv2,sslv3</disableProtocols>
        <preferServerCiphers>true</preferServerCiphers>
        <!-- Use for self-signed: <verificationMode>none</verificationMode> -->
        <invalidCertificateHandler>
            <!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
            <name>RejectCertificateHandler</name>
        </invalidCertificateHandler>
    </client>
</openSSL>

<!-- Default root page on http[s] server. For example load UI from https://tabix.io/ when opening http://localhost:8123 -->
<!--
<http_server_default_response><![CDATA[<html ng-app="SMI2"><head><base href="http://ui.tabix.io/"></head><body><div ui-view="" class="content-ui"></div><script src="http://loader.tabix.io/master.js"></script></body></html>]]></http_server_default_response>
-->

<!-- Port for communication between replicas. Used for data exchange. -->
<interserver_http_port>9009</interserver_http_port>

<!-- Hostname that is used by other replicas to request this server.
     If not specified, than it is determined analoguous to 'hostname -f' command.
     This setting could be used to switch replication to another network interface.
  -->
<!--
<interserver_http_host>example.yandex.ru</interserver_http_host>
-->

<!-- Listen specified host. use :: (wildcard IPv6 address), if you want to accept connections both with IPv4 and IPv6 from everywhere. -->
<!--<listen_host>::</listen_host>-->
<!-- Same for hosts with disabled ipv6: -->
<!--<listen_host>192.168.3.194</listen_host>-->

<!-- Default values - try listen localhost on ipv4 and ipv6: -->
<listen_host>::</listen_host>
<listen_host>0.0.0.0</listen_host>
<!-- Don't exit if ipv6 or ipv4 unavailable, but listen_host with this protocol specified -->
<!-- <listen_try>0</listen_try> -->

<!-- Allow listen on same address:port -->
<!-- <listen_reuse_port>0</listen_reuse_port> -->

<!-- <listen_backlog>64</listen_backlog> -->

<max_connections>4096</max_connections>
<keep_alive_timeout>3</keep_alive_timeout>

<!-- Maximum number of concurrent queries. -->
<max_concurrent_queries>100</max_concurrent_queries>

<!-- Set limit on number of open files (default: maximum). This setting makes sense on Mac OS X because getrlimit() fails to retrieve
     correct maximum value. -->
<!-- <max_open_files>262144</max_open_files> -->

<!-- Size of cache of uncompressed blocks of data, used in tables of MergeTree family.
     In bytes. Cache is single for server. Memory is allocated only on demand.
     Cache is used when 'use_uncompressed_cache' user setting turned on (off by default).
     Uncompressed cache is advantageous only for very short queries and in rare cases.
  -->
<uncompressed_cache_size>8589934592</uncompressed_cache_size>

<!-- Approximate size of mark cache, used in tables of MergeTree family.
     In bytes. Cache is single for server. Memory is allocated only on demand.
     You should not lower this value.
  -->
<mark_cache_size>5368709120</mark_cache_size>


<!-- Path to data directory, with trailing slash. -->
<path>/var/lib/clickhouse/</path>

<!-- Path to temporary data for processing hard queries. -->
<tmp_path>/var/lib/clickhouse/tmp/</tmp_path>

<!-- Directory with user provided files that are accessible by 'file' table function. -->
<user_files_path>/var/lib/clickhouse/user_files/</user_files_path>

<!-- Path to configuration file with users, access rights, profiles of settings, quotas. -->
<users_config>users.xml</users_config>

<!-- Default profile of settings. -->
<default_profile>default</default_profile>

<!-- System profile of settings. This settings are used by internal processes (Buffer storage, Distibuted DDL worker and so on). -->
<!-- <system_profile>default</system_profile> -->

<!-- Default database. -->
<default_database>default</default_database>

<!-- Server time zone could be set here.

     Time zone is used when converting between String and DateTime types,
      when printing DateTime in text formats and parsing DateTime from text,
      it is used in date and time related functions, if specific time zone was not passed as an argument.

     Time zone is specified as identifier from IANA time zone database, like UTC or Africa/Abidjan.
     If not specified, system time zone at server startup is used.

     Please note, that server could display time zone alias instead of specified name.
     Example: W-SU is an alias for Europe/Moscow and Zulu is an alias for UTC.
-->
<!-- <timezone>Europe/Moscow</timezone> -->

<!-- You can specify umask here (see "man umask"). Server will apply it on startup.
     Number is always parsed as octal. Default umask is 027 (other users cannot read logs, data files, etc; group can only read).
-->
<!-- <umask>022</umask> -->

<!-- Configuration of clusters that could be used in Distributed tables.
     https://clickhouse.yandex/docs/en/table_engines/distributed/
  -->
<remote_servers incl="clickhouse_remote_servers" >
    <!-- Test only shard config for testing distributed storage -->
    <test_shard_localhost>
        <shard>
            <replica>
                <host>localhost</host>
                <port>9000</port>
            </replica>
        </shard>
    </test_shard_localhost>
    <test_shard_localhost_secure>
        <shard>
            <replica>
                <host>localhost</host>
                <port>9440</port>
                <secure>1</secure>
            </replica>
        </shard>
    </test_shard_localhost_secure>
</remote_servers>


<!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
     By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
     Values for substitutions are specified in /yandex/name_of_substitution elements in that file.
  -->

<!-- ZooKeeper is used to store metadata about replicas, when using Replicated tables.
     Optional. If you don't use replicated tables, you could omit that.

     See https://clickhouse.yandex/docs/en/table_engines/replication/
  -->
<zookeeper incl="zookeeper-servers" optional="true" />

<!-- Substitutions for parameters of replicated tables.
      Optional. If you don't use replicated tables, you could omit that.

     See https://clickhouse.yandex/docs/en/table_engines/replication/#creating-replicated-tables
  -->
<macros incl="macros" optional="true" />


<!-- Reloading interval for embedded dictionaries, in seconds. Default: 3600. -->
<builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>


<!-- Maximum session timeout, in seconds. Default: 3600. -->
<max_session_timeout>3600</max_session_timeout>

<!-- Default session timeout, in seconds. Default: 60. -->
<default_session_timeout>60</default_session_timeout>

<!-- Sending data to Graphite for monitoring. Several sections can be defined. -->
<!--
    interval - send every X second
    root_path - prefix for keys
    hostname_in_path - append hostname to root_path (default = true)
    metrics - send data from table system.metrics
    events - send data from table system.events
    asynchronous_metrics - send data from table system.asynchronous_metrics
-->
<!--
<graphite>
    <host>localhost</host>
    <port>42000</port>
    <timeout>0.1</timeout>
    <interval>60</interval>
    <root_path>one_min</root_path>
    <hostname_in_path>true</hostname_in_path>

    <metrics>true</metrics>
    <events>true</events>
    <asynchronous_metrics>true</asynchronous_metrics>
</graphite>
<graphite>
    <host>localhost</host>
    <port>42000</port>
    <timeout>0.1</timeout>
    <interval>1</interval>
    <root_path>one_sec</root_path>

    <metrics>true</metrics>
    <events>true</events>
    <asynchronous_metrics>false</asynchronous_metrics>
</graphite>
-->


<!-- Query log. Used only for queries with setting log_queries = 1. -->
<query_log>
    <!-- What table to insert data. If table is not exist, it will be created.
         When query log structure is changed after system update,
          then old table will be renamed and new table will be created automatically.
    -->
    <database>system</database>
    <table>query_log</table>
    <!--
        PARTITION BY expr https://clickhouse.yandex/docs/en/table_engines/custom_partitioning_key/
        Example:
            event_date
            toMonday(event_date)
            toYYYYMM(event_date)
            toStartOfHour(event_time)
    -->
    <partition_by>toYYYYMM(event_date)</partition_by>
    <!-- Interval of flushing data. -->
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_log>


<!-- Uncomment if use part_log
<part_log>
    <database>system</database>
    <table>part_log</table>

    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
</part_log>
-->


<!-- Parameters for embedded dictionaries, used in Yandex.Metrica.
     See https://clickhouse.yandex/docs/en/dicts/internal_dicts/
-->

<!-- Path to file with region hierarchy. -->
<!-- <path_to_regions_hierarchy_file>/opt/geo/regions_hierarchy.txt</path_to_regions_hierarchy_file> -->

<!-- Path to directory with files containing names of regions -->
<!-- <path_to_regions_names_files>/opt/geo/</path_to_regions_names_files> -->


<!-- Configuration of external dictionaries. See:
     https://clickhouse.yandex/docs/en/dicts/external_dicts/
-->
<dictionaries_config>*_dictionary.xml</dictionaries_config>

<!-- Uncomment if you want data to be compressed 30-100% better.
     Don't do that if you just started using ClickHouse.
  -->
<compression incl="clickhouse_compression">
<!--
    <!- - Set of variants. Checked in order. Last matching case wins. If nothing matches, lz4 will be used. - ->
    <case>

        <!- - Conditions. All must be satisfied. Some conditions may be omitted. - ->
        <min_part_size>10000000000</min_part_size>        <!- - Min part size in bytes. - ->
        <min_part_size_ratio>0.01</min_part_size_ratio>   <!- - Min size of part relative to whole table size. - ->

        <!- - What compression method to use. - ->
        <method>zstd</method>
    </case>
-->
</compression>

<!-- Allow to execute distributed DDL queries (CREATE, DROP, ALTER, RENAME) on cluster.
     Works only if ZooKeeper is enabled. Comment it if such functionality isn't required. -->
<distributed_ddl>
    <!-- Path in ZooKeeper to queue with DDL queries -->
    <path>/clickhouse/task_queue/ddl</path>

    <!-- Settings from this profile will be used to execute DDL queries -->
    <!-- <profile>default</profile> -->
</distributed_ddl>

<!-- Settings to fine tune MergeTree tables. See documentation in source code, in MergeTreeSettings.h -->
<!--
<merge_tree>
    <max_suspicious_broken_parts>5</max_suspicious_broken_parts>
</merge_tree>
-->

<!-- Protection from accidental DROP.
     If size of a MergeTree table is greater than max_table_size_to_drop (in bytes) than table could not be dropped with any DROP query.
     If you want do delete one table and don't want to restart clickhouse-server, you could create special file <clickhouse-path>/flags/force_drop_table and make DROP once.
     By default max_table_size_to_drop is 50GB; max_table_size_to_drop=0 allows to DROP any tables.
     The same for max_partition_size_to_drop.
     Uncomment to disable protection.
-->
<!-- <max_table_size_to_drop>0</max_table_size_to_drop> -->
<!-- <max_partition_size_to_drop>0</max_partition_size_to_drop> -->

<!-- Example of parameters for GraphiteMergeTree table engine -->
<graphite_rollup_example>
    <pattern>
        <regexp>click_cost</regexp>
        <function>any</function>
        <retention>
            <age>0</age>
            <precision>3600</precision>
        </retention>
        <retention>
            <age>86400</age>
            <precision>60</precision>
        </retention>
    </pattern>
    <default>
        <function>max</function>
        <retention>
            <age>0</age>
            <precision>60</precision>
        </retention>
        <retention>
            <age>3600</age>
            <precision>300</precision>
        </retention>
        <retention>
            <age>86400</age>
            <precision>3600</precision>
        </retention>
    </default>
</graphite_rollup_example>

<!-- Directory in <clickhouse-path> containing schema files for various input formats.
     The directory will be created if it doesn't exist.
  -->
<format_schema_path>/var/lib/clickhouse/format_schemas/</format_schema_path>

<!-- Uncomment to disable ClickHouse internal DNS caching. -->
<!-- <disable_internal_dns_cache>1</disable_internal_dns_cache> -->
</yandex>

2. Custom users.xml:

<?xml version="1.0"?>
<yandex>
<!-- Profiles of settings. -->
<profiles>
    <!-- Default settings. -->
    <default>
        <!-- Maximum memory usage for processing single query, in bytes. -->
        <max_memory_usage>26800000000</max_memory_usage>

        <!-- Use cache of uncompressed blocks of data. Meaningful only for processing many of very short queries. -->
        <use_uncompressed_cache>0</use_uncompressed_cache>

        <!-- How to choose between replicas during distributed query processing.
             random - choose random replica from set of replicas with minimum number of errors
             nearest_hostname - from set of replicas with minimum number of errors, choose replica
              with minumum number of different symbols between replica's hostname and local hostname
              (Hamming distance).
             in_order - first live replica is choosen in specified order.
        -->
        <load_balancing>random</load_balancing>
    </default>

    <!-- Profile that allows only read queries. -->
    <readonly>
        <readonly>1</readonly>
    </readonly>
</profiles>

<!-- Users and ACL. -->
<users>
    <!-- If user name was not specified, 'default' user is used. -->
    <default>
        <!-- Password could be specified in plaintext or in SHA256 (in hex format).

             If you want to specify password in plaintext (not recommended), place it in 'password' element.
             Example: <password>qwerty</password>.
             Password could be empty.

             If you want to specify SHA256, place it in 'password_sha256_hex' element.
             Example: <password_sha256_hex>65e84be33532fb784c48129675f9eff3a682b27168c0ea744b2cf58ee02337c5</password_sha256_hex>

             How to generate decent password:
             Execute: PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
             In first line will be password and in second - corresponding SHA256.
        -->
        <password></password>

        <!-- List of networks with open access.

             To open access from everywhere, specify:
                <ip>::/0</ip>

             To open access only from localhost, specify:
                <ip>::1</ip>
                <ip>127.0.0.1</ip>

             Each element of list has one of the following forms:
             <ip> IP-address or network mask. Examples: 213.180.204.3 or 10.0.0.1/8 or 10.0.0.1/255.255.255.0
	     2a02:6b8::3 or 2a02:6b8::3/64 or 2a02:6b8::3/ffff:ffff:ffff:ffff::.
             <host> Hostname. Example: server01.yandex.ru.
                 To check access, DNS query is performed, and all received addresses compared to peer address.
             <host_regexp> Regular expression for host names. Example, ^server\d\d-\d\d-\d\.yandex\.ru$
                 To check access, DNS PTR query is performed for peer address and then regexp is applied.
                 Then, for result of PTR query, another DNS query is performed and all received addresses compared to peer address.
                 Strongly recommended that regexp is ends with $
             All results of DNS requests are cached till server restart.
        -->
        <networks incl="networks" replace="replace">
            <ip>::/0</ip>
        </networks>

        <!-- Settings profile for user. -->
        <profile>default</profile>

        <!-- Quota for user. -->
        <quota>default</quota>
    </default>

    <!-- Example of user with readonly access. -->
    <readonly>
        <password></password>
        <networks incl="networks" replace="replace">
            <ip>::1</ip>
            <ip>127.0.0.1</ip>
        </networks>
        <profile>readonly</profile>
        <quota>default</quota>
    </readonly>
</users>

<!-- Quotas. -->
<quotas>
    <!-- Name of quota. -->
    <default>
        <!-- Limits for time interval. You could specify many intervals with different limits. -->
        <interval>
            <!-- Length of interval. -->
            <duration>3600</duration>

            <!-- No limits. Just calculate resource usage for time interval. -->
            <queries>0</queries>
            <errors>0</errors>
            <result_rows>0</result_rows>
            <read_rows>0</read_rows>
            <execution_time>0</execution_time>
        </interval>
    </default>
</quotas>
</yandex>

3. Create the docker-compose.yml file:

version: '3'

services:
    clickhouse-server:
            image: yandex/clickhouse-server
            container_name: clickhouse-server
            hostname: clickhouse-server
            ports:
                    - 8124:8123
                    - 9001:9000
            expose:
                    - 9000
                    - 9009
            volumes:
                    - /home/app/clickhouse-server/config.xml:/etc/clickhouse-server/config.xml
                    - /home/app/clickhouse-server/users.xml:/etc/clickhouse-server/users.xml
                    - /home/app/clickhouse-server/data:/var/lib/clickhouse
                    - /home/app/clickhouse-server/log/clickhouse-server.log:/var/log/clickhouse-server/clickhouse-server.log
                    - /home/app/clickhouse-server/log/clickhouse-server.err.log:/var/log/clickhouse-server/clickhouse-server.err.log
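
The compose file bind-mounts individual log files. When a host path in a bind mount does not exist, Docker typically creates it as a directory, which would break the file mounts, so it is safer to create the layout before the first start. This is a minimal sketch: `prepare_host_paths` is a hypothetical helper, and config.xml and users.xml still have to be copied into the base directory separately.

```python
from pathlib import Path
from typing import List


def prepare_host_paths(base: str) -> List[Path]:
    """Create the data directory and empty log files expected by the volume mounts."""
    base_dir = Path(base)
    (base_dir / "data").mkdir(parents=True, exist_ok=True)
    log_dir = base_dir / "log"
    log_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name in ("clickhouse-server.log", "clickhouse-server.err.log"):
        log_file = log_dir / name
        log_file.touch(exist_ok=True)  # leaves existing logs untouched
        created.append(log_file)
    return created
```

For the paths in the compose file, the call would be `prepare_host_paths("/home/app/clickhouse-server")`.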

4. Start clickhouse-server:

    docker-compose up -d

5. Test the connection with the client:

    clickhouse-client -h 192.168.3.194 --port 9001
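
The HTTP port can be checked as well. ClickHouse's HTTP interface answers a plain GET on the root path with the text Ok., and the compose file above publishes it on host port 8124. This is a sketch using only the standard library, with `clickhouse_http_ok` as a hypothetical helper name.

```python
from urllib.request import urlopen


def clickhouse_http_ok(host: str, port: int = 8124, timeout: float = 3.0) -> bool:
    """Return True if the ClickHouse HTTP interface replies 'Ok.' on the root path."""
    try:
        with urlopen(f"http://{host}:{port}/", timeout=timeout) as resp:
            return resp.read().decode().strip() == "Ok."
    except OSError:
        # Covers refused connections, timeouts, and HTTP errors
        return False
```

For the server used in this walkthrough, the call would be `clickhouse_http_ok("192.168.3.194")`.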
