Deploying Hive on Kubernetes

Approach:

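  • Build on the Hadoop cluster deployed in the previous article: share that cluster's configuration files, and install Hadoop in the Hive image without starting any Hadoop processes
  • When the container starts, initialize the metastore database, then start hiveserver2 and the metastore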

1. Environment

[root@master-0 ~]# kubectl get nodes -o wide
NAME       STATUS    ROLES     AGE       VERSION           EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION          CONTAINER-RUNTIME
master-0   Ready     master    14d       v1.9.2+coreos.0   <none>        CentOS Linux 7 (Core)   3.10.0-862.el7.x86_64   docker://1.13.1
worker-0   Ready     <none>    14d       v1.9.2+coreos.0   <none>        CentOS Linux 7 (Core)   3.10.0-862.el7.x86_64   docker://1.13.1
worker-1   Ready     <none>    14d       v1.9.2+coreos.0   <none>        CentOS Linux 7 (Core)   3.10.0-862.el7.x86_64   docker://1.13.1
[root@master-0 ~]# kubectl get svc -o wide
NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                          AGE       SELECTOR
hadoop-dn-service            ClusterIP   None            <none>        9000/TCP,50010/TCP,50075/TCP     17h       app=hadoop-dn
hadoop-nn-service            ClusterIP   None            <none>        9000/TCP,50070/TCP               17h       app=hadoop-nn
hadoop-ui-service            NodePort    10.233.21.71    <none>        8088:32295/TCP,50070:31127/TCP   17h       app=hadoop-nn

2. Build the Image

Hive had no official image at the time of writing, so I built my own on CentOS 7.5 with Hive 2.3.3. The Dockerfile:

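FROM 192.168.101.88:5000/base/centos:7.5.1804
LABEL maintainer="leichen.china@gmail.com"

# Replace the default YUM repositories with the Aliyun mirror
ADD CentOS-Base.repo /etc/yum.repos.d/
# ADD unpacks local .tar.gz archives into the target directory
ADD jdk-7u80-linux-x64.tar.gz /opt
ADD hadoop-2.9.1.tar.gz /opt
ADD apache-hive-2.3.3-bin.tar.gz /opt

# `which` is required by the Hadoop/Hive launcher scripts
RUN yum install -y which && yum clean all && \
    mv /opt/apache-hive-2.3.3-bin /opt/apache-hive-2.3.3

# MySQL JDBC driver for the metastore connection
ADD mysql-connector-java-5.1.46.jar /opt/apache-hive-2.3.3/lib/

ENV JAVA_HOME=/opt/jdk1.7.0_80
ENV HADOOP_HOME=/opt/hadoop-2.9.1
ENV HIVE_HOME=/opt/apache-hive-2.3.3
ENV PATH=$JAVA_HOME/bin:$PATH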

Build command: docker build -t 192.168.101.88:5000/dmcop2/hive:dm-2.3.3 .

Note: CentOS-Base.repo replaces the container's default YUM repositories with the Aliyun mirror.
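The Dockerfile ADDs five files from the build context. Because ADD automatically unpacks local .tar.gz archives into /opt, the JDK ends up in /opt/jdk1.7.0_80. A missing archive only shows up partway through the build, so a small pre-flight check can save a round trip. The sketch below is hypothetical (check_context is not part of the original setup). It only verifies that the files exist, not that they are intact:

```shell
#!/usr/bin/env bash
# Hypothetical pre-build check for the Hive image: confirm that every local
# file the Dockerfile ADDs is present in the build context.

# File names copied from the Dockerfile.
required_files=(
  CentOS-Base.repo
  jdk-7u80-linux-x64.tar.gz
  hadoop-2.9.1.tar.gz
  apache-hive-2.3.3-bin.tar.gz
  mysql-connector-java-5.1.46.jar
)

# check_context [DIR]
# Print "missing: <file>" to stderr for each absent file; return 1 if any.
check_context() {
  local dir=${1:-.} f status=0
  for f in "${required_files[@]}"; do
    if [[ ! -f "$dir/$f" ]]; then
      echo "missing: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_context . && docker build -t 192.168.101.88:5000/dmcop2/hive:dm-2.3.3 .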

3. Deploy Hive

3.1 Deploy MySQL

apiVersion: v1
kind: Secret
metadata:
  name: hive-metadata-mysql-secret
  labels:
    app: hive-metadata-mysql
type: Opaque
data:
  mysql-root-password: RGFtZW5nQDc3Nw==
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: hive-metadata-mysql
  name: hive-metadata-mysql
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: hive-metadata-mysql
  template:
    metadata:
      labels:
        app: hive-metadata-mysql
    spec:
      initContainers:
        - name: remove-lost-found
          image: 192.168.101.88:5000/k8s1.9/busybox:1.29.2
          imagePullPolicy: IfNotPresent
          command: ["rm", "-rf", "/var/lib/mysql/lost+found"]
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
      containers:
        - name: mysql
          image: 192.168.101.88:5000/dmcop2/mysql:5.7
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
          ports:
            - containerPort: 3306
              protocol: TCP
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: hive-metadata-mysql-secret
                  key: mysql-root-password
      volumes:
        - name: data
          emptyDir: {}
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: hive-metadata-mysql
  name: hive-metadata-mysql-service
spec:
  ports:
    - name: tcp
      port: 3306
      targetPort: 3306
  selector:
    app: hive-metadata-mysql
  type: NodePort
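Values under a Secret's data field are base64-encoded, not encrypted. The Hive manifest below reuses this root password as database-dba-password, so both Secrets must encode the same string. Here is a minimal sketch for producing and checking such values (the helper names are hypothetical). It uses printf so that no trailing newline gets encoded into the password:

```shell
#!/usr/bin/env bash
# Illustrative helpers for Secret `data:` values (not part of the original setup).

# encode_secret VALUE: print VALUE base64-encoded on a single line.
# printf '%s' avoids the trailing newline that `echo` would add to the input.
encode_secret() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# decode_secret B64: print the decoded value.
decode_secret() {
  printf '%s' "$1" | base64 -d
}
```

Alternatively, kubectl create secret generic hive-metadata-mysql-secret --from-literal=mysql-root-password='<password>' creates the Secret and does the encoding itself.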

3.2 Deploy Hive

  • Startup script and configuration file
apiVersion: v1
kind: ConfigMap
metadata:
  name: hive-custom-config-cm
  labels:
    app: hive
data:
  bootstrap.sh: |-
    #!/bin/bash

    cd /root/bootstrap

    # Apply custom config file context
    for cfg in ./*; do
      if [[ ! "$cfg" =~ bootstrap.sh ]]; then
        cat $cfg > $HIVE_HOME/conf/${cfg##*/}
      fi
    done

    # Replace hive metadata password
    sed -i 's/${HIVE_METADATA_PASSWORD}/'$HIVE_METADATA_PASSWORD'/g' `grep '${HIVE_METADATA_PASSWORD}' -rl $HIVE_HOME/conf`

    # initSchema
    if [[ ! -e $HADOOP_CONF_DIR/hive-metastore-initialization.out ]]; then
      $HADOOP_HOME/bin/hadoop fs -mkdir -p /tmp
      $HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
      $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
      $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse

      $HIVE_HOME/bin/schematool -dbType mysql -initSchema --verbose &> $HADOOP_CONF_DIR/hive-metastore-initialization.out
    fi

    $HIVE_HOME/bin/hiveserver2 &
    $HIVE_HOME/bin/hive --service metastore &

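    # Raise the client JVM limits to avoid "PermGen space" errors in hive shell (see Notes).
    # hiveserver2 and the metastore were backgrounded just above, so whether they pick this up
    # depends on timing; hive commands started later, such as hive shell, always read it.
    cp $HIVE_HOME/conf/hive-env.sh.template $HIVE_HOME/conf/hive-env.sh && echo "export HADOOP_CLIENT_OPTS=\"-Xmx512m -XX:MaxPermSize=1024m \$HADOOP_CLIENT_OPTS\"" >> $HIVE_HOME/conf/hive-env.sh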

    # keep running
    sleep infinity
  hive-site.xml: |-
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>${HIVE_METADATA_PASSWORD}</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hive-metadata-mysql-service:3306/metastore?createDatabaseIfNotExist=true&amp;useSSL=false</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <property>
        <name>system:java.io.tmpdir</name>
        <value>/tmp</value>
      </property>
      <property>
        <name>system:user.name</name>
        <value>hive</value>
      </property>
      <property>
        <name>hive.server2.authentication</name>
        <value>NOSASL</value>
      </property>
    </configuration>

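Notes:

1. The container mounts the Hadoop configuration directory, so Hive shares the Hadoop cluster's configuration files.

2. Hive must initialize its metastore database on first start. As with HDFS, the output of the initialization command is saved to the shared directory, and the presence of that file tells later starts that initialization has already been done.

3. The metastore database password is passed in as a container environment variable. bootstrap.sh runs when the container starts and substitutes it into the configuration files.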
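The sed command in bootstrap.sh works for the sample password, but it splices the raw value into the replacement text. A password containing '/', '&' or '\' would therefore break or corrupt hive-site.xml. The sketch below is a hypothetical, safer variant, not the author's script. It assumes GNU sed (as shipped with CentOS) because of `sed -i`. It escapes the characters that are special in a sed replacement and uses '|' as the delimiter:

```shell
#!/usr/bin/env bash
# Hypothetical, safer variant of the password substitution in bootstrap.sh.

# escape_sed_replacement STRING
# Prefix '\', '&' and the '|' delimiter with a backslash so that STRING is
# inserted literally in the replacement part of `sed 's|...|...|'`.
escape_sed_replacement() {
  printf '%s' "$1" | sed -e 's/[\\&|]/\\&/g'
}

# render_placeholder FILE
# Replace every literal ${HIVE_METADATA_PASSWORD} in FILE with the value of
# the HIVE_METADATA_PASSWORD environment variable (fails if it is unset).
render_placeholder() {
  local file=$1 value
  value=$(escape_sed_replacement "${HIVE_METADATA_PASSWORD:?is not set}") || return 1
  # '[$]' matches a literal dollar sign; a bare '{' is literal in a basic regex.
  sed -i 's|[$]{HIVE_METADATA_PASSWORD}|'"$value"'|g' "$file"
}
```

In bootstrap.sh this could be applied to each file that grep -rlF '${HIVE_METADATA_PASSWORD}' "$HIVE_HOME/conf" reports. A password containing a newline is still not handled.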

  • Deploy Hive
apiVersion: v1
kind: ConfigMap
metadata:
  name: hive-metastore-database
  labels:
    app: hive
data:
  execute.sql: |-
    -- create database (IF NOT EXISTS lets the initContainer rerun after a Pod restart)
    CREATE DATABASE IF NOT EXISTS metastore DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
    -- create user and grant authorization
    GRANT ALL ON metastore.* TO 'hive'@'%' IDENTIFIED BY '${IDENTIFIED}';
---
apiVersion: v1
kind: Secret
metadata:
  name: hive-metastore-secret
  labels:
    app: hive
type: Opaque
data:
  database-dba-password: RGFtZW5nQDc3Nw==
  database-user-password: RGFtZW5nQDc3Nw==
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hive
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: hive
  template:
    metadata:
      labels:
        app: hive
    spec:
      initContainers:
        - name: init-database
          image: 192.168.101.88:5000/dmcop2/database-tools:1.0-SNAPSHOT
          env:
            - name: DRIVER_NAME
              value: "com.mysql.jdbc.Driver"
            - name: URL
              value: "jdbc:mysql://hive-metadata-mysql-service:3306/mysql?useUnicode=true&characterEncoding=utf8&useSSL=false"
            - name: USERNAME
              value: "root"
            - name: PASSWORD
              valueFrom:
                secretKeyRef:
                  name: hive-metastore-secret
                  key: database-dba-password
            - name: IDENTIFIED
              valueFrom:
                secretKeyRef:
                  name: hive-metastore-secret
                  key: database-user-password
          volumeMounts:
            - name: init-database-volume
              mountPath: /root/db_tools/script
      containers:
        - name: hive
          image: 192.168.101.88:5000/dmcop2/hive:dm-2.3.3
          # Run the script through bash instead of chmod-ing it: ConfigMap volumes
          # are mounted read-only on newer Kubernetes releases, where chmod fails.
          command: ["bash", "/root/bootstrap/bootstrap.sh"]
          ports:
            - containerPort: 10000
            - containerPort: 10002
            - containerPort: 9083
          env:
            - name: HADOOP_CONF_DIR
              value: /etc/hadoop
            - name: HIVE_METADATA_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: hive-metastore-secret
                  key: database-user-password
          volumeMounts:
            - name: hadoop-config-volume
              mountPath: /etc/hadoop
            - name: hive-custom-config-volume
              mountPath: /root/bootstrap
          readinessProbe:
            initialDelaySeconds: 20
            periodSeconds: 5
            tcpSocket:
              port: 10000
      volumes:
        - name: hadoop-config-volume
          persistentVolumeClaim:
            claimName: hadoop-config-nfs-pvc
        - name: hive-custom-config-volume
          configMap:
            name: hive-custom-config-cm
        - name: init-database-volume
          configMap:
            name: hive-metastore-database
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: hive
  name: hive-service
spec:
  ports:
    - port: 10000
      targetPort: 10000
      name: thrift
    - port: 10002
      targetPort: 10002
      name: webui
    - port: 9083
      targetPort: 9083
      name: metastore
  selector:
    app: hive
  type: NodePort

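Notes:

1. The MySQL instance deployed in 3.1 serves as the metastore database. A Secret supplies the MySQL root password, so database-dba-password here must match mysql-root-password in hive-metadata-mysql-secret.

2. An initContainer connects to MySQL, creates the database and the hive user, and grants privileges. The SQL script is mounted from a ConfigMap.

database-tools is an image I built myself. It does one simple thing, opening a JDBC connection and executing a SQL script, and it is quite convenient to use.

3. The Hadoop cluster's shared configuration files are mounted so that Hive can connect to HDFS.

4. A NodePort Service exposes Hive outside the cluster.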
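The Hive Pod's initContainer and schematool both need MySQL to accept connections, but the two Deployments start independently. If the initContainer fails, the kubelet retries it, so the setup eventually recovers. An explicit wait loop makes the dependency visible. The sketch below is a hypothetical helper, not part of the original manifests. Like the tcpSocket readinessProbe on port 10000, it only checks that a TCP connection can be opened:

```shell
#!/usr/bin/env bash
# wait_for_tcp HOST PORT [LIMIT_SECONDS]
# Return 0 as soon as a TCP connection to HOST:PORT succeeds, or 1 after
# LIMIT_SECONDS (default 60). Relies on bash's /dev/tcp redirection and the
# coreutils `timeout` command; no nc or telnet required.
wait_for_tcp() {
  local host=$1 port=$2 limit=${3:-60}
  local deadline=$(( SECONDS + limit ))
  while (( SECONDS < deadline )); do
    # Each attempt runs in a child bash, capped at 2 s in case packets are dropped.
    if timeout 2 bash -c ': > "/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "timed out after ${limit}s waiting for $host:$port" >&2
  return 1
}
```

For example, wait_for_tcp hive-metadata-mysql-service 3306 120 could run at the top of bootstrap.sh, before schematool. An open port does not guarantee that MySQL has finished initializing, so this reduces retries rather than eliminating them.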

4. Test Hive

  • Access the Web UI
[root@master-0 kubernetes]# kubectl get svc -o wide
NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                          AGE       SELECTOR
hadoop-dn-service             ClusterIP   None            <none>        9000/TCP,50010/TCP,50075/TCP                     18h       app=hadoop-dn
hadoop-nn-service             ClusterIP   None            <none>        9000/TCP,50070/TCP                               18h       app=hadoop-nn
hadoop-ui-service             NodePort    10.233.21.71    <none>        8088:32295/TCP,50070:31127/TCP                   18h       app=hadoop-nn
hive-metadata-mysql-service   NodePort    10.233.23.56    <none>        3306:31470/TCP                                   1m        app=hive-metadata-mysql
hive-service                  NodePort    10.233.60.239   <none>        10000:30717/TCP,10002:30001/TCP,9083:32335/TCP   40s       app=hive
kubernetes                    ClusterIP   10.233.0.1      <none>        443/TCP                                          14d       <none>

[Screenshot: HiveServer2 Web UI, served on port 10002 (NodePort 30001)]

  • Run CRUD operations inside the container
[root@master-0 ~]# kubectl get pods
NAME                                   READY     STATUS    RESTARTS   AGE
hadoop-dn-0                            1/1       Running   0          18h
hadoop-dn-1                            1/1       Running   0          18h
hadoop-dn-2                            1/1       Running   0          18h
hadoop-dn-3                            1/1       Running   0          16h
hadoop-nn-0                            1/1       Running   0          18h
hive-5985d6485b-vtkm4                  1/1       Running   0          3m
hive-metadata-mysql-8577d98f98-pwpjc   1/1       Running   0          4m
[root@master-0 ~]# kubectl exec hive-5985d6485b-vtkm4 -ti bash
[root@hive-5985d6485b-vtkm4 /]# cd /opt/apache-hive-2.3.3/bin
[root@hive-5985d6485b-vtkm4 bin]# ./hive shell
which: no hbase in (/opt/jdk1.7.0_80/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/apache-hive-2.3.3/lib/hive-common-2.3.3.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
hive> show tables;
OK
Time taken: 4.435 seconds
hive> create table abc (a int);
OK
Time taken: 0.571 seconds
hive> insert into abc values (1);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = root_20180907015605_f62e6cfe-594a-45e7-ae92-7a942d40bee3
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1536218868594_0003, Tracking URL = http://hadoop-nn-0.hadoop-nn-service.default.svc.cluster.local:8088/proxy/application_1536218868594_0003/
Kill Command = /opt/hadoop-2.9.1/bin/hadoop job  -kill job_1536218868594_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-09-07 01:56:16,745 Stage-1 map = 0%,  reduce = 0%
2018-09-07 01:56:23,004 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.72 sec
MapReduce Total cumulative CPU time: 1 seconds 720 msec
Ended Job = job_1536218868594_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://hadoop-nn-0.hadoop-nn-service.default.svc.cluster.local:9000/user/hive/warehouse/abc/.hive-staging_hive_2018-09-07_01-56-05_553_2053353633784500171-1/-ext-10000
Loading data to table default.abc
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.72 sec   HDFS Read: 4278 HDFS Write: 69 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 720 msec
OK
Time taken: 20.196 seconds
hive> select * from abc;
OK
1
Time taken: 0.182 seconds, Fetched: 2 row(s)
hive> drop table abc;
OK
Time taken: 1.713 seconds
hive> show tables;
OK
values__tmp__table__1
Time taken: 0.027 seconds, Fetched: 1 row(s)
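The same statements can also be run from outside the cluster through the NodePort Service using beeline, which ships in $HIVE_HOME/bin. Because hive-site.xml sets hive.server2.authentication to NOSASL, the JDBC URL needs auth=noSasl. The sketch below builds that URL and looks up the NodePort mapped to the "thrift" port. The helper names are hypothetical, and the node address is a placeholder you must supply:

```shell
#!/usr/bin/env bash
# hive_jdbc_url HOST PORT [DB]
# Build a HiveServer2 JDBC URL; auth=noSasl matches
# hive.server2.authentication=NOSASL in hive-site.xml.
hive_jdbc_url() {
  local host=$1 port=$2 db=${3:-default}
  printf 'jdbc:hive2://%s:%s/%s;auth=noSasl\n' "$host" "$port" "$db"
}

# thrift_node_port: read the NodePort mapped to the Service's "thrift" port
# (10000). The Service does not pin a nodePort, so Kubernetes picks one from
# the NodePort range (30717 in the listing above) and it may differ per cluster.
thrift_node_port() {
  kubectl get svc hive-service \
    -o jsonpath='{.spec.ports[?(@.name=="thrift")].nodePort}'
}
```

Usage, with NODE_IP set to the address of any cluster node: beeline -u "$(hive_jdbc_url "$NODE_IP" "$(thrift_node_port)")" -e 'show tables;'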

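5. Notes

  • When redeploying, decide whether initSchema should run again. To rerun it, first delete hive-metastore-initialization.out from the Hadoop configuration directory on NFS. Because the MySQL Deployment keeps its data in an emptyDir volume, the metastore database is lost whenever the MySQL Pod is recreated, and the marker file must then be deleted too.
  • Running hive shell may fail with java.lang.OutOfMemoryError: PermGen space. With the initial configuration every insert hit this error, which is why bootstrap.sh sets HADOOP_CLIENT_OPTS in hive-env.sh.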
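The marker-file logic in bootstrap.sh has a catch. The schematool output is written to hive-metastore-initialization.out even when initSchema fails, so a failed initialization is silently skipped on every later start until the file is deleted by hand. A minimal sketch of the same run-once pattern that keeps the marker only on success follows. run_once is a hypothetical helper, not the author's code:

```shell
#!/usr/bin/env bash
# run_once MARKER CMD [ARGS...]
# Run CMD only if MARKER does not exist. Output goes to MARKER.partial and is
# renamed to MARKER only when CMD succeeds, so a failed run is retried on the
# next container start instead of being skipped forever.
run_once() {
  local marker=$1; shift
  if [[ -e "$marker" ]]; then
    echo "skip: $marker exists" >&2
    return 0
  fi
  local tmp="$marker.partial"
  if "$@" &> "$tmp"; then
    mv "$tmp" "$marker"
  else
    local rc=$?   # exit status of CMD
    echo "failed (exit $rc); see $tmp" >&2
    return "$rc"
  fi
}
```

In bootstrap.sh this would become run_once "$HADOOP_CONF_DIR/hive-metastore-initialization.out" "$HIVE_HOME/bin/schematool" -dbType mysql -initSchema --verbose. Deleting the marker still forces a rerun, as described above.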
