Kubernetes is an open-source system for managing containerized applications across multiple hosts on a cloud platform. Its goal is to make deploying containerized applications simple and powerful, and it provides mechanisms for deploying, scheduling, updating, and maintaining applications.

Preparation
Docker: this setup runs Kubernetes on top of Docker, so install Docker first (a command sketch for this and the swap step follows the list).
Disable swap: run sudo swapoff -a; to disable it permanently, open /etc/fstab with sudo vim /etc/fstab and comment out the swap line.
A proxy tool such as Shadowsocks: some of the resources needed later, including the Docker images, are hosted on Google's infrastructure, so you need a way through the firewall.
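A rough sketch of the two preparation steps on Ubuntu (docker.io is the distribution package; Docker's own docker-ce packages also work):

# Install Docker and start it at boot
sudo apt-get update
sudo apt-get install -y docker.io
sudo systemctl enable --now docker

# Turn swap off now, and comment out the swap entry in /etc/fstab so it stays off after a reboot
sudo swapoff -a
sudo sed -i -e '/\sswap\s/ s/^#*/#/' /etc/fstab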

Set up an HTTP proxy
Running Shadowsocks usually gives you a SOCKS5 proxy, so we also need an HTTP-to-SOCKS5 bridge; here we use Privoxy.
Install Privoxy first:

sudo apt-get install privoxy
Configure Privoxy: open /etc/privoxy/config and append the following after the last line:

forward-socks5 / 127.0.0.1:1080 .
listen-address 127.0.0.1:8008
This forwards all requests to the local SOCKS5 proxy on port 1080, while Privoxy itself listens on port 8008.
Then restart Privoxy:

sudo service privoxy restart
Then you can use

export http_proxy=http://127.0.0.1:8008
export https_proxy=http://127.0.0.1:8008
to reach resources outside the firewall. Test it with curl https://google.com; if the proxy is configured correctly, the output will be:

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>
Install kubeadm
There are several ways to install Kubernetes; since this is a single-machine deployment, the official kubeadm tool is the simplest and fastest.
The next few steps use proxychains, which lets us use the SOCKS5 proxy directly from the terminal; here we use proxychains-ng (the next-generation proxychains).

git clone https://github.com/rofl0r/proxychains-ng.git
cd proxychains-ng
./configure --prefix=/usr --sysconfdir=/etc
make
make install
make install-config   # installs the proxychains.conf configuration file
To use it, put proxychains4 in front of any command that needs the proxy, for example:

proxychains4 wget https://google.com
The next step is to download and add the signing key for the Kubernetes packages:

proxychains4 curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
Configure the Kubernetes apt source:

sudo touch /etc/apt/sources.list.d/kubernetes.list
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
Install kubeadm, kubelet, and the other dependencies:

proxychains4 apt-get update
proxychains4 apt-get install -y kubelet kubeadm kubectl kubernetes-cni
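Optionally, you can pin these packages so a routine apt upgrade doesn't move the cluster to a new version unexpectedly:

sudo apt-mark hold kubelet kubeadm kubectl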
Initialize the cluster with kubeadm init
Open a terminal and set the HTTP proxy environment variables first; proxychains4 is of little use here (and the image pulls are done by the Docker daemon anyway).

export http_proxy=http://127.0.0.1:8008
export https_proxy=http://127.0.0.1:8008
export no_proxy=192.168.1.118 # your machine's IP address
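Depending on your environment, it can also help to keep in-cluster traffic off the proxy by adding the service and pod CIDRs to no_proxy (10.96.0.0/12 is kubeadm's default service CIDR; 172.16.0.0/16 matches the pod CIDR passed to kubeadm init below):

export no_proxy=192.168.1.118,127.0.0.1,localhost,10.96.0.0/12,172.16.0.0/16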
We also need to configure a proxy for Docker itself, because the images are hosted on Google's infrastructure. Note that there are two kinds of proxy here: one for the Docker client and one for the Docker daemon (server); don't mix them up. What we configure is the daemon's proxy, since pulling images is done by the daemon, and the proxy address is the Privoxy address from above:

# Create a systemd drop-in directory for the docker service
sudo mkdir -p /etc/systemd/system/docker.service.d

# Create /etc/systemd/system/docker.service.d/http-proxy.conf with the following content
[Service]
Environment="HTTP_PROXY=http://127.0.0.1:8008/"

# Create /etc/systemd/system/docker.service.d/https-proxy.conf with the following content
[Service]
Environment="HTTPS_PROXY=http://127.0.0.1:8008/"

# Reload systemd so the changes take effect
sudo systemctl daemon-reload

# Restart the docker service
sudo systemctl restart docker
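You can check that the daemon actually picked up the proxy settings with:

systemctl show --property=Environment docker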
Now run kubeadm init. Before running it, decide which Pod network add-on you are going to use; here we choose Calico:

kubeadm init --pod-network-cidr=172.16.0.0/16
To watch the logs throughout the process, you can use

journalctl -xeu kubelet
It may not succeed on the first attempt. If a previous attempt left state behind, kubeadm reset cleans it up. Once everything is configured correctly, you can re-run kubeadm init with a flag that skips all the preflight checks:

kubeadm init --pod-network-cidr=172.16.0.0/16 --ignore-preflight-errors=all
If initialization succeeds, you will see output like:

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
......
Follow the prompt and run:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
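At this point kubectl can talk to the cluster; the node will show up as NotReady until a network add-on is installed:

kubectl get nodes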
List all the pods with kubectl get pods --all-namespaces:

NAMESPACE     NAME                                   READY     STATUS    RESTARTS   AGE
kube-system   coredns-78fcdf6894-5h7tl               0/1       Pending   0          1h
kube-system   coredns-78fcdf6894-z7vcj               0/1       Pending   0          1h
kube-system   etcd-salamanderpc                      1/1       Running   0          1h
kube-system   kube-apiserver-salamanderpc            1/1       Running   1          1h
kube-system   kube-controller-manager-salamanderpc   1/1       Running   1          1h
kube-system   kube-proxy-brgdx                       1/1       Running   0          1h
kube-system   kube-scheduler-salamanderpc            1/1       Running   1          1h
The coredns pods are still Pending; that's fine, we just haven't installed a Pod network add-on yet. Here we install Calico.
First, install the etcd instance Calico uses:

kubectl apply -f \
https://docs.projectcalico.org/v3.2/getting-started/kubernetes/installation/hosted/etcd.yaml
Output:

daemonset "calico-etcd" created
service "calico-etcd" created
Install Calico's RBAC roles:

kubectl apply -f \
https://docs.projectcalico.org/v3.2/getting-started/kubernetes/installation/rbac.yaml
Output:

clusterrole.rbac.authorization.k8s.io "calico-kube-controllers" created
clusterrolebinding.rbac.authorization.k8s.io "calico-kube-controllers" created
clusterrole.rbac.authorization.k8s.io "calico-node" created
clusterrolebinding.rbac.authorization.k8s.io "calico-node" created
Then install Calico itself:

kubectl apply -f \
https://docs.projectcalico.org/v3.2/getting-started/kubernetes/installation/hosted/calico.yaml
Output:

configmap "calico-config" created
secret "calico-etcd-secrets" created
daemonset.extensions "calico-node" created
serviceaccount "calico-node" created
deployment.extensions "calico-kube-controllers" created
serviceaccount "calico-kube-controllers" created
Wait for all the pods to become Running:

watch kubectl get pods --all-namespaces
This takes a minute or two:

NAMESPACE     NAME                                       READY     STATUS    RESTARTS   AGE
kube-system   calico-etcd-l9zrs                          1/1       Running   0          1m
kube-system   calico-kube-controllers-65945f849d-kpndn   1/1       Running   0          1m
kube-system   calico-node-5bb4d                          2/2       Running   0          1m
kube-system   coredns-78fcdf6894-5pjcn                   1/1       Running   0          3m
kube-system   coredns-78fcdf6894-f5wtd                   1/1       Running   0          3m
kube-system   etcd-salamanderpc                          1/1       Running   0          2m
kube-system   kube-apiserver-salamanderpc                1/1       Running   0          2m
kube-system   kube-controller-manager-salamanderpc       1/1       Running   0          2m
kube-system   kube-proxy-f6kxr                           1/1       Running   0          3m
kube-system   kube-scheduler-salamanderpc                1/1       Running   0          2m
Deploy a service
Because this is a single node, we would normally need to join worker nodes to run the real workloads; for testing, though, we can run

kubectl taint nodes --all node-role.kubernetes.io/master-
to remove the master taint and lift that restriction (never do this in production).
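You can confirm the taint is gone; the output should show Taints: <none>:

kubectl describe nodes | grep -i taints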

Create a new Deployment file, nginx_deployment.yaml, with the following content:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2 # tells deployment to run 2 pods matching the template
  selector: # apps/v1 requires an explicit selector matching the template labels
    matchLabels:
      app: nginx
  template: # create pods using pod definition in this template
    metadata:
      # unlike pod-nginx.yaml, the name is not included in the meta data as a unique name is
      # generated from the deployment name
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.0
        ports:
        - containerPort: 80
A Deployment is the newer object for managing Pods; compared with the Replication Controller it offers more complete functionality (such as rolling updates and rollbacks) and is simpler to use.
Then create the Deployment:

kubectl create -f nginx_deployment.yaml
This creates two pods, each with container port 80 open.

Check the Deployment:

kubectl get deployment
Check the pods that were created (there are two):

kubectl get pods
Output:

NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-67594d6bf6-bwnlz   1/1       Running   0          39m
nginx-deployment-67594d6bf6-frrdx   1/1       Running   0          39m
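One thing the Deployment gives you over a bare Replication Controller is a declarative rolling update; for example, changing the image (nginx:1.15.0 here is only an illustrative tag) rolls the pods over one by one and can be undone:

kubectl set image deployment/nginx-deployment nginx=nginx:1.15.0
kubectl rollout status deployment/nginx-deployment
# roll back if the new version misbehaves
kubectl rollout undo deployment/nginx-deployment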
To be able to access the nginx pods from outside, we need to define a Service (nginx-service.yaml):

kind: Service
apiVersion: v1
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 9898
      targetPort: 80
Create the Service:

kubectl create -f nginx-service.yaml
The Service above exposes port 9898 (as a ClusterIP, so it is reachable from within the cluster and from the node itself; use type NodePort or LoadBalancer for truly external access).
Check the created Service:

kubectl get svc
Output:

NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes      ClusterIP   10.96.0.1       <none>        443/TCP    3h
nginx-service   ClusterIP   10.101.10.236   <none>        9898/TCP   23m
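Since it is a ClusterIP Service, you can do a quick check from the node itself (substitute the CLUSTER-IP shown in your own output):

curl http://10.101.10.236:9898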


FfDL

Project: https://github.com/IBM/FfDL

Install Helm and set up Tiller:

# Install Tiller into the cluster
helm init
# Give Tiller a service account with cluster-admin rights, then point the Tiller deployment at it
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
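Once the tiller-deploy pod is Running, helm version should report both a client and a server version:

kubectl -n kube-system get pods | grep tiller
helm version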

kubectl config set-context $(kubectl config current-context) --namespace=ivdai

export VM_TYPE=none
export PUBLIC_IP=<Cluster Public IP>
export NAMESPACE=default


Create an NFS share for the PersistentVolume:

# Create the shared directory
sudo mkdir -p /data-nfs

# Install NFS kernel server
sudo apt update
sudo apt install -y nfs-kernel-server

# Update /etc/exports
echo "/data-nfs *(rw,no_root_squash,no_subtree_check)" | sudo tee -a /etc/exports

# Restart NFS kernel server
sudo service nfs-kernel-server restart
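A quick way to confirm the export is active (showmount comes with the nfs-common package):

showmount -e localhost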

test_pv.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001
  labels: 
    type: dlaas-static-volume
spec:
  capacity: 
    storage: 200Gi
  accessModes:
    - ReadWriteMany
  nfs:
    path: /data-nfs 
    server: 192.168.8.110   # address of the NFS server set up above (this machine)

kubectl create -f test_pv.yaml
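A quick sanity check that the volume registered:

kubectl get pv pv0001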

helm install .   # run from the root of the FfDL repository checkout, where the chart lives
kubectl config set-context $(kubectl config current-context) --namespace=$NAMESPACE
kubectl get pods
# NAME                                 READY     STATUS    RESTARTS   AGE
# alertmanager-7cf6b988b9-h9q6q        1/1       Running   0          5h
# etcd0                                1/1       Running   0          5h
# ffdl-lcm-65bc97bcfd-qqkfc            1/1       Running   0          5h
# ffdl-restapi-8777444f6-7jfcf         1/1       Running   0          5h
# ffdl-trainer-768d7d6b9-4k8ql         1/1       Running   0          5h
# ffdl-trainingdata-866c8f48f5-ng27z   1/1       Running   0          5h
# ffdl-ui-5bf86cc7f5-zsqv5             1/1       Running   0          5h
# mongo-0                              1/1       Running   0          5h
# prometheus-5f85fd7695-6dpt8          2/2       Running   0          5h
# pushgateway-7dd8f7c86d-gzr2g         2/2       Running   0          5h
# storage-0                            1/1       Running   0          5h

node_ip=$PUBLIC_IP
grafana_port=$(kubectl get service grafana -o jsonpath='{.spec.ports[0].nodePort}')
ui_port=$(kubectl get service ffdl-ui -o jsonpath='{.spec.ports[0].nodePort}')
restapi_port=$(kubectl get service ffdl-restapi -o jsonpath='{.spec.ports[0].nodePort}')
s3_port=$(kubectl get service minio -o jsonpath='{.spec.ports[0].nodePort}')

echo "Monitoring dashboard: http://$node_ip:$grafana_port/ (login: admin/admin)"
echo "Web UI: http://$node_ip:$ui_port/#/login?endpoint=$node_ip:$restapi_port&username=test-user"

Using FfDL Local S3 Based Object Storage

node_ip=$PUBLIC_IP
s3_port=$(kubectl get service minio -o jsonpath='{.spec.ports[0].nodePort}')
s3_url=http://$node_ip:$s3_port

export AWS_ACCESS_KEY_ID=admin; export AWS_SECRET_ACCESS_KEY=password; export AWS_DEFAULT_REGION=us-east-1;

s3cmd="aws --endpoint-url=$s3_url s3"
$s3cmd mb s3://trainingdata
$s3cmd mb s3://trainedmodel
$s3cmd mb s3://mnist_lmdb_data
$s3cmd mb s3://dlaas-trained-models

mkdir tmp
for file in t10k-images-idx3-ubyte.gz t10k-labels-idx1-ubyte.gz train-images-idx3-ubyte.gz train-labels-idx1-ubyte.gz;
do
  test -e tmp/$file || wget -q -O tmp/$file http://yann.lecun.com/exdb/mnist/$file
  $s3cmd cp tmp/$file s3://trainingdata/$file
done

restapi_port=$(kubectl get service ffdl-restapi -o jsonpath='{.spec.ports[0].nodePort}')
export DLAAS_URL=http://$node_ip:$restapi_port; export DLAAS_USERNAME=test-user; export DLAAS_PASSWORD=test;

if [ "$(uname)" = "Darwin" ]; then
  sed -i '' s/s3.default.svc.cluster.local/$node_ip:$s3_port/ etc/examples/tf-model/manifest.yml
else
  sed -i s/s3.default.svc.cluster.local/$node_ip:$s3_port/ etc/examples/tf-model/manifest.yml
fi

CLI_CMD=$(pwd)/cli/bin/ffdl-$(if [ "$(uname)" = "Darwin" ]; then echo 'osx'; else echo 'linux'; fi)
$CLI_CMD train etc/examples/tf-model/manifest.yml etc/examples/tf-model
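After the job is submitted, new pods for the training run (learner and helper pods) should appear alongside the FfDL services; you can watch for them with:

watch kubectl get pods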

Using Cloud Object Storage

export AWS_ACCESS_KEY_ID=mos
export AWS_SECRET_ACCESS_KEY=mos
s3_url=http://120.79.11.211:8080
s3cmd="aws --endpoint-url=$s3_url s3"

trainingDataBucket=<unique bucket name for training data storage>
trainingResultBucket=<unique bucket name for training result storage>

$s3cmd mb s3://$trainingDataBucket
$s3cmd mb s3://$trainingResultBucket

mkdir tmp
for file in t10k-images-idx3-ubyte.gz t10k-labels-idx1-ubyte.gz train-images-idx3-ubyte.gz train-labels-idx1-ubyte.gz;
do
  test -e tmp/$file || wget -q -O tmp/$file http://yann.lecun.com/exdb/mnist/$file
  $s3cmd cp tmp/$file s3://$trainingDataBucket/$file
done

if [ "$(uname)" = "Darwin" ]; then
  sed -i '' s/tf_training_data/$trainingDataBucket/ etc/examples/tf-model/manifest.yml
  sed -i '' s/tf_trained_model/$trainingResultBucket/ etc/examples/tf-model/manifest.yml
  sed -i '' s/s3.default.svc.cluster.local/$node_ip:$s3_port/ etc/examples/tf-model/manifest.yml
  sed -i '' s/user_name: test/user_name: $AWS_ACCESS_KEY_ID/ etc/examples/tf-model/manifest.yml
  sed -i '' s/password: test/password: $AWS_SECRET_ACCESS_KEY/ etc/examples/tf-model/manifest.yml
else
  sed -i s/tf_training_data/$trainingDataBucket/ etc/examples/tf-model/manifest.yml
  sed -i s/tf_trained_model/$trainingResultBucket/ etc/examples/tf-model/manifest.yml
  sed -i s/s3.default.svc.cluster.local/$node_ip:$s3_port/ etc/examples/tf-model/manifest.yml
  sed -i s/user_name: test/user_name: $AWS_ACCESS_KEY_ID/ etc/examples/tf-model/manifest.yml
  sed -i s/password: test/password: $AWS_SECRET_ACCESS_KEY/ etc/examples/tf-model/manifest.yml
fi

restapi_port=$(kubectl get service ffdl-restapi -o jsonpath='{.spec.ports[0].nodePort}')
export DLAAS_URL=http://$node_ip:$restapi_port; export DLAAS_USERNAME=test-user; export DLAAS_PASSWORD=test;

# Obtain the correct CLI for your machine and run the training job with our default TensorFlow model
CLI_CMD=cli/bin/ffdl-$(if [ "$(uname)" = "Darwin" ]; then echo 'osx'; else echo 'linux'; fi)
$CLI_CMD train etc/examples/tf-model/manifest.yml etc/examples/tf-model


# Notes on rebuilding the ffdl-restapi image (user-lyh is the author's personal tag):
docker build -q -t docker.io/ffdl/ffdl-restapi:user-lyh .
# The image packages the restapi binary, which is cross-compiled for Linux like this:
(cd ./restapi/ && (test ! -e main.go || CGO_ENABLED=0 GOOS=linux go build -ldflags "-s -w" -a -installsuffix cgo -o bin/main))


# Point the running ffdl-restapi Deployment at a new image tag
kubectl set image deploy ffdl-restapi ffdl-restapi-container=ffdl/ffdl-restapi:v0.1.1
 
