1. Background

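In the test environment, Postgres (pg for short) had been deployed manually with Docker and ran normally with the default configuration. After pg was moved into Kubernetes, it failed to start. Some nodes in the cluster have hugepages-2Mi resources configured, so pg tries to use huge pages by default. My deployment YAML, however, did not allocate any hugepage resources to pg.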

2. Reproducing the problem: inspecting node resources

```
[root@t34 hugepage]# kubectl describe node t32
...
Capacity:
  cpu:                56
  ephemeral-storage:  120965376Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      5000Mi
  memory:             263847940Ki
  pods:               110
Allocatable:
  cpu:                56
  ephemeral-storage:  117675117681
  hugepages-1Gi:      0
  hugepages-2Mi:      5000Mi
  memory:             258420740Ki
  pods:               110
...
```

Node t32 has 5000Mi of hugepages-2Mi resources. When pg runs on it without being allocated any hugepage resources, it fails with the following error:

```
[root@t34 hugepage]# kubectl logs -f pg-deploy-54cd8b8bb8-7zs4v
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 20
selecting default shared_buffers ... 400kB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
Bus error (core dumped)
child process exited with exit code 135
initdb: removing contents of data directory "/var/lib/postgresql/data"
```
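The log ends with "Bus error (core dumped)" and exit code 135. By shell convention, a child killed by signal N is reported as 128 + N. Exit code 135 therefore means signal 7, which is SIGBUS on x86-64 Linux.

This fits how the kernel's hugetlb cgroup controller works. It enforces its limit at page-fault time, not when the memory is mapped. Because huge pages cannot be reclaimed, a process that touches huge pages beyond its limit receives SIGBUS. Here the limit is presumably zero, since the pod requested no hugepages.

A small Python sketch for decoding such exit codes follows. The function name is our own and hypothetical:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status (128 + N means killed by signal N)."""
    if code > 128:
        try:
            # Signal numbers are platform specific; SIGBUS is 7 on x86-64 Linux.
            return signal.Signals(code - 128).name
        except ValueError:
            return f"signal {code - 128} (unknown on this platform)"
    return f"exit status {code}"


if __name__ == "__main__":
    print(describe_exit_code(135))
```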

3. Solutions

3.1 Option 1: disable hugepages

```
echo "vm.nr_hugepages=0" >> /etc/sysctl.conf
sysctl -p
```

Disable huge pages on every node. With no huge pages available, PostgreSQL's default setting huge_pages = try falls back to regular pages, so pg starts successfully. Check:

```
[root@t34 hugepage]# kubectl get pod -o wide -l app=postgres
NAME                         READY   STATUS    RESTARTS   AGE   IP            NODE   NOMINATED NODE   READINESS GATES
pg-deploy-54cd8b8bb8-mh9q5   1/1     Running   0          58s   10.42.1.151   t32
```
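PostgreSQL documents three values for huge_pages:

- off never requests huge pages.
- on refuses to start if huge pages cannot be obtained.
- try (the default) requests them and falls back to regular pages on failure.

The toy model below is not PostgreSQL code, and its function and parameter names are hypothetical. It assumes the huge-page request succeeds whenever the node's free pool is large enough. That assumption explains the behaviour seen here: try picked huge pages on t32 before the fix, and regular pages after vm.nr_hugepages=0.

```python
def shared_memory_backing(huge_pages: str, free_huge_pages: int, needed_pages: int) -> str:
    """Toy model of PostgreSQL's huge_pages setting; not PostgreSQL's code.

    Simplifying assumption: a huge-page request succeeds iff the node's free
    pool is large enough. The container's hugetlb limit is not consulted at
    this point, which is why 'try' can choose huge pages and fail later.
    """
    if huge_pages not in ("off", "on", "try"):
        raise ValueError(f"invalid huge_pages value: {huge_pages!r}")
    if huge_pages == "off":
        return "regular"  # never requests huge pages
    if free_huge_pages >= needed_pages:
        return "huge"  # request succeeds
    if huge_pages == "on":
        raise RuntimeError("huge pages unavailable: server would not start")
    return "regular"  # 'try' falls back silently


if __name__ == "__main__":
    # needed_pages=80 is a made-up size for pg's shared memory.
    print(shared_memory_backing("try", free_huge_pages=2500, needed_pages=80))
    print(shared_memory_backing("try", free_huge_pages=0, needed_pages=80))
```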

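3.2 Option 2: set huge_pages to off and rebuild the image

In postgresql.conf.sample, change the default huge_pages = try to huge_pages = off, then rebuild the postgres image. initdb generates postgresql.conf from this sample file, so the setting already applies while the database cluster is being initialized. The Dockerfile:

```dockerfile
FROM postgres:12.3
# postgresql.conf.sample: local copy of the image's sample, edited to huge_pages = off
COPY postgresql.conf.sample /usr/share/postgresql/postgresql.conf.sample
```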
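Instead of maintaining a hand-edited copy of postgresql.conf.sample, the file could be patched mechanically. Below is a minimal Python sketch. It assumes the sample contains a line of the form huge_pages = try, possibly commented out with #. The function name is hypothetical.

```python
import re

# Matches "huge_pages = <value>", optionally commented out with a leading '#'.
_HUGE_PAGES_LINE = re.compile(r"^#?[ \t]*huge_pages[ \t]*=[ \t]*\S+", re.MULTILINE)


def disable_huge_pages(conf_text: str) -> str:
    """Return conf_text with huge_pages set to off (first match only)."""
    new_text, count = _HUGE_PAGES_LINE.subn("huge_pages = off", conf_text, count=1)
    if count == 0:
        # No existing huge_pages line: append the setting at the end.
        new_text = conf_text.rstrip("\n") + "\nhuge_pages = off\n"
    return new_text


if __name__ == "__main__":
    print(disable_huge_pages("#huge_pages = try\t\t\t# on, off, or try\n"), end="")
```

You could run it on a copy of /usr/share/postgresql/postgresql.conf.sample taken from the postgres:12.3 image, before the COPY step. A sed command in a RUN instruction would be an equivalent in-image alternative.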

3.3 Option 3: enable hugepages on the nodes

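```
echo "vm.nr_hugepages=2500" >> /etc/sysctl.conf
sysctl -p
```

2500 pages × 2 MiB = 5000 MiB, the same amount of hugepages-2Mi that t32 reports. Then add the hugepages-2Mi configuration to the deployment YAML: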

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pg-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: hub.geovis.io/airstudio/postgres:12.3
        imagePullPolicy: Always
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_PASSWORD
          value: "123456"
        volumeMounts:
        - name: data-dir
          mountPath: /var/lib/postgresql/data/
        - name: hugepage
          mountPath: /hugepages
        resources:
          limits:
            hugepages-2Mi: 1000Mi
            cpu: 4
            memory: 2Gi
      volumes:
      - name: data-dir
        hostPath:
          path: /mnt/mfs/pg_data
          type: DirectoryOrCreate
      - name: hugepage
        emptyDir:
          medium: HugePages
```

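Two points to note:

(1) Besides hugepages-2Mi, resources must also set cpu and memory limits; otherwise an error is reported.

(2) The YAML above applies to Kubernetes versions 1.14 through 1.15. Other versions use a different configuration format, so check the official documentation.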
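The numbers in the resources section can be sanity-checked before deploying. Below is a hypothetical pre-flight helper in Python with two jobs:

- It converts a hugepages-2Mi limit to a page count.
- It flags a missing cpu or memory limit, mirroring the first note above.

It handles only the Mi and Gi suffixes used in this article; real Kubernetes quantities accept more forms.

```python
# Only the binary suffixes used in this article are handled; real Kubernetes
# quantities accept many more forms (assumption of this sketch).
_UNITS = {"Mi": 1024 ** 2, "Gi": 1024 ** 3}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '1000Mi' or '2Gi' to bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity!r}")


def hugepage_count(limit: str, page_size: str = "2Mi") -> int:
    """How many huge pages a hugepages-2Mi limit corresponds to."""
    return to_bytes(limit) // to_bytes(page_size)


def missing_limits(limits: dict) -> list:
    """Hypothetical check mirroring the note: hugepages also need cpu and memory."""
    if not any(name.startswith("hugepages-") for name in limits):
        return []
    return [name for name in ("cpu", "memory") if name not in limits]


if __name__ == "__main__":
    limits = {"hugepages-2Mi": "1000Mi", "cpu": 4, "memory": "2Gi"}
    print(hugepage_count(limits["hugepages-2Mi"]), "pages")
    print(missing_limits(limits) or "cpu and memory limits present")
```

With the article's values, the 1000Mi limit is 500 pages out of the 2500 (5000Mi) configured on the node.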

```
[root@t32 ~]# cat /proc/meminfo | grep Huge
AnonHugePages:   1959936 kB
HugePages_Total:    2500
HugePages_Free:     2421
HugePages_Rsvd:      256
HugePages_Surp:        0
Hugepagesize:       2048 kB
```

```
[root@t34 hugepage]# kubectl get pod -o wide -l app=postgres
NAME                        READY   STATUS    RESTARTS   AGE   IP            NODE   NOMINATED NODE   READINESS GATES
pg-deploy-fbdfbf8f9-zrgvh   1/1     Running   0          45s   10.42.1.153   t32
[root@t34 hugepage]# kubectl exec -it pg-deploy-fbdfbf8f9-zrgvh ls /
bin   docker-entrypoint-initdb.d  home       lib64  opt   run   sys  var
boot  docker-entrypoint.sh        hugepages  media  proc  sbin  tmp
dev   etc                         lib        mnt    root  srv   usr
```
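The /proc/meminfo output above shows the huge page pool in use once pg runs with hugepages allocated. Below is a small Python sketch that extracts the huge-page counters. It also computes how much memory is held by allocated huge pages, that is, HugePages_Total minus HugePages_Free, times Hugepagesize. It uses the values printed on t32 as sample input; the function names are hypothetical.

```python
SAMPLE = """\
AnonHugePages:   1959936 kB
HugePages_Total:    2500
HugePages_Free:     2421
HugePages_Rsvd:      256
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""


def parse_hugepage_info(meminfo_text: str) -> dict:
    """Return the numeric value of every meminfo field whose name contains 'Huge'.

    AnonHugePages (transparent huge pages) is collected too but not used below.
    """
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and "Huge" in key:
            info[key.strip()] = int(value.split()[0])
    return info


def allocated_huge_pages_kb(info: dict) -> int:
    """Memory (kB) held by huge pages that have actually been allocated."""
    used_pages = info["HugePages_Total"] - info["HugePages_Free"]
    return used_pages * info["Hugepagesize"]


if __name__ == "__main__":
    # On a real node, pass open("/proc/meminfo").read() instead of SAMPLE.
    info = parse_hugepage_info(SAMPLE)
    print(info["HugePages_Total"] - info["HugePages_Free"], "pages,",
          allocated_huge_pages_kb(info) // 1024, "MiB")
```

For the values above this gives 79 pages, or 158 MiB.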

4. Summary

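Linux distinguishes three kinds of addresses: logical, linear, and physical. Put simply, segmentation translates a logical address into a linear address, and paging translates a linear address into a physical address.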

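The mapping from linear to physical addresses is stored in page tables, and the CPU caches recently used translations in the TLB (Translation Lookaside Buffer). The default page size on Linux is 4 KB.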
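To make the page-size difference concrete: the low bits of a linear address are the offset inside the page, and the remaining bits select the page. Each selected page costs one page-table entry and one TLB entry.

Below is a simplified Python sketch. It assumes the x86-64 page sizes of 4 KiB and 2 MiB and ignores the multi-level page-table walk.

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024


def split_address(addr: int, page_size: int) -> tuple:
    """Split a linear address into (page number, offset within the page).

    page_size must be a power of two.
    """
    offset_bits = page_size.bit_length() - 1  # 12 for 4 KiB, 21 for 2 MiB
    return addr >> offset_bits, addr & (page_size - 1)


if __name__ == "__main__":
    addr = 0x12345678
    print([hex(x) for x in split_address(addr, PAGE_4K)])
    print([hex(x) for x in split_address(addr, PAGE_2M)])
```

A 2 MiB page spans 512 times the addresses of a 4 KiB page. Covering the same memory therefore needs 512 times fewer entries.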

When managing large amounts of memory, huge pages are more efficient than 4 KB pages. For example:

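Suppose 100 GB of database data is read into memory, 1000 processes connect to the database, and one PTE (page table entry) is 8 B.

With 4 KB pages:

$$\text{PTE size} = \frac{100 \times 1024 \times 1024\ \text{KB}}{4\ \text{KB}} \times 8\ \text{B} = 200\ \text{MB}$$

$$\text{1000 processes} = \frac{100 \times 1024 \times 1024\ \text{KB}}{4\ \text{KB}} \times 8\ \text{B} \times 1000 = 200\,000\ \text{MB} \approx 195\ \text{GB}$$

With 2 MB huge pages:

$$\text{PTE size} = \frac{100 \times 1024\ \text{MB}}{2\ \text{MB}} \times 8\ \text{B} = 400\ \text{KB}$$

$$\text{1000 processes} = \frac{100 \times 1024\ \text{MB}}{2\ \text{MB}} \times 8\ \text{B} \times 1000 = 400\,000\ \text{KB} \approx 390\ \text{MB}$$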

The comparison shows that when memory demand is large, huge pages waste far less memory on page tables.
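The arithmetic above can be reproduced in a few lines of Python under the same assumptions as the example:

- PTEs are 8 bytes.
- Every process maps all 100 GB.
- Only leaf entries are counted.

The function name is hypothetical.

```python
KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB
PTE_SIZE = 8  # bytes per page-table entry, as in the example


def page_table_bytes(data_bytes: int, page_size: int, processes: int = 1) -> int:
    """Leaf page-table size when each process maps all of data_bytes."""
    entries_per_process = data_bytes // page_size
    return entries_per_process * PTE_SIZE * processes


if __name__ == "__main__":
    for name, page in (("4 KiB", 4 * KiB), ("2 MiB", 2 * MiB)):
        one = page_table_bytes(100 * GiB, page)
        total = page_table_bytes(100 * GiB, page, processes=1000)
        print(f"{name} pages: {one / KiB:,.0f} KiB per process, "
              f"{total / MiB:,.1f} MiB for 1000 processes")
```

With 4 KiB pages this gives 200 MiB per process and about 195 GiB for 1000 processes. With 2 MiB pages it gives 400 KiB and about 390 MiB.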

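By the way: with database indexes, each TLB entry covers 2 MB instead of 4 KB when huge pages are used. More of a large index can therefore be reached without a page-table walk, which reduces TLB misses and improves query performance.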
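TLB reach (number of TLB entries × page size) quantifies this. The sketch below assumes a hypothetical TLB with 64 entries. Real TLB sizes differ between CPUs and are often split by page size, so only the ratio is meaningful here.

```python
KiB = 1024
MiB = 1024 * KiB

ASSUMED_ENTRIES = 64  # hypothetical TLB size, for illustration only


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory addressable through the TLB without a page-table walk."""
    return entries * page_size


if __name__ == "__main__":
    for name, page in (("4 KiB", 4 * KiB), ("2 MiB", 2 * MiB)):
        print(f"{name} pages: {tlb_reach(ASSUMED_ENTRIES, page) // KiB} KiB covered")
```

Under this assumption the TLB covers 256 KiB with 4 KiB pages, versus 128 MiB with 2 MiB pages.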
