Solutions for Postgres HugePages in k8s
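1. Background
In the test environment, postgres (pg for short) was deployed by hand with docker and ran fine with its default configuration. After pg was moved into k8s, it failed to start. Some nodes in the k8s cluster are configured with hugepages-2Mi resources, so pg tries to use hugepages by default, but my deployment YAML file did not allocate any hugepage resources to pg, and pg crashed as a result.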
2. Reproducing the problem
Check the node resources:
[root@t34 hugepage]# kubectl describe node t32
...
Capacity:
  cpu:                56
  ephemeral-storage:  120965376Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      5000Mi
  memory:             263847940Ki
  pods:               110
Allocatable:
  cpu:                56
  ephemeral-storage:  117675117681
  hugepages-1Gi:      0
  hugepages-2Mi:      5000Mi
  memory:             258420740Ki
  pods:               110
...
Node t32 has 5000Mi of hugepages-2Mi resources.
When pg is deployed without any hugepage resources allocated to it, it fails with the following error:
[root@t34 hugepage]# kubectl logs -f pg-deploy-54cd8b8bb8-7zs4v
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 20
selecting default shared_buffers ... 400kB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
Bus error (core dumped)
child process exited with exit code 135
initdb: removing contents of data directory "/var/lib/postgresql/data"
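The exit code in the log can be decoded to confirm which signal killed the child process. By shell convention, an exit status above 128 means "128 + signal number", so 135 corresponds to signal 7, which is SIGBUS on Linux x86-64 and matches the "Bus error" line. The helper name signal_from_exit_code below is made up for this illustration; it is a minimal sketch, not part of postgres or kubectl.

```shell
# Decode a shell-style exit status (128 + N) into the name of signal N.
# signal_from_exit_code is a hypothetical helper used only for this example.
signal_from_exit_code() {
    code=$1
    if [ "$code" -gt 128 ]; then
        # kill -l <number> prints the signal name without the SIG prefix
        kill -l "$((code - 128))"
    else
        echo "exit status $code is not a signal exit"
    fi
}

# Exit code reported by initdb in the log above; prints BUS on Linux x86-64
signal_from_exit_code 135
```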
3. Solutions
3.1 Option 1: disable hugepages
echo "vm.nr_hugepages=0" >> /etc/sysctl.conf
sysctl -p
Once hugepages are disabled on every node, pg's default setting huge_pages=try falls back to regular pages, so pg starts successfully. Check:
[root@t34 hugepage]# kubectl get pod -o wide -l app=postgres
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pg-deploy-54cd8b8bb8-mh9q5 1/1 Running 0 58s 10.42.1.151 t32
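3.2 Option 2: set huge_pages to off in the configuration file
Change huge_pages=try to huge_pages=off in the configuration file and rebuild the postgres image. The Dockerfile is as follows: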
FROM postgres:12.3
COPY postgresql.conf.sample /usr/share/postgresql/postgresql.conf.sample
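The Dockerfile above expects an already modified postgresql.conf.sample next to it. One way to produce that file is sketched below, assuming the stock sample file carries the setting as a commented-out line of the form "#huge_pages = try". The function name set_huge_pages_off and the two-line demo input are made up for this illustration.

```shell
# Print a copy of a postgresql.conf(.sample) file with huge_pages forced to off.
# Matches both the commented-out default ("#huge_pages = try") and an active line.
# set_huge_pages_off is a hypothetical helper used only for this example.
set_huge_pages_off() {
    sed -E 's/^#?[[:space:]]*huge_pages[[:space:]]*=.*/huge_pages = off/' "$1"
}

# Demo on a two-line stand-in for the real sample file
demo=$(mktemp)
printf '%s\n' 'shared_buffers = 128MB' '#huge_pages = try   # on, off, or try' > "$demo"
set_huge_pages_off "$demo"
rm -f "$demo"
```

In practice the input would be the sample file copied out of the postgres:12.3 image, and the output would be redirected to the postgresql.conf.sample that the Dockerfile copies in. After the pod starts, running SHOW huge_pages; in psql should report off.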
3.3 Option 3: enable hugepages on the node
echo "vm.nr_hugepages=2500" >> /etc/sysctl.conf
sysctl -p
Add the hugepages-2Mi configuration to the deployment YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pg-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: hub.geovis.io/airstudio/postgres:12.3
        imagePullPolicy: Always
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_PASSWORD
          value: "123456"
        volumeMounts:
        - name: data-dir
          mountPath: /var/lib/postgresql/data/
        - name: hugepage
          mountPath: /hugepages
        resources:
          limits:
            hugepages-2Mi: 1000Mi
            cpu: 4
            memory: 2Gi
      volumes:
      - name: data-dir
        hostPath:
          path: /mnt/mfs/pg_data
          type: DirectoryOrCreate
      - name: hugepage
        emptyDir:
          medium: HugePages
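Note the following two points:
(1) Besides hugepages-2Mi, the resources section must also set limits for cpu and memory; otherwise an error is reported.
(2) The YAML file above is suitable for k8s versions 1.14 to 1.15. Other versions use a different configuration format; see the official documentation.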
[root@t32 ~]# cat /proc/meminfo |grep Huge
AnonHugePages: 1959936 kB
HugePages_Total: 2500
HugePages_Free: 2421
HugePages_Rsvd: 256
HugePages_Surp: 0
Hugepagesize: 2048 kB
[root@t34 hugepage]# kubectl get pod -o wide -l app=postgres
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pg-deploy-fbdfbf8f9-zrgvh 1/1 Running 0 45s 10.42.1.153 t32
[root@t34 hugepage]# kubectl exec -it pg-deploy-fbdfbf8f9-zrgvh ls /
bin   docker-entrypoint-initdb.d  home       lib64  opt   run   sys  var
boot  docker-entrypoint.sh        hugepages  media  proc  sbin  tmp
dev   etc                         lib        mnt    root  srv   usr
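As a quick cross-check of the /proc/meminfo output above, the size of the hugepage pool is HugePages_Total multiplied by Hugepagesize. The sketch below does that arithmetic with awk; the function name hugepage_pool_mib is made up for this illustration, and the two input lines are copied from the t32 output.

```shell
# Read /proc/meminfo-style text on stdin and print the hugepage pool size in MiB.
# hugepage_pool_mib is a hypothetical helper used only for this example.
hugepage_pool_mib() {
    awk '/^HugePages_Total:/ { pages = $2 }
         /^Hugepagesize:/    { kb = $2 }
         END                 { print pages * kb / 1024 }'
}

# Values copied from the t32 output above; prints 5000
printf '%s\n' 'HugePages_Total: 2500' 'Hugepagesize: 2048 kB' | hugepage_pool_mib
```

The result, 5000 MiB, agrees with the hugepages-2Mi: 5000Mi capacity that kubectl describe node reported for t32. On a live node the same function can read the real file: hugepage_pool_mib < /proc/meminfo.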
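4. Summary
Linux distinguishes three kinds of addresses: logical, linear, and physical. Put simply, a logical address is translated into a linear address by segmentation, and a linear address is translated into a physical address by paging.
The mapping between linear and physical addresses is cached in the TLB (Translation Lookaside Buffer). The default page size on Linux is 4 KB.
When managing large amounts of memory, hugepages are more efficient than 4 KB pages. An example: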
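Suppose 100 GB of database data is read into memory, 1000 processes are connected to the database, and one PTE (page table entry) takes 8 B. Then:
With 4 KB pages:
Page table size per process: 100 x 1024 x 1024 KB / 4 KB x 8 B = 200 MB
For 1000 processes: 100 x 1024 x 1024 KB / 4 KB x 8 B x 1000 = 200,000 MB (roughly 200 GB)
With 2 MB hugepages:
Page table size per process: 100 x 1024 MB / 2 MB x 8 B = 400 KB
For 1000 processes: 100 x 1024 MB / 2 MB x 8 B x 1000 = 400,000 KB (roughly 400 MB)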
Comparing the two, when memory demand is large, hugepages spend far less memory on page tables.
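The arithmetic above can be reproduced with shell integer math. This is a minimal sketch under the same assumptions as the example (100 GiB mapped, 8-byte PTEs, every process holding its own page table entries); the function name page_table_bytes is made up for this illustration.

```shell
# Page table bytes = mapped bytes / page size * PTE size * number of processes.
# page_table_bytes is a hypothetical helper used only for this example.
page_table_bytes() {
    data_bytes=$1
    page_bytes=$2
    pte_bytes=$3
    processes=$4
    echo $(( data_bytes / page_bytes * pte_bytes * processes ))
}

DATA=$((100 * 1024 * 1024 * 1024))        # 100 GiB of database data
PAGE_4K=4096
PAGE_2M=$((2 * 1024 * 1024))

page_table_bytes "$DATA" "$PAGE_4K" 8 1       # 209715200    (200 MiB)
page_table_bytes "$DATA" "$PAGE_4K" 8 1000    # 209715200000 (about 195 GiB)
page_table_bytes "$DATA" "$PAGE_2M" 8 1       # 409600       (400 KiB)
page_table_bytes "$DATA" "$PAGE_2M" 8 1000    # 409600000    (about 391 MiB)
```

The ratio between the two cases is 512, which is simply the 2 MB page size divided by the 4 KB page size.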
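btw: for database indexes, hugepages allow the same number of TLB entries to cover more index data, which reduces page table lookups and improves query and retrieval performance.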