First attempt at installing llm-d, an AI model serving platform, on Kubernetes 1.31

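Note: I followed the official documentation, skipping the steps it leaves unclear, and got as far as the final step. The only thing missing is HF_TOKEN, because my Kubernetes cluster cannot reach Hugging Face.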

云道轩 · Published 2025-09-26 09:22:53


[root@bastion quickstart]# cat /etc/redhat-release

Rocky Linux release 9.5 (Blue Onyx)

[root@bastion quickstart]#

[root@bastion quickstart]# kubectl get nodes

NAME STATUS ROLES AGE VERSION

master01.kcloudonline.com Ready control-plane 46h v1.31.0

worker01.kcloudonline.com Ready <none> 46h v1.31.0

worker02.kcloudonline.com Ready <none> 46h v1.31.0

worker03.kcloudonline.com Ready <none> 46h v1.31.0

[root@bastion quickstart]#
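With more nodes than this, scanning the STATUS column by eye gets tedious. Below is a minimal sketch of how the check could be scripted. It is not part of the llm-d documentation: the helper name not_ready_nodes is made up for illustration, and it assumes the default column layout of kubectl get nodes --no-headers (NAME first, STATUS second).

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints the name of every node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# Example usage against a live cluster (commented out so the sketch has no side effects):
# kubectl get nodes --no-headers | not_ready_nodes
```

Empty output means every node reports Ready. A cordoned node (STATUS "Ready,SchedulingDisabled") is also listed, which is probably what you want before an installation.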

Get the code

Clone the llm-d-deployer repository.

git clone https://github.com/llm-d/llm-d-deployer.git

Navigate to the quickstart directory

cd llm-d-deployer/quickstart

[root@bastion software]# dnf install git -y

[root@bastion software]# mkdir llm-d

[root@bastion software]# cd llm-d/

[root@bastion llm-d]# git clone https://github.com/llm-d/llm-d-deployer.git

[root@bastion llm-d]# cd llm-d-deployer/

[root@bastion llm-d-deployer]# ls

chart-dependencies CONTRIBUTING.md ct-install.yaml DCO LICENSE Makefile OWNERS README.md

charts cr.yaml ct.yaml helpers lintconf.yaml notes quickstart REPO_DOCS.md

[root@bastion llm-d-deployer]# cd quickstart/

[root@bastion quickstart]# ls

examples grafana grafana-setup.md infra install-deps.sh llmd-installer.sh metrics-overview.md README.md README-minikube.md test-request.sh

[root@bastion quickstart]#

Required tools

The following prerequisites are required for the installer to work.

yq (mikefarah) – installation

jq – download & install guide

git – installation guide

Helm – quick-start install

Kustomize – official install docs

kubectl – install & setup

You can use the installer script that installs all the required dependencies.

./install-deps.sh

# Download and install yq

sudo wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/local/bin/yq

# Make it executable

sudo chmod +x /usr/local/bin/yq

# Verify the installation

yq --version

Install Kustomize with the official script (recommended)

# Download and install the latest Kustomize release

curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash

# Move kustomize into a directory on the system PATH

sudo mv kustomize /usr/local/bin/

# Verify the installation

kustomize version

[root@bastion quickstart]# sudo wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/local/bin/yq

Resolving release-assets.githubusercontent.com (release-assets.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...

Connecting to release-assets.githubusercontent.com (release-assets.githubusercontent.com)|185.199.110.133|:443... connected.

HTTP request sent, awaiting response... 200 OK

Length: 11477176 (11M) [application/octet-stream]

Saving to: ‘/usr/local/bin/yq’

/usr/local/bin/yq 100%[=====================================================================================>] 10.95M 1002KB/s in 7.1s

2025-09-26 08:34:22 (1.55 MB/s) - ‘/usr/local/bin/yq’ saved [11477176/11477176]

[root@bastion quickstart]# sudo chmod +x /usr/local/bin/yq

[root@bastion quickstart]# yq --version

yq (https://github.com/mikefarah/yq/) version v4.47.2

[root@bastion quickstart]#

[root@bastion llm-d]# curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash

v5.7.1

kustomize installed to /software/llm-d/kustomize

[root@bastion llm-d]# ls

kustomize llm-d-deployer

[root@bastion llm-d]# cp kustomize /usr/local/bin/

[root@bastion llm-d]# kustomize version

v5.7.1

[root@bastion llm-d]#

[root@bastion quickstart]# ./install-deps.sh

Rocky Linux 9 - BaseOS 2.5 kB/s | 4.1 kB 00:01

Rocky Linux 9 - AppStream 5.0 kB/s | 4.5 kB 00:00

Rocky Linux 9 - Extras 631 B/s | 2.9 kB 00:04

Dependencies resolved.

=========================================================================================================================================================================

Package Architecture Version Repository Size

=========================================================================================================================================================================

Installing:

make x86_64 1:4.3-8.el9 baseos 529 k

Transaction Summary

=========================================================================================================================================================================

Install 1 Package

Total download size: 529 k

Installed size: 1.6 M

Downloading Packages:

make-4.3-8.el9.x86_64.rpm 301 kB/s | 529 kB 00:01

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Total 212 kB/s | 529 kB 00:02

Running transaction check

Transaction check succeeded.

Running transaction test

Transaction test succeeded.

Running transaction

Preparing : 1/1

Installing : make-1:4.3-8.el9.x86_64 1/1

Running scriptlet: make-1:4.3-8.el9.x86_64 1/1

Verifying : make-1:4.3-8.el9.x86_64 1/1

Installed:

make-1:4.3-8.el9.x86_64

Complete!

Installing yq...

[root@bastion quickstart]#
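The script output above ends rather abruptly, so it seems worth confirming that every tool from the Required tools list is actually on the PATH. Here is a minimal sketch of such a check. The helper name missing_tools is my own and is not something the llm-d-deployer repository provides.

```shell
#!/bin/sh
# Hypothetical helper: prints each given command name that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example usage with the tools listed under "Required tools"
# (commented out; prints nothing when all of them are installed):
# missing_tools yq jq git helm kustomize kubectl
```

Anything it prints still needs to be installed by hand before running the installer.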

Required credentials and configuration

llm-d-deployer GitHub repo – clone here (https://github.com/llm-d/llm-d-deployer.git)

Hugging Face HF_TOKEN (https://huggingface.co/docs/hub/en/security-tokens) with download access for the model you want to use. By default, the sample application uses meta-llama/Llama-3.2-3B-Instruct.

⚠️ Your Hugging Face account must have access to the model you want to use. You may need to visit Hugging Face meta-llama/Llama-3.2-3B-Instruct and accept the usage terms if you have not already done so.
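Whether a token really has access to the model can be probed from any machine that can reach huggingface.co. The following is only a sketch under assumptions of mine: the helper names are hypothetical, it assumes the model repository contains a config.json file on the main branch, and it assumes a denied request comes back as HTTP 401 or 403. None of this is taken from the llm-d documentation.

```shell
#!/bin/sh
# Hypothetical helper: turns an HTTP status code from Hugging Face into a short verdict.
hf_access_verdict() {
    case "$1" in
        200)     echo "ok" ;;
        401|403) echo "denied" ;;      # token missing/invalid, or model terms not accepted (assumed)
        *)       echo "unexpected" ;;  # e.g. 000 when huggingface.co is unreachable
    esac
}

# Hypothetical helper: requests a small file from the model repo ($1) and prints
# the final HTTP status. Needs network access and HF_TOKEN in the environment.
hf_probe_status() {
    curl -s -L -o /dev/null -w '%{http_code}' \
        -H "Authorization: Bearer ${HF_TOKEN}" \
        "https://huggingface.co/$1/resolve/main/config.json"
}

# Example usage (commented out because it needs network access):
# hf_access_verdict "$(hf_probe_status meta-llama/Llama-3.2-3B-Instruct)"
```

On a host with no route to Hugging Face, like my cluster, curl reports status 000 and the verdict is "unexpected", which at least separates a network problem from a permissions problem.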

Target Platforms

Since the llm-d-deployer is based on helm charts, llm-d can be deployed on a variety of Kubernetes platforms.

llm-d Installation

Only a single installation of llm-d on a cluster is currently supported. In the future, multiple model services will be supported. Until then, uninstall llm-d before reinstalling.

The llm-d-deployer contains all the helm charts necessary to deploy llm-d. To facilitate the installation of the helm charts, the llmd-installer.sh script is provided. This script will populate the necessary manifests in the manifests directory. After this, it will apply all the manifests in order to bring up the cluster.

The llmd-installer.sh script aims to simplify the installation of llm-d using the llm-d-deployer as its main function. It scripts as many of the steps as possible to make the installation process more streamlined. This includes:

Installing the GAIE infrastructure

Creating the namespace with any special configurations

Creating the pull secret to download the images

Creating the model service CRDs

Applying the helm charts

Deploying the sample app (model service)

It also supports uninstalling the llm-d infrastructure and the sample app.

Before proceeding with the installation, ensure you have completed the prerequisites and are able to issue kubectl or oc commands to your cluster by configuring your ~/.kube/config file or by using the oc login command.
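One way to confirm that the current kubeconfig really carries cluster-admin rights is kubectl auth can-i. The sketch below wraps it in a small helper. The name is_cluster_admin is hypothetical, and treating "may do everything everywhere" as equivalent to cluster admin is my own simplification.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the current kubeconfig user may perform
# every verb on every resource in all namespaces (roughly "cluster admin").
is_cluster_admin() {
    [ "$(kubectl auth can-i '*' '*' --all-namespaces 2>/dev/null)" = "yes" ]
}

# Example usage (commented out because it needs a reachable cluster):
# is_cluster_admin && echo "cluster-admin access confirmed"
```

If the helper fails, either the cluster is unreachable or the user lacks the permissions the installer expects.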

Usage

The installer needs to be run from the llm-d-deployer/quickstart directory as a cluster admin with CLI access to the cluster.

./llmd-installer.sh [OPTIONS]

Flags

Examples

Install llm-d on an Existing Kubernetes Cluster

export HF_TOKEN="your-token"

./llmd-installer.sh

[root@bastion quickstart]# ./llmd-installer.sh

📂 Setting up script environment...

kubectl can reach to a running Kubernetes cluster.

❌ HF_TOKEN not set; Run: export HF_TOKEN=<your_token>

[root@bastion quickstart]#
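The installer stops as soon as it finds HF_TOKEN unset. The same condition can be checked up front, for example in a wrapper script. This is a minimal sketch with a helper name of my own, require_hf_token. It only tests that the variable is non-empty, not that the token is valid.

```shell
#!/bin/sh
# Hypothetical preflight: succeeds only when HF_TOKEN is set to a non-empty value.
require_hf_token() {
    if [ -z "${HF_TOKEN:-}" ]; then
        echo "HF_TOKEN is not set; export it before running ./llmd-installer.sh" >&2
        return 1
    fi
}

# Example usage from the quickstart directory
# (commented out so nothing is installed by accident):
# require_hf_token && ./llmd-installer.sh
```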

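Note:

The llm-d installation is not decoupled from the model, which I think is a questionable design. As I see it, installing the platform first and loading the model afterwards would probably be better.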

Install on OpenShift

Before running the installer, ensure you have logged into the cluster as a cluster administrator. For example:

oc login --token=sha256~yourtoken --server=https://api.yourcluster.com:6443

export HF_TOKEN="your-token"

./llmd-installer.sh

Validation

The inference-gateway serves as the HTTP ingress point for all inference requests in our deployment. It’s implemented as a Kubernetes Gateway (gateway.networking.k8s.io/v1) using either kgateway or istio as the gatewayClassName, and sits in front of your inference pods to handle path-based routing, load balancing, retries, and metrics. This example validates that the gateway itself is routing your completion requests correctly. You can execute the test-request.sh script to test on the cluster.

# Default options (the model id will be discovered via /v1/models)

./test-request.sh

# Non-default namespace/model

./test-request.sh -n <NAMESPACE> -m <FULL_MODEL_NAME> --minikube
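For a manual check without the script, a request can be sent to the gateway by hand. The sketch below only builds the request body. It assumes the gateway exposes an OpenAI-style /v1/completions endpoint next to /v1/models, and both the helper name completion_payload and the GATEWAY_URL variable are placeholders of mine rather than anything defined by llm-d.

```shell
#!/bin/sh
# Hypothetical helper: builds a minimal JSON body for an OpenAI-style completion request.
# Arguments: model name, prompt, optional max_tokens (defaults to 16).
# Assumes model and prompt contain no double quotes or backslashes (no JSON escaping is done).
completion_payload() {
    printf '{"model": "%s", "prompt": "%s", "max_tokens": %d}\n' "$1" "$2" "${3:-16}"
}

# Example usage; GATEWAY_URL is a placeholder for wherever the inference gateway is reachable:
# curl -s "${GATEWAY_URL}/v1/models"
# curl -s -X POST "${GATEWAY_URL}/v1/completions" \
#     -H 'Content-Type: application/json' \
#     -d "$(completion_payload meta-llama/Llama-3.2-3B-Instruct 'Who are you?')"
```

For prompts with quotes or other special characters, building the body with jq would be safer than printf.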

If you receive an error indicating PodSecurity "restricted" violations when running the smoke-test script, remove the restrictive PodSecurity labels from the namespace with the following command. Once these labels are removed, re-run the script and it should proceed without PodSecurity errors.

kubectl label namespace <NAMESPACE> \

pod-security.kubernetes.io/warn- \

pod-security.kubernetes.io/warn-version- \

pod-security.kubernetes.io/audit- \

pod-security.kubernetes.io/audit-version-
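In kubectl label, a key followed by a trailing "-" removes that label, which is what the command above relies on. As a small sketch, the argument list can also be generated per Pod Security mode. The helper name pss_label_removals and the NAMESPACE variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for each Pod Security mode given (e.g. warn, audit),
# prints the two label-removal arguments for kubectl label
# (a trailing "-" after the key removes the label).
pss_label_removals() {
    for mode in "$@"; do
        printf '%s\n' "pod-security.kubernetes.io/${mode}-" \
                      "pod-security.kubernetes.io/${mode}-version-"
    done
}

# Example usage (commented out; NAMESPACE is a placeholder):
# kubectl get namespace "$NAMESPACE" --show-labels
# kubectl label namespace "$NAMESPACE" $(pss_label_removals warn audit)
```

Listing the labels first with --show-labels shows which of them are actually set on the namespace.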