VLLMService Operator Development, Part 6: Adding Automatic Service Creation for Model Serving

superDBL · Published 2026-06-29 21:12:47

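Preface

The previous article finished verifying the deployment of the VLLMService Operator. At that point the Operator could already create a Deployment from a VLLMService and bring up the vLLM model-serving Pod. For verification we used a Pod-level port-forward: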

kubectl -n ai-demo port-forward pod/qwen-demo-78f5568f6b-rqghg 8888:8000

Then we accessed it through the local endpoint:

curl http://127.0.0.1:8888/v1/models

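This approach proves that the vLLM Pod started correctly and that the model service itself is usable, but it is not suitable as a long-term access point. The reason is simple: a Pod is an ephemeral resource, and both its name and its IP can change on restart, rolling update, or rescheduling. For example, the current Pod may be named: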

qwen-demo-78f5568f6b-rqghg

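After the next rebuild it may have a different name. If we later want to plug in Gateway API, HTTPRoute, or other business services, we cannot depend on the Pod name directly.

So this article starts adding automatic Service creation to the VLLMService Operator. The goal is that the user creates only a VLLMService, and the Operator creates not just the Deployment but also a Service that gives the backend Pods a stable in-cluster access point.

This article covers only the Service and leaves HTTPRoute aside for now. HTTPRoute will be implemented in a later article.

GitHub project: https://github.com/bolin-dai/vllmservice-operator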

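I. Why Add a Service

A Deployment manages Pods, but a Deployment is not itself a network entry point. The model inference service actually runs in Pods, and Pods are ephemeral: a Pod can be deleted and recreated because of a restart, an upgrade, a node failure, or a rolling update. This causes two problems: first, the Pod name changes; second, the Pod IP changes too. Accessing a Pod directly therefore gives an unstable entry point.

A Service provides a stable entry point for a group of Pods. It uses its selector to find the matching backend Pods and forwards traffic sent to the Service on to those Pods.

For the current VLLMService, we want the Operator to generate a Service like this automatically: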

apiVersion: v1
kind: Service
metadata:
  name: qwen-demo
  namespace: ai-demo
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: vllmservice
    app.kubernetes.io/instance: qwen-demo
  ports:
    - name: http
      protocol: TCP
      port: 8000
      targetPort: http

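This Service means: in the ai-demo namespace, create a ClusterIP Service named qwen-demo that exposes port 8000 inside the cluster and uses its selector to pick the Pods belonging to the qwen-demo VLLMService instance. Requests to Service/qwen-demo:8000 are then forwarded to the http port of the backend vLLM Pods.

With this, the access path changes from:

accessing one specific Pod directly

to:

accessing a stable Service
  -> the Service finds the backend Pods through its selector
  -> traffic is forwarded to the vLLM container port

This is also the foundation for HTTPRoute later on, because an HTTPRoute backendRef should normally point to a Service rather than directly to a Pod.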

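II. What the Current Code Adds

This version of the code adds four capabilities:

1. Grant the Operator RBAC permissions on services;
2. Use CreateOrUpdate in Reconcile to create or update the Service;
3. Set an OwnerReference on the Service so that it is managed by the VLLMService;
4. Add Owns(&corev1.Service{}) in SetupWithManager so that Service changes also trigger Reconcile.

In other words, the child resources managed by the VLLMService Operator have grown from a single Deployment to:

Deployment
Service

The overall reconcile logic can now be read as:

Read the VLLMService
  -> create or update the Deployment
  -> create or update the Service
  -> update the VLLMService status
  -> return the Reconcile result

This article focuses on the Service part.

III. Adding RBAC Permissions for Service

To create, update, and delete Services, the Operator first needs API permissions on Service. Service belongs to the Kubernetes core API group, so groups in the RBAC marker should be the empty string.

Add the following marker to the controller file: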

// +kubebuilder:rbac:groups="",resources=services,verbs=get;list;watch;create;update;patch;delete

Here:

groups=""          means the core API group;
resources=services grants access to the Service resource;
verbs=...          allows operations such as get, list, watch, create, update, patch, and delete.

After changing the RBAC marker, regenerate the manifests:

make manifests

Once generation finishes, check whether config/rbac/role.yaml now contains the services permission:

grep -n "services" config/rbac/role.yaml

The generated ClusterRole should normally contain something like:

- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch

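This step matters. If the Service permission is missing, the Operator fails when Reconcile tries to create the Service because of insufficient RBAC permissions, and the log usually shows a forbidden error.

IV. Creating the Service in Reconcile

In the previous version, Reconcile only created or updated the Deployment. Now the Service has to be synced after the Deployment. The code looks roughly like this: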

service := &corev1.Service{
	ObjectMeta: metav1.ObjectMeta{
		Name:      vllmService.Name,
		Namespace: vllmService.Namespace,
	},
}

serviceOperation, err := controllerutil.CreateOrUpdate(ctx, r.Client, service, func() error {
	service.Labels = labelsForVLLMService(vllmService)

	service.Spec.Type = corev1.ServiceTypeClusterIP
	service.Spec.Selector = selectorLabelsForVLLMService(vllmService.Name)
	service.Spec.Ports = []corev1.ServicePort{
		{
			Name:       "http",
			Protocol:   corev1.ProtocolTCP,
			Port:       portFor(vllmService),
			TargetPort: intstr.FromString("http"),
		},
	}

	return controllerutil.SetControllerReference(vllmService, service, r.Scheme)
})

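We still use controllerutil.CreateOrUpdate here, for the same reason as with the Deployment: if the Service does not exist, it is created; if it exists but differs from the desired state, it is updated; if nothing changed, it is left alone. Note that the Service object passed to CreateOrUpdate only has Name and Namespace set up front: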

service := &corev1.Service{
	ObjectMeta: metav1.ObjectMeta{
		Name:      vllmService.Name,
		Namespace: vllmService.Namespace,
	},
}

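The labels, type, selector, ports, and OwnerReference are all set inside the MutateFn. This matches how CreateOrUpdate is meant to be used in controller-runtime: it first fetches the existing object by name and namespace and then runs the MutateFn on it, so fields set outside the MutateFn would be overwritten by the fetched state.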
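The create / update / leave-alone decision described above can be sketched without a cluster. The following is only a minimal illustration, not controller-runtime's implementation: the svc type, the in-memory store, and the createOrUpdate helper are all hypothetical stand-ins, and the returned strings simply mirror the three outcomes named in the text.

```go
package main

import (
	"fmt"
	"reflect"
)

// svc is a simplified, hypothetical stand-in for corev1.Service.
type svc struct {
	Name     string
	Type     string
	Selector map[string]string
	Port     int32
}

// store is an in-memory stand-in for the API server, keyed by name.
type store map[string]svc

// createOrUpdate fetches the current object if it exists, applies mutate,
// and writes only when the mutated object differs from the stored one.
func createOrUpdate(s store, name string, mutate func(*svc)) string {
	current, found := s[name]
	if !found {
		obj := svc{Name: name}
		mutate(&obj)
		s[name] = obj
		return "created"
	}
	desired := current
	mutate(&desired)
	if reflect.DeepEqual(current, desired) {
		return "unchanged"
	}
	s[name] = desired
	return "updated"
}

// desiredService plays the role of the MutateFn: it assigns every field
// the controller cares about, so running it repeatedly is idempotent.
func desiredService(o *svc) {
	o.Type = "ClusterIP"
	o.Selector = map[string]string{
		"app.kubernetes.io/name":     "vllmservice",
		"app.kubernetes.io/instance": "qwen-demo",
	}
	o.Port = 8000
}

func main() {
	s := store{}
	fmt.Println(createOrUpdate(s, "qwen-demo", desiredService)) // first reconcile
	fmt.Println(createOrUpdate(s, "qwen-demo", desiredService)) // steady state

	// Simulate a manual edit that drifts from the desired state.
	drifted := s["qwen-demo"]
	drifted.Port = 9000
	s["qwen-demo"] = drifted
	fmt.Println(createOrUpdate(s, "qwen-demo", desiredService)) // drift repaired
}
```

Because the mutate function assigns the desired fields every time, the same call covers the first creation, the steady state, and the repair of a manual change.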

In the current code the Service takes the same name as the VLLMService:

Name:      vllmService.Name,
Namespace: vllmService.Namespace,

If the user creates this VLLMService:

apiVersion: aiinfra.example.com/v1alpha1
kind: VLLMService
metadata:
  name: qwen-demo
  namespace: ai-demo

then the Service created by the Operator is:

namespace: ai-demo
name: qwen-demo

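This design has two benefits. First, the relationship between resources is clear: seeing the Service qwen-demo tells you it belongs to the VLLMService qwen-demo. Second, when an HTTPRoute later points at the backend, it can use the VLLMService name directly as the Service name. For example, the backendRef in a later HTTPRoute can point to: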

backendRefs:
  - name: qwen-demo
    port: 8000

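For the learning stage this design is fairly intuitive.

V. Designing Service Labels and the Selector

This is the most important part of the article. A Service has two fields that are easy to confuse:

metadata.labels
spec.selector

metadata.labels are the Service object's own labels, used for querying, grouping, and management; spec.selector is the condition the Service uses to pick backend Pods. The Operator currently has two label functions: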

selectorLabels := selectorLabelsForVLLMService(vllmService.Name)
objectLabels := labelsForVLLMService(vllmService)

selectorLabelsForVLLMService generates only the core, most stable selection labels:

func selectorLabelsForVLLMService(name string) map[string]string {
	return map[string]string{
		"app.kubernetes.io/name":     "vllmservice",
		"app.kubernetes.io/instance": name,
	}
}

These two labels mean:

app.kubernetes.io/name=vllmservice
  this is an application of type vllmservice;

app.kubernetes.io/instance=<VLLMService name>
  this resource was created by one specific VLLMService instance.

For example, the Deployment, Pod, and Service created from the VLLMService qwen-demo all carry:

app.kubernetes.io/name=vllmservice
app.kubernetes.io/instance=qwen-demo

The Service selector uses exactly these two stable labels:

service.Spec.Selector = selectorLabelsForVLLMService(vllmService.Name)

In YAML this becomes:

spec:
  selector:
    app.kubernetes.io/name: vllmservice
    app.kubernetes.io/instance: qwen-demo

This lets the Service find the Pods created from the VLLMService qwen-demo.

VLLMService also lets users define custom labels:

spec:
  labels:
    aiinfra.example.com/model: qwen2.5
    aiinfra.example.com/runtime: vllm
    aiinfra.example.com/team: infra
    aiinfra.example.com/scheduler: volcano

These labels belong on the object's own metadata.labels, where they help with querying and management, for example:

kubectl -n ai-demo get svc -l aiinfra.example.com/team=infra
kubectl -n ai-demo get pod -l aiinfra.example.com/model=qwen2.5

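These user-defined labels do not belong in the Service selector, though. Multiple conditions in a selector are combined with logical AND, so all of them must match. If too many user-defined labels go into the Service selector, then as soon as one label is changed, deleted, or fails to propagate to the PodTemplate, the Service can end up with no backends and traffic has nowhere to go. So the current design is:

Service.metadata.labels:
  the full label set, including the Operator's fixed labels and the user-defined labels;

Service.spec.selector:
  only the stable selectorLabels, with no labels the user might change.

The corresponding code is: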

service.Labels = labelsForVLLMService(vllmService)
service.Spec.Selector = selectorLabelsForVLLMService(vllmService.Name)

This is one of the sounder decisions in the current code: object labels can be rich, but selector labels must be stable.
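To make the AND semantics concrete, here is a small standalone sketch. It is a simplification under stated assumptions: matches is a hypothetical helper that imitates equality-based selector matching, and objectLabels only guesses at how labelsForVLLMService merges labels (the article does not show that function), writing the stable labels last so user input cannot override them.

```go
package main

import "fmt"

// selectorLabels returns the two stable labels used for selection,
// mirroring selectorLabelsForVLLMService from the article.
func selectorLabels(instance string) map[string]string {
	return map[string]string{
		"app.kubernetes.io/name":     "vllmservice",
		"app.kubernetes.io/instance": instance,
	}
}

// objectLabels merges user labels with the stable labels. The merge order
// is an assumption made for this sketch, not the article's actual code.
func objectLabels(instance string, user map[string]string) map[string]string {
	out := make(map[string]string, len(user)+2)
	for k, v := range user {
		out[k] = v
	}
	for k, v := range selectorLabels(instance) {
		out[k] = v
	}
	return out
}

// matches reports whether every selector entry appears in podLabels with
// the same value, i.e. all conditions are combined with logical AND.
func matches(selector, podLabels map[string]string) bool {
	for k, want := range selector {
		if got, ok := podLabels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	user := map[string]string{"aiinfra.example.com/team": "infra"}
	pod := objectLabels("qwen-demo", user)
	stable := selectorLabels("qwen-demo")
	fmt.Println(matches(stable, pod)) // stable selector matches the Pod

	// A selector that also contains a user label stops matching as soon
	// as that label changes on the Pod; the stable selector is unaffected.
	wide := objectLabels("qwen-demo", user)
	pod["aiinfra.example.com/team"] = "platform"
	fmt.Println(matches(stable, pod), matches(wide, pod))
}
```

The second line of output is the case the text warns about: the wider selector loses its backend after a single label edit, while the two-label selector keeps matching.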

VI. Designing the Service Port and targetPort

The Service port is currently set like this:

service.Spec.Ports = []corev1.ServicePort{
	{
		Name:       "http",
		Protocol:   corev1.ProtocolTCP,
		Port:       portFor(vllmService),
		TargetPort: intstr.FromString("http"),
	},
}

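A few points are worth taking one at a time.

First, Name: "http" means the Service port is named http.

Second, Protocol: corev1.ProtocolTCP means the TCP protocol is used.

Third, Port: portFor(vllmService) means the port exposed by the Service comes from spec.port of the VLLMService. If the user does not set port, it defaults to 8000: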

func portFor(vllmservice *aiinfrav1alpha1.VLLMService) int32 {
	if vllmservice.Spec.Port == 0 {
		return 8000
	}

	return vllmservice.Spec.Port
}

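Fourth, TargetPort: intstr.FromString("http") means the Service forwards to the containerPort named http in the backend Pod rather than to a hard-coded port number.

This requires the container port in the PodTemplate to have a name, and that name must also be http: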

Ports: []corev1.ContainerPort{
	{
		Name:          "http",
		ContainerPort: port,
		Protocol:      corev1.ProtocolTCP,
	},
}

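The benefit is that the Service is not bound to a hard-coded numeric port; it references the container port by name. As long as the container port name stays http, the Service's targetPort resolves to the right backend port.

The generated Service looks roughly like: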

spec:
  type: ClusterIP
  ports:
    - name: http
      protocol: TCP
      port: 8000
      targetPort: http

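One pitfall to watch for: if the container port is later renamed from http to something else while the Service's targetPort still says http, traffic can no longer be forwarded correctly. The targetPort name on the Service must therefore stay in sync with the Name of the containerPort.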
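The pitfall is easier to see with a small lookup sketch. This is a conceptual model only: containerPort and resolveTargetPort are hypothetical names for this illustration and say nothing about how Kubernetes implements the lookup internally.

```go
package main

import (
	"errors"
	"fmt"
)

// containerPort is a simplified stand-in for corev1.ContainerPort.
type containerPort struct {
	Name string
	Port int32
}

// resolveTargetPort looks up a named targetPort among a Pod's container
// ports and returns the numeric port that traffic would be sent to.
func resolveTargetPort(target string, ports []containerPort) (int32, error) {
	for _, p := range ports {
		if p.Name == target {
			return p.Port, nil
		}
	}
	return 0, errors.New("no container port named " + target)
}

func main() {
	ports := []containerPort{{Name: "http", Port: 8000}}
	port, err := resolveTargetPort("http", ports)
	fmt.Println(port, err)

	// Renaming the container port without updating targetPort breaks the lookup.
	renamed := []containerPort{{Name: "api", Port: 8000}}
	port, err = resolveTargetPort("http", renamed)
	fmt.Println(port, err)
}
```

In the renamed case the Service object still looks valid, which is why this kind of mismatch tends to show up only as missing backends at request time.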

The Service type is currently set to:

service.Spec.Type = corev1.ServiceTypeClusterIP

which is:

spec:
  type: ClusterIP

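ClusterIP is the default Kubernetes Service type and is mainly used for access from inside the cluster. It is a reasonable choice at this stage, because all we need for now is a stable in-cluster entry point for the model service; external exposure has not started yet. The current access path can be read as:

Other Pods in the cluster
  -> qwen-demo.ai-demo.svc.cluster.local:8000
  -> Service/qwen-demo
  -> backend vLLM Pod

For development and verification, you can temporarily use: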

kubectl -n ai-demo port-forward svc/qwen-demo 8888:8000

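Local requests to 127.0.0.1:8888 are then forwarded to port 8000 of the Service and on to a backend Pod. Note that kubectl port-forward resolves svc/qwen-demo to one Pod behind the Service and tunnels to it, so this checks the selector and the port mapping rather than load balancing across Pods.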
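For a client running inside the cluster, the address in the diagram above can be assembled from the Service name and namespace. The sketch below is illustrative: serviceAddr is a hypothetical helper, and the cluster domain is passed in because cluster.local is only the common default and can differ between clusters.

```go
package main

import "fmt"

// serviceAddr builds the in-cluster DNS address of a Service.
// clusterDomain is usually "cluster.local", but it is configurable,
// so it is a parameter here instead of a hard-coded constant.
func serviceAddr(name, namespace, clusterDomain string, port int32) string {
	return fmt.Sprintf("%s.%s.svc.%s:%d", name, namespace, clusterDomain, port)
}

func main() {
	addr := serviceAddr("qwen-demo", "ai-demo", "cluster.local", 8000)
	fmt.Println(addr)
	fmt.Println("http://" + addr + "/v1/models")
}
```

Since the Service name equals the VLLMService name, a caller only needs the VLLMService name, namespace, and port to derive this address.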

VII. Setting an OwnerReference on the Service

Like the Deployment, the Service is a child resource created from the VLLMService, so it should also get an OwnerReference:

return controllerutil.SetControllerReference(vllmService, service, r.Scheme)

This line sets the current VLLMService as the controller owner of the Service. Once it succeeds, the Service YAML contains something like:

metadata:
  ownerReferences:
    - apiVersion: aiinfra.example.com/v1alpha1
      kind: VLLMService
      name: qwen-demo
      controller: true

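This has at least two benefits. First, it enables cascading deletion: after the user deletes the VLLMService, Kubernetes garbage collection automatically deletes the Service it owns.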

kubectl -n ai-demo delete vllmservice qwen-demo

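After the VLLMService is deleted, the corresponding Deployment and Service should both be cleaned up.

Second, together with Owns(&corev1.Service{}) it enables self-healing. If a user edits the Service by hand, controller-runtime uses the OwnerReference to find the VLLMService it belongs to and triggers Reconcile for that VLLMService again, and the Operator brings the Service back to the desired state.

VIII. Watching Service in SetupWithManager

Creating the Service is not enough; controller-runtime also needs to know that Service is a child resource managed by VLLMService. The current code already adds this in SetupWithManager: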

func (r *VLLMServiceReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&aiinfrav1alpha1.VLLMService{}).
		Owns(&appsv1.Deployment{}).
		Owns(&corev1.Service{}).
		Named("vllmservice").
		Complete(r)
}

The key line is:

Owns(&corev1.Service{})

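It tells the controller to watch not only the VLLMService itself but also the Services owned by a VLLMService. When one of those Services changes, controller-runtime re-enqueues the owning VLLMService and Reconcile runs again. This closes the self-healing loop:

The user edits the Service by hand
  -> the Service changes
  -> controller-runtime finds the VLLMService through the OwnerReference
  -> Reconcile is triggered again
  -> the Operator restores the Service to the desired state

This is what separates an Operator from an ordinary script. A script usually runs once, whereas an Operator keeps observing and correcting the actual state.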
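The mapping from a changed child object back to its parent can be sketched as follows. This is a simplified model under assumptions: the types and requestForOwner are hypothetical, and the real handler in controller-runtime also takes the owner's API group into account, which is omitted here.

```go
package main

import "fmt"

// ownerRef is a simplified stand-in for metav1.OwnerReference.
type ownerRef struct {
	Kind       string
	Name       string
	Controller bool
}

// object is a simplified child resource such as a Service.
type object struct {
	Namespace string
	Name      string
	Owners    []ownerRef
}

// request identifies the parent object that should be reconciled.
type request struct {
	Namespace string
	Name      string
}

// requestForOwner returns the reconcile request for the controller owner
// of obj when that owner has the expected kind. Objects without a
// matching controller owner produce no request.
func requestForOwner(obj object, ownerKind string) (request, bool) {
	for _, ref := range obj.Owners {
		if ref.Controller && ref.Kind == ownerKind {
			// A namespaced owner lives in the same namespace as its child.
			return request{Namespace: obj.Namespace, Name: ref.Name}, true
		}
	}
	return request{}, false
}

func main() {
	svc := object{
		Namespace: "ai-demo",
		Name:      "qwen-demo",
		Owners:    []ownerRef{{Kind: "VLLMService", Name: "qwen-demo", Controller: true}},
	}
	if req, ok := requestForOwner(svc, "VLLMService"); ok {
		fmt.Printf("enqueue %s/%s\n", req.Namespace, req.Name)
	}

	// A Service created by hand has no owner, so nothing is enqueued.
	_, ok := requestForOwner(object{Namespace: "ai-demo", Name: "manual"}, "VLLMService")
	fmt.Println(ok)
}
```

This is also why the OwnerReference matters for self-healing: without it, a change to the Service would not map back to any VLLMService.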

IX. Writing Service Information into the VLLMService Status

In the current code, the updateVLLMServiceStatus method also takes a Service parameter:

func (r *VLLMServiceReconciler) updateVLLMServiceStatus(
	ctx context.Context,
	vllmservice *aiinfrav1alpha1.VLLMService,
	deployment *appsv1.Deployment,
	service *corev1.Service,
) error {
	phase, message := phaseAndMessageFromDeployment(deployment)

	serviceName := ""
	if service != nil {
		serviceName = service.Name
	}

	if vllmservice.Status.Phase == phase &&
		vllmservice.Status.ReadyReplicas == deployment.Status.ReadyReplicas &&
		vllmservice.Status.DeploymentName == deployment.Name &&
		vllmservice.Status.ServiceName == serviceName &&
		vllmservice.Status.Message == message {
		return nil
	}

	vllmservice.Status.Phase = phase
	vllmservice.Status.ReadyReplicas = deployment.Status.ReadyReplicas
	vllmservice.Status.DeploymentName = deployment.Name
	vllmservice.Status.ServiceName = serviceName
	vllmservice.Status.Message = message

	return r.Status().Update(ctx, vllmservice)
}

This code writes the Service name back into the VLLMService status:

vllmservice.Status.ServiceName = serviceName

The API type has a matching field:

// +optional
ServiceName string `json:"serviceName,omitempty"`

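When users inspect the VLLMService, they can see the name of the Service currently associated with it. If HTTPRoute support is added later, fields such as RouteName, GatewayName, and Endpoint can be added to status in the same way.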

X. Regenerating, Building, and Deploying the Operator

After the code changes, format the controller file first:

gofmt -w internal/controller/vllmservice_controller.go

Because a Service RBAC marker was added, regenerate the manifests:

make manifests

If the API types changed as well, for example by adding the status.serviceName field, also run:

make generate
make manifests

Then run a build check:

make build

Once the code compiles, rebuild the Operator image:

make docker-build IMG=registry.cn-hangzhou.aliyuncs.com/docker-test-dai/vllmservice-operator:v0.2

Push the image:

docker push registry.cn-hangzhou.aliyuncs.com/docker-test-dai/vllmservice-operator:v0.2

Redeploy the Operator:

make deploy IMG=registry.cn-hangzhou.aliyuncs.com/docker-test-dai/vllmservice-operator:v0.2

After deployment, check the Operator Pod:

kubectl -n vllmservice-operator-system get pod

The Controller Manager should normally be in the Running state:

NAME                                                       READY   STATUS    RESTARTS   AGE
vllmservice-operator-controller-manager-xxxxxxx-xxxxx      1/1     Running   0          1m

If the Operator does not start correctly, check the logs first:

kubectl -n vllmservice-operator-system logs deploy/vllmservice-operator-controller-manager

XI. Creating a VLLMService for Verification

The test uses the same qwen-demo VLLMService as before:

apiVersion: aiinfra.example.com/v1alpha1
kind: VLLMService
metadata:
  name: qwen-demo
  namespace: ai-demo
spec:
  image: docker.m.daocloud.io/vllm/vllm-openai:latest

  modelPath: /data/models/Qwen2.5-1.5B-Instruct
  modelName: qwen2.5-1.5b-instruct
  replicas: 1

  schedulerName: volcano

  nodeSelector:
    kubernetes.io/hostname: master-01

  labels:
    aiinfra.example.com/model: qwen2.5
    aiinfra.example.com/runtime: vllm
    aiinfra.example.com/team: infra
    aiinfra.example.com/scheduler: volcano

  port: 8000

  resources:
    requests:
      cpu: "2"
      memory: 8Gi
      volcano.sh/vgpu-number: "1"
      volcano.sh/vgpu-memory: "6144"
      volcano.sh/vgpu-cores: "50"
    limits:
      cpu: "4"
      memory: 16Gi
      volcano.sh/vgpu-number: "1"
      volcano.sh/vgpu-memory: "6144"
      volcano.sh/vgpu-cores: "50"

  storage:
    pvcName: qwen-model-pvc
    mountPath: /data/models
    readOnly: true

After applying the YAML file, check the VLLMService:

kubectl -n ai-demo get vllmservice

Then check whether the Deployment and Pod are healthy:

kubectl -n ai-demo get deploy
kubectl -n ai-demo get pod

Sample Pod output:

NAME                         READY   STATUS    RESTARTS   AGE
qwen-demo-78f5568f6b-xjrdj   1/1     Running   0          117s

Once the Pod is Running, verify that the Operator created the Service automatically. List the Services:

kubectl -n ai-demo get service

Sample output:

NAME        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
qwen-demo   ClusterIP   10.103.104.205   <none>        8000/TCP   14s

This shows that the Operator created a ClusterIP Service named qwen-demo from the VLLMService and exposed port 8000. For more detail, describe the Service:

kubectl -n ai-demo describe service qwen-demo

Focus on three things. First, the Service type:

Type: ClusterIP

Second, the Service selector:

Selector: app.kubernetes.io/instance=qwen-demo,app.kubernetes.io/name=vllmservice

Third, the port mapping:

Port:       http  8000/TCP
TargetPort: http/TCP

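If the selector is correct, the Service can find the backend Pods. If the selector is wrong, the Service still exists but its endpoints may be empty. Verify the endpoints with:

kubectl -n ai-demo get endpoints qwen-demo

Or, on newer Kubernetes versions, look at the EndpointSlice:

kubectl -n ai-demo get endpointslice -l kubernetes.io/service-name=qwen-demo

If the Service matched the backend Pods correctly, you should see the corresponding Pod IP and port.

XII. Accessing the Model Service Through a Service port-forward

Instead of port-forwarding to the Pod, we now port-forward to the Service: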

kubectl -n ai-demo port-forward svc/qwen-demo 8888:8000

Conceptually, this command sets up:

Local 127.0.0.1:8888
  -> Service/qwen-demo:8000
  -> http port of the backend vLLM Pod

Open a new terminal and call the model list endpoint:

curl http://127.0.0.1:8888/v1/models

Sample response:

{
  "object": "list",
  "data": [
    {
      "id": "qwen2.5-1.5b-instruct",
      "object": "model",
      "owned_by": "vllm",
      "root": "/data/models/Qwen2.5-1.5B-Instruct",
      "parent": null,
      "max_model_len": 4096
    }
  ]
}

Here we can see:

id: qwen2.5-1.5b-instruct
owned_by: vllm
root: /data/models/Qwen2.5-1.5B-Instruct

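This shows that the request reaches the backend vLLM Pod through the Service port-forward and that the model service is reachable. Beyond /v1/models, we can also verify /v1/chat/completions: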

curl http://127.0.0.1:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-1.5b-instruct",
    "messages": [
      {"role": "user", "content": "你是谁？"}
    ]
  }'

The sample response shows that the model generated an answer:

{
  "id": "chatcmpl-9c7287cf5b96e1b2",
  "object": "chat.completion",
  "model": "qwen2.5-1.5b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "我是Qwen，由阿里云开发的超大规模语言模型。我是一个AI助手，可以回答问题、创作文字、提供信息等。如果您有任何问题或需要帮助，请随时告诉我，我会尽力为您提供支持和解答。"
      },
      "finish_reason": "stop"
    }
  ]
}

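This step goes further than /v1/models. /v1/models only proves that the API is reachable and the model is loaded; /v1/chat/completions proves that the inference path works as well. The access path is now:

curl 127.0.0.1:8888
  -> kubectl port-forward
  -> Service/qwen-demo:8000
  -> backend vLLM Pod
  -> vLLM OpenAI-compatible API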
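If this check is ever scripted instead of run by hand with curl, the /v1/models response can be decoded and the expected model id looked up. The sketch below works on the sample payload from this section rather than on a live endpoint; modelList and modelIDs are hypothetical names, and only the fields visible in the sample response are decoded. In a real check the bytes would come from an HTTP GET against the port-forwarded address.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelList mirrors the parts of the /v1/models response shown above.
type modelList struct {
	Object string `json:"object"`
	Data   []struct {
		ID   string `json:"id"`
		Root string `json:"root"`
	} `json:"data"`
}

// modelIDs decodes a /v1/models response body and returns the model ids.
func modelIDs(body []byte) ([]string, error) {
	var list modelList
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, err
	}
	ids := make([]string, 0, len(list.Data))
	for _, m := range list.Data {
		ids = append(ids, m.ID)
	}
	return ids, nil
}

func main() {
	// Sample payload copied from the response shown in this section.
	sample := []byte(`{
	  "object": "list",
	  "data": [
	    {
	      "id": "qwen2.5-1.5b-instruct",
	      "object": "model",
	      "owned_by": "vllm",
	      "root": "/data/models/Qwen2.5-1.5B-Instruct",
	      "parent": null,
	      "max_model_len": 4096
	    }
	  ]
	}`)

	ids, err := modelIDs(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(ids)
}
```

Comparing the returned ids against spec.modelName of the VLLMService would turn the manual curl check into a repeatable one.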

XIII. Verifying the OwnerReference and Cascading Deletion

The Service is a child resource managed by the VLLMService, so it should carry an OwnerReference. Look at the Service YAML:

kubectl -n ai-demo get svc qwen-demo -o yaml

Focus on:

metadata:
  ownerReferences:
    - apiVersion: aiinfra.example.com/v1alpha1
      kind: VLLMService
      name: qwen-demo
      controller: true

If this block is present, the Service is managed by the VLLMService. Next, verify cascading deletion. Run:

kubectl -n ai-demo delete vllmservice qwen-demo

Then watch whether the Deployment and Service are deleted automatically:

kubectl -n ai-demo get deploy
kubectl -n ai-demo get svc

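Normally, both the Deployment and the Service managed by the VLLMService are cleaned up by Kubernetes garbage collection. If the Service remains after the VLLMService is deleted, check whether SetControllerReference succeeded and whether the Service really carries ownerReferences.

XIV. Verifying Self-Healing

You can also change the Service by hand to verify that the Operator restores it to the desired state. For example, delete the Service manually:

kubectl -n ai-demo delete svc qwen-demo

Then list the Services again: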

kubectl -n ai-demo get svc

If the VLLMService still exists, the Operator should recreate the Service on its next Reconcile, because the controller is already configured with:

For(&aiinfrav1alpha1.VLLMService{}).
Owns(&appsv1.Deployment{}).
Owns(&corev1.Service{})

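and because every Reconcile uses CreateOrUpdate to ensure the Service exists. So after a manual deletion, the Operator creates the Service again.

XV. Troubleshooting Common Problems

If no Service appears after applying the VLLMService, check the Operator logs first: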

kubectl -n vllmservice-operator-system logs deploy/vllmservice-operator-controller-manager

If the log shows forbidden, the RBAC rules most likely lack the services permission. Check whether config/rbac/role.yaml contains:

resources:
  - services
verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch

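If the Service exists but cannot be reached, check the selector and endpoints first:

kubectl -n ai-demo describe svc qwen-demo
kubectl -n ai-demo get endpoints qwen-demo
kubectl -n ai-demo get pod --show-labels

If endpoints is empty, the usual cause is that the Service selector and the Pod labels do not match; Pods that are not yet Ready are also left out. The current Service selector is: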

app.kubernetes.io/name=vllmservice
app.kubernetes.io/instance=qwen-demo

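so the Pod labels must include both of these labels.

If the Service has endpoints but /v1/models still fails, go on to check whether the Pod is Running, whether vLLM listens on port 8000, whether the container port is named http, and whether the Service's targetPort also points to http.

XVI. Summary

At this point the VLLMService Operator does more than create a Deployment; it is starting to look like a complete service orchestrator. After the previous article it could do: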

VLLMService
  -> Deployment
  -> Pod

With this article it becomes:

VLLMService
  -> Deployment
  -> Pod
  -> Service

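This step is critical, because the Service is the foundation for the traffic entry points that come next. Without a Service, an HTTPRoute has no stable backend target; with one, we can keep extending: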

VLLMService
  -> Deployment
  -> Pod
  -> Service
  -> Gateway
  -> HTTPRoute

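In other words, the Service is the key step from "the model Pod can run" to "the model service can be accessed reliably".

This article added automatic Service creation to the VLLMService Operator and covered the following:

1. Grant the Operator RBAC permissions on services;
2. Use CreateOrUpdate in Reconcile to create or update the Service;
3. Use ClusterIP as the Service type;
4. Use the stable selectorLabels to select backend Pods;
5. Use the full objectLabels as the Service's own labels;
6. Use targetPort: http to forward to the named container port;
7. Set an OwnerReference on the Service;
8. Add Owns(&corev1.Service{}) in SetupWithManager;
9. Write ServiceName into the VLLMService status;
10. Verify /v1/models and /v1/chat/completions through a Service port-forward.

The Operator has grown from managing only the model Pods to managing both the Deployment and the Service. The user submits a single VLLMService, and the Operator creates the underlying Deployment and Service so that the vLLM model service gets a stable in-cluster entry point. The next step is to extend it with Gateway API and HTTPRoute, moving from in-cluster access toward access through a unified entry point.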

My knowledge is limited, so corrections and feedback are welcome.
