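Java Programmer Stage 40, Part 09: Building a Complete Java LLM Project from Scratch: Deployment, Release, and Operations Monitoring
Overview
This article covers how to deploy and operate a Java large language model (LLM) project: Docker image packaging, Kubernetes orchestration, GitOps continuous deployment with ArgoCD, log collection and querying with ELK, and distributed tracing with SkyWalking.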

1. Docker Image Packaging and Build

1.1 Multi-Stage Dockerfile
# Stage 1: build
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn clean package -DskipTests
# Stage 2: runtime
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
COPY --from=builder /app/target/*.jar app.jar
COPY config/ ./config/
EXPOSE 8080
# JVM options go before -jar by convention
ENTRYPOINT ["java", "-Xms512m", "-Xmx2g", "-jar", "app.jar"]
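1.2 Model Image Build
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04
# The CUDA base image ships without a JRE, so install one
RUN apt-get update \
    && apt-get install -y --no-install-recommends openjdk-17-jre-headless \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY model/ ./model/
COPY transformer.jar ./transformer.jar
ENV MODEL_PATH=/app/model
ENV JAVA_OPTS="-Xms4g -Xmx8g"
# Run through a shell so that $JAVA_OPTS is expanded; exec replaces the shell with the JVM
CMD ["sh", "-c", "exec java $JAVA_OPTS -jar transformer.jar"]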
1.3 Building and Pushing the Image
# Build the image
docker build -t llm-java-app:v1.0 .
# Tag it for the registry
docker tag llm-java-app:v1.0 registry.example.com/llm-java-app:v1.0
# Push it to the private registry
docker push registry.example.com/llm-java-app:v1.0
2. Kubernetes Container Orchestration (K8s)

2.1 Deployment Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-java-app
  labels:
    app: llm-java-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llm-java-app
  template:
    metadata:
      labels:
        app: llm-java-app
    spec:
      containers:
        - name: app
          image: registry.example.com/llm-java-app:v1.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: "prod"
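One detail worth checking in this configuration: the Dockerfile in 1.1 sets the maximum heap to 2g, and the container memory limit here is 2Gi. The JVM also needs memory outside the heap (metaspace, thread stacks, direct buffers), so a heap equal to the limit leaves no headroom and the container may be OOM-killed under load. The sketch below is only an illustration of that arithmetic; the class and method names are made up for this example, and it treats all suffixes as binary units.

```java
public class HeapHeadroom {
    private static final long MIB = 1024L * 1024L;

    /**
     * Parses a size such as "2g" or "512m" (JVM style) or "2Gi" or "512Mi"
     * (Kubernetes style) into bytes. Every suffix is read as a binary unit,
     * so Kubernetes decimal suffixes such as "2G" are not distinguished.
     */
    public static long parseBytes(String size) {
        String s = size.trim().toLowerCase();
        if (s.endsWith("i")) {
            s = s.substring(0, s.length() - 1); // "2gi" becomes "2g"
        }
        char unit = s.charAt(s.length() - 1);
        long value = Long.parseLong(s.substring(0, s.length() - 1));
        switch (unit) {
            case 'k': return value * 1024L;
            case 'm': return value * MIB;
            case 'g': return value * 1024L * MIB;
            default: throw new IllegalArgumentException("unsupported unit in: " + size);
        }
    }

    /** Bytes left inside the container limit for everything that is not Java heap. */
    public static long headroomBytes(String maxHeap, String containerLimit) {
        return parseBytes(containerLimit) - parseBytes(maxHeap);
    }

    public static void main(String[] args) {
        // Values taken from the ENTRYPOINT in 1.1 and the limits in 2.1
        long headroom = headroomBytes("2g", "2Gi");
        System.out.println("Headroom: " + headroom / MIB + " MiB");
        // What the JVM running this program actually uses as its max heap
        System.out.println("Max heap of this JVM: "
                + Runtime.getRuntime().maxMemory() / MIB + " MiB");
    }
}
```

If the headroom comes out at or near zero, either lower -Xmx or raise the memory limit; which of the two is appropriate depends on the workload.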
2.2 Service and Ingress
apiVersion: v1
kind: Service
metadata:
  name: llm-java-service
spec:
  selector:
    app: llm-java-app
  ports:
    - port: 80
      targetPort: 8080
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llm-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: llm.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: llm-java-service
                port:
                  number: 80
2.3 HPA Autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-java-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-java-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
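To see what this HPA would do, it helps to work through the scaling rule from the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the configured minimum and maximum. CPU utilization is measured relative to the container's CPU request (500m in the Deployment above). The sketch below only reproduces that core formula; the class name is made up for this example, and it leaves out the controller's tolerance band and stabilization windows.

```java
public class HpaEstimate {

    /**
     * Core HPA formula: scale the current replica count by the ratio of
     * observed to target utilization, round up, then clamp to [min, max].
     */
    public static int desiredReplicas(int currentReplicas,
                                      double currentUtilization,
                                      double targetUtilization,
                                      int minReplicas,
                                      int maxReplicas) {
        double ratio = currentUtilization / targetUtilization;
        int desired = (int) Math.ceil(currentReplicas * ratio);
        return Math.max(minReplicas, Math.min(maxReplicas, desired));
    }

    public static void main(String[] args) {
        // 3 replicas averaging 105% of their CPU request, target 70%:
        // ceil(3 * 105 / 70) = ceil(4.5) = 5
        System.out.println(desiredReplicas(3, 105, 70, 2, 10)); // prints 5
    }
}
```

Because utilization is a percentage of the request rather than of the limit, the 500m request is the figure that determines when scaling starts.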
3. ArgoCD GitOps Continuous Deployment

3.1 Installing ArgoCD
# Install ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Install the ArgoCD CLI (Homebrew)
brew install argocd
# Log in to ArgoCD; the server address is a positional argument (--name only sets the local context name)
argocd login <argocd-server-address> --username admin --password <password>
3.2 Application Configuration
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: llm-java-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/llm-k8s-manifests.git
    targetRevision: main
    path: production
  destination:
    server: https://kubernetes.default.svc
    namespace: llm-prod
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
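3.3 Automatic Image Updates
Argo CD Image Updater can keep the image tag up to date automatically. In its annotation-based configuration it is driven by annotations on the Application, not by a source plugin, and it supports only Helm and Kustomize sources, so this sketch assumes the production path is a Kustomize directory. The private registry is declared separately in the argocd-image-updater-config ConfigMap; the registry entry name and the pull-secret name below are placeholders:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: llm-java-app
  namespace: argocd
  annotations:
    # alias=image; the updater watches this image for new tags
    argocd-image-updater.argoproj.io/image-list: app=registry.example.com/llm-java-app
spec:
  project: default
  # source, destination, and syncPolicy as in 3.2
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-image-updater-config
  namespace: argocd
data:
  registries.conf: |
    registries:
      - name: example-registry
        api_url: https://registry.example.com
        prefix: registry.example.com
        credentials: pullsecret:argocd/registry-credentials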
4. Log Collection and Querying (ELK)
4.1 Filebeat Configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
data:
  filebeat.yml: |
    filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*.log
        processors:
          - add_kubernetes_metadata:
              host: ${NODE_NAME}
              matchers:
                - logs_path:
                    logs_path: /var/log/containers/
    output.logstash:
      hosts: ["logstash:5044"]
4.2 Logstash Pipeline Configuration
input {
  beats {
    port => 5044
  }
}
filter {
  json {
    source => "message"
  }
  date {
    match => ["timestamp", "ISO8601"]
    target => "@timestamp"
  }
  if [level] == "ERROR" {
    mutate {
      add_tag => ["error"]
    }
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "llm-logs-%{+YYYY.MM.dd}"
  }
}
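This pipeline only works if the application writes each log event as a single line of JSON containing at least a timestamp field in ISO 8601 format and a level field. The sketch below shows one way to produce such a line using only the JDK. It is not the logging setup of the project itself: the class name and the extra app_name field are illustrative, and a real service would more likely configure a JSON encoder in its logging framework.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

public class JsonLogLine {

    /** Escapes a string so it can be embedded in a JSON string literal. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds one JSON object on a single line, in the shape the filter above expects. */
    public static String format(Instant time, String level, String message,
                                Map<String, String> extra) {
        StringBuilder sb = new StringBuilder("{");
        // Instant.toString() yields ISO 8601, which the date filter's "ISO8601" pattern parses
        sb.append("\"timestamp\":\"").append(time).append("\"");
        sb.append(",\"level\":\"").append(escape(level)).append("\"");
        sb.append(",\"message\":\"").append(escape(message)).append("\"");
        for (Map.Entry<String, String> e : extra.entrySet()) {
            sb.append(",\"").append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append("\"");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> extra = new LinkedHashMap<>();
        extra.put("app_name", "llm-java"); // illustrative field
        System.out.println(format(Instant.now(), "ERROR", "model call timed out", extra));
    }
}
```

Escaping newlines matters here: a multi-line stack trace written unescaped would be split into several events by the container log input, and the json filter would fail on each fragment.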
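4.3 Kibana Log Queries
The queries below use Lucene syntax, so switch the Kibana query bar from KQL to Lucene first.
# Error logs
level:ERROR AND app_name:"llm-java"
# Requests slower than the threshold (wildcards are not expanded inside quotes, so escape the slashes instead)
response_time:>1000 AND path:\/api\/llm\/*
# Activity of a specific user in the last hour (the date filter in 4.2 stores the event time in @timestamp)
user_id:"user123" AND @timestamp:[now-1h TO now]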
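When querying Elasticsearch directly rather than through a Kibana data view, it is useful to know which daily index holds a given event. The Logstash output in 4.2 names indices llm-logs-%{+YYYY.MM.dd}, resolved from the event's @timestamp in UTC. The sketch below computes the same name in Java; the class name is made up for this example. Note that java.time uses lowercase year letters here, because uppercase YYYY in a Java pattern means week-based year and gives wrong results around New Year.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class DailyIndexName {

    // "uuuu" is the calendar year in java.time; the index date is taken in UTC
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("uuuu.MM.dd").withZone(ZoneOffset.UTC);

    /** Returns the daily index that would hold an event with this timestamp. */
    public static String forInstant(Instant eventTime) {
        return "llm-logs-" + DAY.format(eventTime);
    }

    public static void main(String[] args) {
        System.out.println(forInstant(Instant.parse("2024-12-31T23:30:00Z")));
        // prints llm-logs-2024.12.31
    }
}
```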
5. SkyWalking Distributed Tracing
5.1 Java Agent Configuration
# Download the agent (since 8.8.0 the Java agent is released separately from the APM backend distribution)
wget https://archive.apache.org/dist/skywalking/java-agent/9.0.0/apache-skywalking-java-agent-9.0.0.tgz
tar -xzf apache-skywalking-java-agent-9.0.0.tgz
# JVM startup parameters
java -javaagent:/path/to/skywalking-agent/skywalking-agent.jar \
  -Dskywalking.agent.service_name=llm-java-app \
  -Dskywalking.collector.backend_service=oap:11800 \
  -jar llm-app.jar
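5.2 Mounting the Agent in Kubernetes
An emptyDir volume starts out empty, so the agent has to be copied into it before the application starts. The fragment below does that with an init container and passes the agent settings through environment variables, which the agent's default agent.config reads. The agent image tag is an assumption; pick the one that matches your JDK.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-java-app
spec:
  template:
    spec:
      initContainers:
        # Copy the agent from the agent image into the shared volume
        - name: skywalking-agent
          image: apache/skywalking-java-agent:9.0.0-java17
          command: ["sh", "-c", "cp -R /skywalking/agent/. /agent/"]
          volumeMounts:
            - name: skywalking-agent
              mountPath: /agent
      containers:
        - name: app
          image: registry.example.com/llm-java-app:v1.0
          volumeMounts:
            - name: skywalking-agent
              mountPath: /agent
          env:
            # The JVM reads JAVA_TOOL_OPTIONS itself; the ENTRYPOINT in 1.1 never expands JAVA_OPTS
            - name: JAVA_TOOL_OPTIONS
              value: "-javaagent:/agent/skywalking-agent.jar"
            - name: SW_AGENT_NAME
              value: "llm-java-app"
            - name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
              value: "oap:11800"
      volumes:
        - name: skywalking-agent
          emptyDir: {}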
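5.3 Custom Tracing Annotations
// Requires the apm-toolkit-trace dependency on the classpath
import org.apache.skywalking.apm.toolkit.trace.Tag;
import org.apache.skywalking.apm.toolkit.trace.Trace;
import org.springframework.stereotype.Service;

@Service
public class LLMService {
    // LlmClient stands in for the project's own model client, which this article does not show
    private final LlmClient llmClient;

    public LLMService(LlmClient llmClient) {
        this.llmClient = llmClient;
    }

    @Trace
    public String generateResponse(String prompt) {
        // The LLM call is recorded as a local span
        return llmClient.invoke(prompt);
    }

    @Trace
    // arg[1] is the second method argument, so the span is tagged with the model name
    @Tag(key = "model", value = "arg[1]")
    public String callModel(String prompt, String model) {
        return llmClient.invoke(prompt, model);
    }
}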
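Summary
This article walked through the full deployment and operations stack for a Java LLM project:
| Component | Purpose | Key tools |
| --- | --- | --- |
| Docker | Containerized packaging | Dockerfile, BuildKit |
| Kubernetes | Container orchestration | Deployment, Service, HPA |
| ArgoCD | GitOps deployment | Application, Image Updater |
| ELK | Log collection | Filebeat, Logstash, Kibana |
| SkyWalking | Distributed tracing | Java Agent, OAP, UI |

With this DevOps toolchain in place, an LLM application can be deployed automatically, observed in production, and iterated on continuously.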