K8s Helm Deployment in Practice
Introduction
Kubernetes combined with Helm provides a complete solution for containerized deployment of enterprise applications. Starting from a real project, this article walks through the full workflow of Helm-based Kubernetes deployment: building a custom chart from scratch, configuring per-environment values files, integrating a CI/CD pipeline for automated delivery, and handling rollbacks and incidents in production. The end-to-end example is meant to help operations teams establish a standardized, repeatable deployment process.
Complete Deployment Workflow
The following is an end-to-end example of deploying a web application from development through production.
# Project structure
# k8s-deployment/
# ├── Chart.yaml
# ├── values.yaml
# ├── values-dev.yaml
# ├── values-staging.yaml
# ├── values-production.yaml
# ├── templates/
# │   ├── deployment.yaml
# │   ├── service.yaml
# │   ├── ingress.yaml
# │   ├── configmap.yaml
# │   ├── secret.yaml
# │   ├── hpa.yaml
# │   ├── pdb.yaml
# │   ├── servicemonitor.yaml
# │   ├── _helpers.tpl
# │   └── NOTES.txt
# ├── charts/
# └── .helmignore
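The `.helmignore` at the chart root keeps local development files out of the packaged chart. A typical sketch (these entries are illustrative, not from the original project):

```text
# .helmignore - patterns excluded by `helm package`
.git/
.gitignore
*.swp
*.bak
*.tmp
.vscode/
ci/
```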
# Chart.yaml
apiVersion: v2
name: web-platform
description: Enterprise web application deployment chart
type: application
version: 2.0.0
appVersion: "3.5.0"
kubeVersion: ">=1.24.0"
dependencies:
  - name: postgresql
    version: "13.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled
    tags:
      - database
  - name: redis
    version: "18.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled
    tags:
      - cache

# templates/deployment.yaml - full Deployment manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "web-platform.fullname" . }}
  labels:
    {{- include "web-platform.labels" . | nindent 4 }}
  annotations:
    reloader.stakater.com/auto: "true"
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      {{- include "web-platform.selectorLabels" . | nindent 6 }}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: {{ .Values.rollingUpdate.maxSurge }}
      maxUnavailable: {{ .Values.rollingUpdate.maxUnavailable }}
  template:
    metadata:
      labels:
        {{- include "web-platform.selectorLabels" . | nindent 8 }}
      annotations:
        # Checksums force a rolling restart when config or secrets change
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
        prometheus.io/scrape: "true"
        prometheus.io/port: "{{ .Values.service.port }}"
    spec:
      serviceAccountName: {{ include "web-platform.serviceAccountName" . }}
      terminationGracePeriodSeconds: 60
      {{- if .Values.priorityClassName }}
      priorityClassName: {{ .Values.priorityClassName }}
      {{- end }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      initContainers:
        # Block startup until PostgreSQL is reachable
        - name: wait-for-db
          image: busybox:1.36
          command: ['sh', '-c', 'until nc -z {{ .Release.Name }}-postgresql 5432; do echo waiting for db; sleep 2; done;']
        # Run schema migrations before the app containers start
        - name: db-migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          command: ["npm", "run", "migrate"]
          envFrom:
            - configMapRef:
                name: {{ include "web-platform.fullname" . }}
            - secretRef:
                name: {{ include "web-platform.fullname" . }}
      containers:
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: {{ .Values.service.port }}
              protocol: TCP
          envFrom:
            - configMapRef:
                name: {{ include "web-platform.fullname" . }}
            - secretRef:
                name: {{ include "web-platform.fullname" . }}
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 30
            periodSeconds: 15
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 30
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: uploads
              mountPath: /app/uploads
      volumes:
        - name: tmp
          emptyDir: {}
        - name: uploads
          {{- if .Values.persistence.enabled }}
          persistentVolumeClaim:
            claimName: {{ include "web-platform.fullname" . }}-uploads
          {{- else }}
          emptyDir: {}
          {{- end }}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}

Custom Chart Development
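The deployment template pulls everything from values. Before looking at the remaining templates, here is a minimal values.yaml sketch covering the keys that template references; the key names match the template, while the concrete numbers are illustrative:

```yaml
# values.yaml - minimal sketch of the keys used by deployment.yaml (values are examples)
replicaCount: 3
environment: production
logLevel: info

image:
  repository: myregistry.com/web-platform
  tag: ""            # empty -> falls back to .Chart.AppVersion
  pullPolicy: IfNotPresent

rollingUpdate:
  maxSurge: 1
  maxUnavailable: 0

service:
  port: 8080

resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: "1"
    memory: 1Gi

persistence:
  enabled: true

podSecurityContext:
  runAsNonRoot: true
securityContext:
  readOnlyRootFilesystem: true
```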
# templates/configmap.yaml - configuration management
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "web-platform.fullname" . }}
  labels:
    {{- include "web-platform.labels" . | nindent 4 }}
data:
  NODE_ENV: "{{ .Values.environment }}"
  APP_PORT: "{{ .Values.service.port }}"
  LOG_LEVEL: "{{ .Values.logLevel }}"
  DB_HOST: "{{ .Release.Name }}-postgresql"
  DB_PORT: "5432"
  DB_NAME: "{{ .Values.postgresql.auth.database }}"
  REDIS_HOST: "{{ .Release.Name }}-redis-master"
  REDIS_PORT: "6379"
  {{- range $key, $value := .Values.extraEnv }}
  {{ $key }}: "{{ $value }}"
  {{- end }}
---
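The extraEnv loop at the end of the ConfigMap lets each environment inject ad-hoc variables without touching the template. For example, a values fragment like this (keys are hypothetical):

```yaml
# values-staging.yaml fragment (illustrative)
extraEnv:
  FEATURE_FLAGS: "beta-ui"
  CACHE_TTL: "300"
```

renders as two extra `data` keys in the ConfigMap, `FEATURE_FLAGS: "beta-ui"` and `CACHE_TTL: "300"`.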
# templates/secret.yaml - sensitive configuration (in practice, pair this with Sealed Secrets or Vault)
apiVersion: v1
kind: Secret
metadata:
  name: {{ include "web-platform.fullname" . }}
  labels:
    {{- include "web-platform.labels" . | nindent 4 }}
type: Opaque
data:
  DB_PASSWORD: {{ .Values.postgresql.auth.password | b64enc | quote }}
  REDIS_PASSWORD: {{ .Values.redis.auth.password | b64enc | quote }}
  JWT_SECRET: {{ .Values.jwtSecret | b64enc | quote }}
---
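Helm's `b64enc` applies standard Base64, the same encoding Kubernetes expects in a Secret's `data` field, so you can reproduce locally what ends up in the rendered manifest (the sample password is obviously illustrative):

```shell
# Encode the way `b64enc` does; printf avoids a trailing newline sneaking into the value
ENCODED=$(printf '%s' 's3cret' | base64)
echo "$ENCODED"                      # czNjcmV0

# Decode to confirm the round-trip
printf '%s' "$ENCODED" | base64 -d   # s3cret
```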
# templates/hpa.yaml - horizontal pod autoscaling
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "web-platform.fullname" . }}
  labels:
    {{- include "web-platform.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "web-platform.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
    {{- end }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
    {{- end }}
  behavior:
    # Scale down slowly, scale up aggressively
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
{{- end }}

# templates/pdb.yaml - PodDisruptionBudget
{{- if .Values.podDisruptionBudget.enabled }}
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: {{ include "web-platform.fullname" . }}
  labels:
    {{- include "web-platform.labels" . | nindent 4 }}
spec:
  selector:
    matchLabels:
      {{- include "web-platform.selectorLabels" . | nindent 6 }}
  {{- if .Values.podDisruptionBudget.minAvailable }}
  minAvailable: {{ .Values.podDisruptionBudget.minAvailable }}
  {{- end }}
  {{- if .Values.podDisruptionBudget.maxUnavailable }}
  maxUnavailable: {{ .Values.podDisruptionBudget.maxUnavailable }}
  {{- end }}
{{- end }}
---
# templates/servicemonitor.yaml - Prometheus monitoring
{{- if .Values.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "web-platform.fullname" . }}
  labels:
    {{- include "web-platform.labels" . | nindent 4 }}
    release: prometheus
spec:
  selector:
    matchLabels:
      {{- include "web-platform.selectorLabels" . | nindent 6 }}
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s
{{- end }}

CI/CD Pipeline Integration
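The pipeline below selects a values file per target environment. That mapping can be factored into a small helper so every job builds the same helm invocation; the function name and layout here are a sketch, not part of the original workflow:

```shell
#!/bin/sh
# Map a deploy environment name to its values file; fail loudly on unknown input
env_values_file() {
  case "$1" in
    dev)        echo "values-dev.yaml" ;;
    staging)    echo "values-staging.yaml" ;;
    production) echo "values-production.yaml" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Usage inside a deploy step (helm invocation shown for context only):
#   helm upgrade --install web-platform ./k8s-deployment \
#     -f "./k8s-deployment/$(env_values_file "$DEPLOY_ENV")" ...
env_values_file staging
```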
# .github/workflows/deploy.yml - GitHub Actions deployment pipeline
name: Deploy Application

on:
  push:
    branches: [main, develop]
  workflow_dispatch:
    inputs:
      environment:
        description: 'Deploy environment'
        required: true
        default: 'staging'
        type: choice
        options:
          - staging
          - production

env:
  REGISTRY: myregistry.com
  IMAGE_NAME: web-platform
  CHART_PATH: ./k8s-deployment

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      # Expose the commit SHA as the image tag for downstream jobs
      image_tag: ${{ github.sha }}
    steps:
      - uses: actions/checkout@v4
      - name: Login to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ secrets.REGISTRY_USER }}
          password: ${{ secrets.REGISTRY_PASSWORD }}
      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest

  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - name: Setup kubectl
        uses: azure/setup-kubectl@v3
        with:
          version: 'v1.28.0'
      - name: Setup Helm
        uses: azure/setup-helm@v3
        with:
          version: 'v3.13.0'
      - name: Configure kubeconfig
        run: |
          mkdir -p $HOME/.kube
          echo "${{ secrets.KUBE_CONFIG_STAGING }}" | base64 -d > $HOME/.kube/config
      - name: Deploy to staging
        run: |
          helm upgrade --install web-platform ./k8s-deployment \
            --namespace staging \
            --create-namespace \
            -f ./k8s-deployment/values-staging.yaml \
            --set image.tag=${{ github.sha }} \
            --wait --timeout 5m \
            --atomic

  deploy-production:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - name: Setup Helm
        uses: azure/setup-helm@v3
      - name: Configure kubeconfig
        run: |
          mkdir -p $HOME/.kube
          echo "${{ secrets.KUBE_CONFIG_PRODUCTION }}" | base64 -d > $HOME/.kube/config
      - name: Dry run
        run: |
          helm upgrade --install web-platform ./k8s-deployment \
            --namespace production \
            -f ./k8s-deployment/values-production.yaml \
            --set image.tag=${{ github.sha }} \
            --dry-run
      - name: Deploy to production
        run: |
          helm upgrade --install web-platform ./k8s-deployment \
            --namespace production \
            -f ./k8s-deployment/values-production.yaml \
            --set image.tag=${{ github.sha }} \
            --wait --timeout 10m \
            --atomic

Rollback Strategy
# Helm rollback operations

# Inspect the release history
helm history web-platform -n production
# Sample output:
# REVISION  UPDATED                   STATUS      CHART               APP VERSION  DESCRIPTION
# 1         Mon Jan 15 10:00:00 2024  superseded  web-platform-1.0.0  3.0.0        Install complete
# 2         Mon Jan 15 14:30:00 2024  superseded  web-platform-1.1.0  3.1.0        Upgrade complete
# 3         Mon Jan 15 18:00:00 2024  deployed    web-platform-2.0.0  3.5.0        Upgrade complete

# Roll back to the previous revision
helm rollback web-platform -n production

# Roll back to a specific revision
helm rollback web-platform 2 -n production

# Roll back and wait for completion
helm rollback web-platform 2 -n production --wait --timeout 300s

# Check the state after the rollback
helm status web-platform -n production
kubectl rollout status deployment/web-platform -n production

# Automated rollback script
#!/bin/bash
set -euo pipefail

RELEASE_NAME="web-platform"
NAMESPACE="production"

# Capture the currently deployed revision before upgrading, so we know where to roll back to
CURRENT_REVISION=$(helm history "$RELEASE_NAME" -n "$NAMESPACE" | tail -1 | awk '{print $1}')
echo "Current revision: $CURRENT_REVISION"

# Deploy the new version (image tag is passed as the first script argument)
helm upgrade --install "$RELEASE_NAME" ./k8s-deployment \
  --namespace "$NAMESPACE" \
  -f ./k8s-deployment/values-production.yaml \
  --set image.tag="$1" \
  --timeout 10m \
  --wait

# Post-deploy health check
echo "Running post-deploy health check..."
HEALTH_CHECK_PASSED=false
for i in $(seq 1 10); do
  HTTP_CODE=$(kubectl exec -n "$NAMESPACE" deployment/"$RELEASE_NAME" -- \
    wget -q --spider http://localhost:8080/healthz 2>/dev/null && echo "200" || echo "000")
  if [ "$HTTP_CODE" != "200" ]; then
    echo "Health check attempt $i failed (HTTP: $HTTP_CODE)"
    sleep 10
  else
    echo "Health check passed"
    HEALTH_CHECK_PASSED=true
    break
  fi
done

if [ "$HEALTH_CHECK_PASSED" = false ]; then
  echo "Health check failed, rolling back automatically..."
  helm rollback "$RELEASE_NAME" "$CURRENT_REVISION" -n "$NAMESPACE" --wait
  echo "Rollback complete"
  exit 1
fi

echo "Deployment completed successfully"
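The `tail -1 | awk '{print $1}'` pipeline in the script grabs the highest revision number from `helm history` output, and it can be exercised against a captured sample without touching a cluster (the sample mirrors the history listing shown earlier):

```shell
# Extract the latest revision number from `helm history`-style output
HISTORY='REVISION  UPDATED                   STATUS      CHART               APP VERSION  DESCRIPTION
1         Mon Jan 15 10:00:00 2024  superseded  web-platform-1.0.0  3.0.0        Install complete
2         Mon Jan 15 14:30:00 2024  superseded  web-platform-1.1.0  3.1.0        Upgrade complete
3         Mon Jan 15 18:00:00 2024  deployed    web-platform-2.0.0  3.5.0        Upgrade complete'

# tail -1 takes the last data line; awk prints its first column (the revision)
LATEST=$(printf '%s\n' "$HISTORY" | tail -1 | awk '{print $1}')
echo "$LATEST"   # 3
```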
Summary
This walkthrough covered the full path from custom chart development to CI/CD-integrated deployment. With modular chart templates, layered per-environment values files, and an automated pipeline, an operations team can standardize how applications reach production. A solid rollback strategy and health-check mechanism protect production stability, and pairing this setup with a GitOps tool such as ArgoCD enables fully automated continuous delivery. Mastering this workflow is key to building a modern cloud-native delivery capability.
Key Takeaways
- The point of DevOps is faster, more stable, and more auditable delivery.
- Automation is not just scripting commands; failure handling, rollback, permissions, and observability must be designed in together.
- A production pipeline needs clear boundaries for artifacts, environments, credentials, configuration, and ownership.
- Kubernetes work requires looking at resource objects, scheduling behavior, network exposure, and configuration distribution at the same time.
A Project Delivery Perspective
- Split the pipeline into build, test, artifact, deploy, verify, and rollback stages.
- Back every critical step with logs, metrics, notifications, and a manual fallback.
- Regularly rehearse scale-out, rollback, fault injection, and disaster-recovery switchover.
- Before going live, check the image, namespace, probes, resource limits, Service/Ingress, and configuration sources.
Common Pitfalls
- Caring only whether the deploy succeeded, not about failure recovery and audit trails.
- Hiding environment differences in ad-hoc scripts or manual steps.
- Raising release frequency without standardized artifacts and configuration management.
- Knowing how to apply YAML without understanding the dependencies between objects.
Where to Go Next
- Fill in GitOps, observability, platform engineering, and cost governance.
- Connect this topic with application architecture, security, permissions, and backup/restore.
- Deepen your grasp of scheduling, network policies, and storage.
- Build team-level platform capabilities instead of reinventing the wheel for every project.
When This Applies
- When bringing this workflow into a real project, start by validating the critical path in an isolated module or a minimal sample.
- It suits teams building automated delivery, infrastructure governance, monitoring and alerting, and production release processes.
- As the team grows, release frequency rises, or environments multiply, this kind of standardization has an outsized effect on delivery efficiency.
Implementation Advice
- Make every automated process idempotent, auditable, and reversible.
- Manage artifacts, variables, credentials, and execution permissions in separate layers.
- Regularly rehearse scale-out, rollback, secret rotation, and disaster recovery.
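"Idempotent" in the first point means a step can be re-run safely and converge on the same end state — `helm upgrade --install` is the canonical example of an idempotent deploy. The pattern in plain shell, as a toy sketch rather than project code:

```shell
#!/bin/sh
# Idempotent append: add a marker line to a file only if it is not already there,
# so running the step twice leaves exactly one copy (same spirit as `upgrade --install`)
ensure_line() {
  file="$1"; line="$2"
  # -x: match whole line, -F: literal string; missing file is treated as "not present"
  grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

rm -f /tmp/deploy.env
ensure_line /tmp/deploy.env "NAMESPACE=production"
ensure_line /tmp/deploy.env "NAMESPACE=production"   # second run is a no-op
grep -c "NAMESPACE=production" /tmp/deploy.env        # 1
```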
Troubleshooting Checklist
- First locate whether the failure is in the code, the build, the artifact, the environment, or permissions.
- Check that pipeline variables, credentials, image tags, and target-environment configuration are consistent.
- For intermittent failures, focus on concurrent releases, resource contention, and flaky external dependencies.
Questions for Retrospectives
- If you dropped this workflow into your current project, which inputs, outputs, and failure paths would you validate first?
- At what scale or under which boundary conditions is this setup most likely to break, and which metrics or logs would confirm it?
- Compared with the default approach or alternatives, what are the biggest gains and costs of adopting it?
