K8s Helm Deployment in Practice
Introduction
Kubernetes combined with Helm provides a complete solution for containerized deployment of enterprise applications. Starting from a real project, this article walks through the full workflow of Helm-based Kubernetes deployment: building a custom chart from scratch, configuring per-environment values files, integrating a CI/CD pipeline for automated delivery, and handling rollbacks and incidents in production. The end-to-end example is meant to help operations teams establish a standardized, repeatable deployment process.
Complete Deployment Workflow
The following is an end-to-end example of deploying a web application from development through production.
# Project structure
# k8s-deployment/
# ├── Chart.yaml
# ├── values.yaml
# ├── values-dev.yaml
# ├── values-staging.yaml
# ├── values-production.yaml
# ├── templates/
# │   ├── deployment.yaml
# │   ├── service.yaml
# │   ├── ingress.yaml
# │   ├── configmap.yaml
# │   ├── secret.yaml
# │   ├── hpa.yaml
# │   ├── pdb.yaml
# │   ├── servicemonitor.yaml
# │   ├── _helpers.tpl
# │   └── NOTES.txt
# ├── charts/
# └── .helmignore
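The `.helmignore` at the chart root keeps local development files out of the packaged chart. A typical sketch (these entries are illustrative, not from the original project):

```text
# .helmignore - patterns excluded by `helm package`
.git/
.gitignore
*.swp
*.bak
*.tmp
.vscode/
ci/
```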
# Chart.yaml
apiVersion: v2
name: web-platform
description: Enterprise web application deployment chart
type: application
version: 2.0.0
appVersion: "3.5.0"
kubeVersion: ">=1.24.0"
dependencies:
  - name: postgresql
    version: "13.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled
    tags:
      - database
  - name: redis
    version: "18.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled
    tags:
      - cache

# templates/deployment.yaml - full Deployment manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "web-platform.fullname" . }}
  labels:
    {{- include "web-platform.labels" . | nindent 4 }}
  annotations:
    reloader.stakater.com/auto: "true"
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      {{- include "web-platform.selectorLabels" . | nindent 6 }}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: {{ .Values.rollingUpdate.maxSurge }}
      maxUnavailable: {{ .Values.rollingUpdate.maxUnavailable }}
  template:
    metadata:
      labels:
        {{- include "web-platform.selectorLabels" . | nindent 8 }}
      annotations:
        # Checksums force a rolling restart when config or secrets change
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
        prometheus.io/scrape: "true"
        prometheus.io/port: "{{ .Values.service.port }}"
    spec:
      serviceAccountName: {{ include "web-platform.serviceAccountName" . }}
      terminationGracePeriodSeconds: 60
      {{- if .Values.priorityClassName }}
      priorityClassName: {{ .Values.priorityClassName }}
      {{- end }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      initContainers:
        # Block startup until PostgreSQL is reachable
        - name: wait-for-db
          image: busybox:1.36
          command: ['sh', '-c', 'until nc -z {{ .Release.Name }}-postgresql 5432; do echo waiting for db; sleep 2; done;']
        # Run schema migrations before the app containers start
        - name: db-migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          command: ["npm", "run", "migrate"]
          envFrom:
            - configMapRef:
                name: {{ include "web-platform.fullname" . }}
            - secretRef:
                name: {{ include "web-platform.fullname" . }}
      containers:
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: {{ .Values.service.port }}
              protocol: TCP
          envFrom:
            - configMapRef:
                name: {{ include "web-platform.fullname" . }}
            - secretRef:
                name: {{ include "web-platform.fullname" . }}
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 30
            periodSeconds: 15
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 30
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: uploads
              mountPath: /app/uploads
      volumes:
        - name: tmp
          emptyDir: {}
        - name: uploads
          {{- if .Values.persistence.enabled }}
          persistentVolumeClaim:
            claimName: {{ include "web-platform.fullname" . }}-uploads
          {{- else }}
          emptyDir: {}
          {{- end }}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}

Custom Chart Development
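The deployment template pulls everything from values. Before looking at the remaining templates, here is a minimal values.yaml sketch covering the keys that template references; the key names match the template, while the concrete numbers are illustrative:

```yaml
# values.yaml - minimal sketch of the keys used by deployment.yaml (values are examples)
replicaCount: 3
environment: production
logLevel: info

image:
  repository: myregistry.com/web-platform
  tag: ""            # empty -> falls back to .Chart.AppVersion
  pullPolicy: IfNotPresent

rollingUpdate:
  maxSurge: 1
  maxUnavailable: 0

service:
  port: 8080

resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: "1"
    memory: 1Gi

persistence:
  enabled: true

podSecurityContext:
  runAsNonRoot: true
securityContext:
  readOnlyRootFilesystem: true
```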
# templates/configmap.yaml - configuration management
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "web-platform.fullname" . }}
  labels:
    {{- include "web-platform.labels" . | nindent 4 }}
data:
  NODE_ENV: "{{ .Values.environment }}"
  APP_PORT: "{{ .Values.service.port }}"
  LOG_LEVEL: "{{ .Values.logLevel }}"
  DB_HOST: "{{ .Release.Name }}-postgresql"
  DB_PORT: "5432"
  DB_NAME: "{{ .Values.postgresql.auth.database }}"
  REDIS_HOST: "{{ .Release.Name }}-redis-master"
  REDIS_PORT: "6379"
  {{- range $key, $value := .Values.extraEnv }}
  {{ $key }}: "{{ $value }}"
  {{- end }}
---
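The extraEnv loop at the end of the ConfigMap lets each environment inject ad-hoc variables without touching the template. For example, a values fragment like this (keys are hypothetical):

```yaml
# values-staging.yaml fragment (illustrative)
extraEnv:
  FEATURE_FLAGS: "beta-ui"
  CACHE_TTL: "300"
```

renders as two extra `data` keys in the ConfigMap, `FEATURE_FLAGS: "beta-ui"` and `CACHE_TTL: "300"`.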
# templates/secret.yaml - sensitive configuration (in practice, pair this with Sealed Secrets or Vault)
apiVersion: v1
kind: Secret
metadata:
  name: {{ include "web-platform.fullname" . }}
  labels:
    {{- include "web-platform.labels" . | nindent 4 }}
type: Opaque
data:
  DB_PASSWORD: {{ .Values.postgresql.auth.password | b64enc | quote }}
  REDIS_PASSWORD: {{ .Values.redis.auth.password | b64enc | quote }}
  JWT_SECRET: {{ .Values.jwtSecret | b64enc | quote }}
---
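Helm's `b64enc` applies standard Base64, the same encoding Kubernetes expects in a Secret's `data` field, so you can reproduce locally what ends up in the rendered manifest (the sample password is obviously illustrative):

```shell
# Encode the way `b64enc` does; printf avoids a trailing newline sneaking into the value
ENCODED=$(printf '%s' 's3cret' | base64)
echo "$ENCODED"                      # czNjcmV0

# Decode to confirm the round-trip
printf '%s' "$ENCODED" | base64 -d   # s3cret
```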
# templates/hpa.yaml - horizontal pod autoscaling
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "web-platform.fullname" . }}
  labels:
    {{- include "web-platform.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "web-platform.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
    {{- end }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
    {{- end }}
  behavior:
    # Scale down slowly, scale up aggressively
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
{{- end }}

# templates/pdb.yaml - PodDisruptionBudget
{{- if .Values.podDisruptionBudget.enabled }}
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: {{ include "web-platform.fullname" . }}
  labels:
    {{- include "web-platform.labels" . | nindent 4 }}
spec:
  selector:
    matchLabels:
      {{- include "web-platform.selectorLabels" . | nindent 6 }}
  {{- if .Values.podDisruptionBudget.minAvailable }}
  minAvailable: {{ .Values.podDisruptionBudget.minAvailable }}
  {{- end }}
  {{- if .Values.podDisruptionBudget.maxUnavailable }}
  maxUnavailable: {{ .Values.podDisruptionBudget.maxUnavailable }}
  {{- end }}
{{- end }}
---
# templates/servicemonitor.yaml - Prometheus monitoring
{{- if .Values.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "web-platform.fullname" . }}
  labels:
    {{- include "web-platform.labels" . | nindent 4 }}
    release: prometheus
spec:
  selector:
    matchLabels:
      {{- include "web-platform.selectorLabels" . | nindent 6 }}
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s
{{- end }}

CI/CD Pipeline Integration
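The pipeline below selects a values file per target environment. That mapping can be factored into a small helper so every job builds the same helm invocation; the function name and layout here are a sketch, not part of the original workflow:

```shell
#!/bin/sh
# Map a deploy environment name to its values file; fail loudly on unknown input
env_values_file() {
  case "$1" in
    dev)        echo "values-dev.yaml" ;;
    staging)    echo "values-staging.yaml" ;;
    production) echo "values-production.yaml" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Usage inside a deploy step (helm invocation shown for context only):
#   helm upgrade --install web-platform ./k8s-deployment \
#     -f "./k8s-deployment/$(env_values_file "$DEPLOY_ENV")" ...
env_values_file staging
```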
# .github/workflows/deploy.yml - GitHub Actions deployment pipeline
name: Deploy Application

on:
  push:
    branches: [main, develop]
  workflow_dispatch:
    inputs:
      environment:
        description: 'Deploy environment'
        required: true
        default: 'staging'
        type: choice
        options:
          - staging
          - production

env:
  REGISTRY: myregistry.com
  IMAGE_NAME: web-platform
  CHART_PATH: ./k8s-deployment

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      # Expose the commit SHA as the image tag for downstream jobs
      image_tag: ${{ github.sha }}
    steps:
      - uses: actions/checkout@v4
      - name: Login to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ secrets.REGISTRY_USER }}
          password: ${{ secrets.REGISTRY_PASSWORD }}
      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest

  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - name: Setup kubectl
        uses: azure/setup-kubectl@v3
        with:
          version: 'v1.28.0'
      - name: Setup Helm
        uses: azure/setup-helm@v3
        with:
          version: 'v3.13.0'
      - name: Configure kubeconfig
        run: |
          mkdir -p $HOME/.kube
          echo "${{ secrets.KUBE_CONFIG_STAGING }}" | base64 -d > $HOME/.kube/config
      - name: Deploy to staging
        run: |
          helm upgrade --install web-platform ./k8s-deployment \
            --namespace staging \
            --create-namespace \
            -f ./k8s-deployment/values-staging.yaml \
            --set image.tag=${{ github.sha }} \
            --wait --timeout 5m \
            --atomic

  deploy-production:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - name: Setup Helm
        uses: azure/setup-helm@v3
      - name: Configure kubeconfig
        run: |
          mkdir -p $HOME/.kube
          echo "${{ secrets.KUBE_CONFIG_PRODUCTION }}" | base64 -d > $HOME/.kube/config
      - name: Dry run
        run: |
          helm upgrade --install web-platform ./k8s-deployment \
            --namespace production \
            -f ./k8s-deployment/values-production.yaml \
            --set image.tag=${{ github.sha }} \
            --dry-run
      - name: Deploy to production
        run: |
          helm upgrade --install web-platform ./k8s-deployment \
            --namespace production \
            -f ./k8s-deployment/values-production.yaml \
            --set image.tag=${{ github.sha }} \
            --wait --timeout 10m \
            --atomic

Rollback Strategy
# Helm rollback operations

# Inspect the release history
helm history web-platform -n production
# Sample output:
# REVISION  UPDATED                   STATUS      CHART               APP VERSION  DESCRIPTION
# 1         Mon Jan 15 10:00:00 2024  superseded  web-platform-1.0.0  3.0.0        Install complete
# 2         Mon Jan 15 14:30:00 2024  superseded  web-platform-1.1.0  3.1.0        Upgrade complete
# 3         Mon Jan 15 18:00:00 2024  deployed    web-platform-2.0.0  3.5.0        Upgrade complete

# Roll back to the previous revision
helm rollback web-platform -n production

# Roll back to a specific revision
helm rollback web-platform 2 -n production

# Roll back and wait for completion
helm rollback web-platform 2 -n production --wait --timeout 300s

# Check the state after the rollback
helm status web-platform -n production
kubectl rollout status deployment/web-platform -n production

# Automated rollback script
#!/bin/bash
set -euo pipefail

RELEASE_NAME="web-platform"
NAMESPACE="production"

# Capture the currently deployed revision before upgrading, so we know where to roll back to
CURRENT_REVISION=$(helm history "$RELEASE_NAME" -n "$NAMESPACE" | tail -1 | awk '{print $1}')
echo "Current revision: $CURRENT_REVISION"

# Deploy the new version (image tag is passed as the first script argument)
helm upgrade --install "$RELEASE_NAME" ./k8s-deployment \
  --namespace "$NAMESPACE" \
  -f ./k8s-deployment/values-production.yaml \
  --set image.tag="$1" \
  --timeout 10m \
  --wait

# Post-deploy health check
echo "Running post-deploy health check..."
HEALTH_CHECK_PASSED=false
for i in $(seq 1 10); do
  HTTP_CODE=$(kubectl exec -n "$NAMESPACE" deployment/"$RELEASE_NAME" -- \
    wget -q --spider http://localhost:8080/healthz 2>/dev/null && echo "200" || echo "000")
  if [ "$HTTP_CODE" != "200" ]; then
    echo "Health check attempt $i failed (HTTP: $HTTP_CODE)"
    sleep 10
  else
    echo "Health check passed"
    HEALTH_CHECK_PASSED=true
    break
  fi
done

if [ "$HEALTH_CHECK_PASSED" = false ]; then
  echo "Health check failed, rolling back automatically..."
  helm rollback "$RELEASE_NAME" "$CURRENT_REVISION" -n "$NAMESPACE" --wait
  echo "Rollback complete"
  exit 1
fi

echo "Deployment completed successfully"
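The `tail -1 | awk '{print $1}'` pipeline in the script grabs the highest revision number from `helm history` output, and it can be exercised against a captured sample without touching a cluster (the sample mirrors the history listing shown earlier):

```shell
# Extract the latest revision number from `helm history`-style output
HISTORY='REVISION  UPDATED                   STATUS      CHART               APP VERSION  DESCRIPTION
1         Mon Jan 15 10:00:00 2024  superseded  web-platform-1.0.0  3.0.0        Install complete
2         Mon Jan 15 14:30:00 2024  superseded  web-platform-1.1.0  3.1.0        Upgrade complete
3         Mon Jan 15 18:00:00 2024  deployed    web-platform-2.0.0  3.5.0        Upgrade complete'

# tail -1 takes the last data line; awk prints its first column (the revision)
LATEST=$(printf '%s\n' "$HISTORY" | tail -1 | awk '{print $1}')
echo "$LATEST"   # 3
```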
Summary
This walkthrough covered the full path from custom chart development to CI/CD-integrated deployment. With modular chart templates, layered per-environment values files, and an automated pipeline, an operations team can standardize how applications reach production. A solid rollback strategy and health-check mechanism protect production stability, and pairing this setup with a GitOps tool such as ArgoCD enables fully automated continuous delivery. Mastering this workflow is key to building a modern cloud-native delivery capability.
Key Takeaways
- The point of DevOps is faster, more stable, and more auditable delivery.
- Automation is not just scripting commands; failure handling, rollback, permissions, and observability must be designed in together.
- A production pipeline needs clear boundaries for artifacts, environments, credentials, configuration, and ownership.
- Kubernetes work requires looking at resource objects, scheduling behavior, network exposure, and configuration distribution at the same time.
A Project Delivery Perspective
- Split the pipeline into build, test, artifact, deploy, verify, and rollback stages.
- Back every critical step with logs, metrics, notifications, and a manual fallback.
- Regularly rehearse scale-out, rollback, fault injection, and disaster-recovery switchover.
- Before going live, check the image, namespace, probes, resource limits, Service/Ingress, and configuration sources.
Common Pitfalls
- Caring only whether the deploy succeeded, not about failure recovery and audit trails.
- Hiding environment differences in ad-hoc scripts or manual steps.
- Raising release frequency without standardized artifacts and configuration management.
- Knowing how to apply YAML without understanding the dependencies between objects.
Where to Go Next
- Fill in GitOps, observability, platform engineering, and cost governance.
- Connect this topic with application architecture, security, permissions, and backup/restore.
- Deepen your grasp of scheduling, network policies, and storage.
- Build team-level platform capabilities instead of reinventing the wheel for every project.
When This Applies
- When bringing this workflow into a real project, start by validating the critical path in an isolated module or a minimal sample.
- It suits teams building automated delivery, infrastructure governance, monitoring and alerting, and production release processes.
- As the team grows, release frequency rises, or environments multiply, this kind of standardization has an outsized effect on delivery efficiency.
Implementation Advice
- Make every automated process idempotent, auditable, and reversible.
- Manage artifacts, variables, credentials, and execution permissions in separate layers.
- Regularly rehearse scale-out, rollback, secret rotation, and disaster recovery.
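"Idempotent" in the first point means a step can be re-run safely and converge on the same end state — `helm upgrade --install` is the canonical example of an idempotent deploy. The pattern in plain shell, as a toy sketch rather than project code:

```shell
#!/bin/sh
# Idempotent append: add a marker line to a file only if it is not already there,
# so running the step twice leaves exactly one copy (same spirit as `upgrade --install`)
ensure_line() {
  file="$1"; line="$2"
  # -x: match whole line, -F: literal string; missing file is treated as "not present"
  grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

rm -f /tmp/deploy.env
ensure_line /tmp/deploy.env "NAMESPACE=production"
ensure_line /tmp/deploy.env "NAMESPACE=production"   # second run is a no-op
grep -c "NAMESPACE=production" /tmp/deploy.env        # 1
```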
Troubleshooting Checklist
- First locate whether the failure is in the code, the build, the artifact, the environment, or permissions.
- Check that pipeline variables, credentials, image tags, and target-environment configuration are consistent.
- For intermittent failures, focus on concurrent releases, resource contention, and flaky external dependencies.
Questions for Retrospectives
- If you dropped this workflow into your current project, which inputs, outputs, and failure paths would you validate first?
- At what scale or under which boundary conditions is this setup most likely to break, and which metrics or logs would confirm it?
- Compared with the default approach or alternatives, what are the biggest gains and costs of adopting it?
