Kubernetes Advanced Topics
Introduction
Advanced Kubernetes work centers on Helm package management, Ingress controllers, PersistentVolume-backed storage, HPA autoscaling, ConfigMap/Secret management, RBAC authorization, and related capabilities. Mastering them is key to operating Kubernetes in production.
Helm Package Management
Installation and Basic Usage
# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Add a chart repository
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
# Search for charts
helm search repo nginx
# Install a release
helm install my-nginx bitnami/nginx
# Inspect releases
helm list
helm status my-nginx
# Upgrade a release
helm upgrade my-nginx bitnami/nginx --set replicaCount=3
# Uninstall a release
helm uninstall my-nginx
Custom Charts
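Before a custom chart ever touches the cluster, it can be linted and rendered locally. A sketch, assuming Helm 3 is on the PATH and the chart from this section lives in ./myapp:

```shell
# Static checks against chart best practices
helm lint ./myapp

# Render the templates locally to inspect the generated manifests
helm template myapp ./myapp --set image.tag=1.0.0

# Simulate an install against the cluster without creating any resources
helm install myapp ./myapp --dry-run --debug
```

`helm template` renders entirely client-side, so it is also useful in CI pipelines that have no cluster access.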
# Scaffold a new chart
helm create myapp
# myapp/values.yaml — configuration values
replicaCount: 2
image:
  repository: myregistry/myapp
  pullPolicy: IfNotPresent
  tag: "1.0.0"
service:
  type: ClusterIP
  port: 80
ingress:
  enabled: true
  className: nginx
  hosts:
    - host: api.example.com
      paths:
        - path: /
          pathType: Prefix
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
# myapp/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "myapp.fullname" . }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ include "myapp.name" . }}
  template:
    metadata:
      labels:
        app: {{ include "myapp.name" . }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: 8080
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          env:
            - name: ASPNETCORE_ENVIRONMENT
              valueFrom:
                configMapKeyRef:
                  name: {{ include "myapp.fullname" . }}-config
                  key: environment
            - name: ConnectionStrings__Default
              valueFrom:
                secretKeyRef:
                  name: {{ include "myapp.fullname" . }}-secret
                  key: connection-string
# Deploy
helm install myapp ./myapp -f myapp/values-prod.yaml
# Upgrade
helm upgrade myapp ./myapp --set image.tag=2.0.0
# Roll back to revision 1
helm rollback myapp 1
Ingress Controllers
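Before applying Ingress rules, it is worth confirming the controller itself came up. A quick check, assuming the standard ingress-nginx namespace created by the install manifest:

```shell
# Controller pods should be Running
kubectl get pods -n ingress-nginx

# The controller Service carries the external / LoadBalancer address
kubectl get svc -n ingress-nginx ingress-nginx-controller
```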
Nginx Ingress
# Install the ingress-nginx controller
# kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/cloud/deploy.yaml
# Ingress rule
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-service
                port:
                  number: 80
Persistent Storage
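Before writing PV manifests by hand, check what storage the cluster already offers; on managed clusters a default StorageClass usually provisions volumes dynamically, so a hand-written PV like the one below is often only needed for local or pre-provisioned storage:

```shell
# List available StorageClasses (the default is marked "(default)")
kubectl get storageclass

# Inspect existing volumes and claims, and whether the claims are Bound
kubectl get pv
kubectl get pvc -n production
```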
PVs and PVCs
# PersistentVolume — the storage resource
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sqlserver-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard
  hostPath:
    path: /data/sqlserver
---
# PersistentVolumeClaim — the storage request
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sqlserver-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard
---
# StatefulSet — consumes the PVC
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sqlserver
spec:
  serviceName: sqlserver
  replicas: 1
  selector:
    matchLabels:
      app: sqlserver
  template:
    metadata:
      labels:
        app: sqlserver
    spec:
      containers:
        - name: sqlserver
          image: mcr.microsoft.com/mssql/server:2022-latest
          env:
            - name: ACCEPT_EULA
              value: "Y"
            - name: SA_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: sql-secret
                  key: password
          ports:
            - containerPort: 1433
          volumeMounts:
            - name: data
              mountPath: /var/opt/mssql
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: sqlserver-pvc
HPA Autoscaling
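The HPA reads CPU and memory usage from the metrics API, so metrics-server must be installed first; without it, both kubectl top and the HPA report no data. Install and verify (the URL is the upstream metrics-server release manifest):

```shell
# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Once it is up, the metrics API should answer
kubectl top nodes
```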
Horizontal Pod Autoscaling
# HorizontalPodAutoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
# Check autoscaler status
kubectl get hpa
kubectl describe hpa myapp-hpa
# Trigger scaling manually with a simple load generator
kubectl run load-generator --image=busybox:1.36 --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://myapp; done"
ConfigMap and Secret
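A detail that regularly trips people up: values under a Secret's data: field must be base64-encoded, while stringData: accepts plain text. Encoding and round-tripping a value by hand (the key here is the example value from this section):

```shell
# base64-encode a value for a Secret's data: field
echo -n 'my-secret-api-key' | base64

# Decode to double-check what is actually stored
echo 'bXktc2VjcmV0LWFwaS1rZXk=' | base64 -d
```

Alternatively, `kubectl create secret generic myapp-secret --from-literal=api-key=... --dry-run=client -o yaml` generates the whole manifest with the encoding done for you, without applying anything.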
Configuration Management
# ConfigMap — non-sensitive configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
data:
  environment: "Production"
  logging__level: "Information"
  cors__origins: "https://example.com,https://app.example.com"
  appsettings.json: |
    {
      "Logging": {
        "LogLevel": {
          "Default": "Information"
        }
      },
      "AllowedHosts": "*"
    }
---
# Secret — sensitive data
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secret
type: Opaque
data:
  connection-string: <base64 encoded>
  jwt-secret: <base64 encoded>
stringData:
  # Or write plain text here; it is base64-encoded automatically
  api-key: "my-secret-api-key"
# Using them in a Pod
spec:
  containers:
    - name: myapp
      envFrom:
        - configMapRef:
            name: myapp-config
        - secretRef:
            name: myapp-secret
      volumeMounts:
        - name: config-volume
          mountPath: /app/appsettings.json
          subPath: appsettings.json
  volumes:
    - name: config-volume
      configMap:
        name: myapp-config
RBAC Authorization
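Once Roles and bindings are applied, kubectl auth can-i verifies the effective permissions without logging in as the user (the user and namespace below come from the manifests in this section):

```shell
# Expected: yes — read access is granted by the Role
kubectl auth can-i list pods -n production --as dev-user

# Expected: no — delete is not among the Role's verbs
kubectl auth can-i delete deployments -n production --as dev-user
```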
Role-Based Access Control
# Role — namespace-scoped permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: developer-role
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "deployments", "services"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["patch"]  # patch only, e.g. to restart a deployment
---
# RoleBinding — binds a user to the Role
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: production
subjects:
  - kind: User
    name: dev-user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer-role
  apiGroup: rbac.authorization.k8s.io
---
# ClusterRole — cluster-scoped, read-only
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-reader
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["*"]
    verbs: ["get"]
ServiceAccounts and Pod Security
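ServiceAccount permissions can be checked the same way as user permissions, and a short-lived token can be minted for external clients (kubectl create token requires Kubernetes 1.24+; names follow the manifests below):

```shell
# Check what the ServiceAccount is allowed to do
kubectl auth can-i list pods -n production \
  --as system:serviceaccount:production:myapp-sa

# Issue a short-lived token for it
kubectl create token myapp-sa -n production --duration=1h
```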
# Dedicated ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp-sa
  namespace: production
automountServiceAccountToken: false  # do not auto-mount the token
---
# Pod using the ServiceAccount
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  serviceAccountName: myapp-sa
  automountServiceAccountToken: false
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
  containers:
    - name: myapp
      image: myapp:1.0
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
Network Policies
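Note that NetworkPolicy is only enforced when the cluster's CNI plugin supports it (e.g. Calico or Cilium; flannel ignores policies silently). Enforcement can be verified with a throwaway client Pod — a sketch, assuming the myapp-service name used earlier:

```shell
# With deny-all in place, a request from an unlabeled Pod should time out
kubectl run tmp --rm -it --restart=Never --image=busybox:1.36 -n production -- \
  wget -qO- -T 2 http://myapp-service
```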
Traffic Control with NetworkPolicy
# Deny all ingress traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: production
spec:
  podSelector: {}  # matches every Pod in the namespace
  policyTypes:
    - Ingress
---
# Allow specific traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
---
# Allow traffic from a specific namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: monitoring
      ports:
        - protocol: TCP
          port: 9090
Advanced Scheduling and Affinity
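Node affinity matches on node labels, so the labels must exist before the rules can select anything. Labeling and inspecting nodes (the label keys follow the example manifest below; adjust to your cluster):

```shell
# Add the label the required affinity rule matches on
kubectl label nodes node-1 node-role.kubernetes.io/gpu=

# Managed clusters usually set topology.kubernetes.io/zone automatically
kubectl get nodes -L topology.kubernetes.io/zone
kubectl get nodes --show-labels
```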
Node Affinity and Anti-Affinity
# Node affinity — schedule the Pod onto specific nodes
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-role.kubernetes.io/gpu
                operator: Exists
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 80
          preference:
            matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values: ["us-east-1a"]
    # Pod anti-affinity — spread replicas across nodes
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: myapp
            topologyKey: kubernetes.io/hostname
  containers:
    - name: app
      image: myapp:1.0
Taints and Tolerations
# Taint nodes (dedicated nodes)
kubectl taint nodes node-1 dedicated=gpu:NoSchedule
kubectl taint nodes node-2 dedicated=monitoring:NoExecute
# Show a node's taints
kubectl describe node node-1 | grep Taints
# Remove a taint
kubectl taint nodes node-1 dedicated=gpu:NoSchedule-
# Pod tolerations
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
    - key: "dedicated"
      operator: "Equal"
      value: "monitoring"
      effect: "NoExecute"
      tolerationSeconds: 3600  # tolerate the taint for at most one hour
Troubleshooting
Pod Troubleshooting Workflow
# 1. Check Pod status
kubectl get pods -n production
kubectl get pods -n production -o wide
# 2. Check Pod events
kubectl describe pod myapp-xxx -n production | grep -A20 Events
# 3. Check container logs
kubectl logs myapp-xxx -n production
kubectl logs myapp-xxx -n production --previous  # previous container (CrashLoop)
kubectl logs myapp-xxx -n production -c sidecar  # specific container
# 4. Exec into the container
kubectl exec -it myapp-xxx -n production -- /bin/sh
kubectl exec -it myapp-xxx -n production -- /bin/bash
# 5. Ephemeral debug container (K8s 1.25+)
kubectl debug myapp-xxx -n production -it --image=busybox
# 6. Check resource usage
kubectl top pod myapp-xxx -n production
kubectl top nodes
Common Failure Reference
| Status            | Cause                                       | Diagnostic command      |
|-------------------|---------------------------------------------|-------------------------|
| ImagePullBackOff  | Image pull failed                           | kubectl describe pod    |
| CrashLoopBackOff  | Container crashes after starting            | kubectl logs --previous |
| Pending           | Insufficient resources / scheduling failure | kubectl describe pod    |
| OOMKilled         | Memory limit exceeded                       | kubectl describe pod    |
| ContainerCreating | Mount or configuration problem              | kubectl describe pod    |
| Evicted           | Node resource pressure                      | kubectl get events      |
| Unknown           | Node unreachable                            | kubectl get nodes       |
Common Operations Commands
# Resource overview
kubectl get all -n myapp
kubectl get pods -o wide
kubectl get events --sort-by=.metadata.creationTimestamp
# Pod debugging
kubectl logs -f myapp-xxx -c myapp        # follow logs
kubectl exec -it myapp-xxx -- bash        # shell into the container
kubectl describe pod myapp-xxx            # Pod details
kubectl port-forward myapp-xxx 8080:80    # port forwarding
# Node management
kubectl get nodes
kubectl describe node node-1
kubectl cordon node-1                     # mark unschedulable
kubectl drain node-1 --ignore-daemonsets  # evict Pods
kubectl uncordon node-1                   # restore scheduling
# Resource quotas
kubectl top pods
kubectl top nodes
kubectl get resourcequota -A
Summary
Advanced Kubernetes boils down to four core modules: Helm, Ingress, PV/PVC, and HPA. Helm manages deployment templates, Ingress provides a unified traffic entry point, PV/PVC manage persistent data, and HPA handles automatic scaling. In production, always configure resource limits, health checks, and autoscaling.
Key Takeaways
- For deployment topics, the goal is not "it installed" but "it runs stably, can be debugged, and can be rolled back."
- For any one service, you usually need to track at least its version, directories, ports, permissions, data, logs, and backups.
- Operational problems often span the system, network, service, and application layers.
- Kubernetes topics require looking at resource objects, scheduling behavior, network exposure, and configuration distribution together.
Project Adoption Perspective
- Turn installation steps into a repeatable checklist, scripted or captured as configuration where necessary.
- Keep configuration directories, data directories, log directories, and mount points clearly separated.
- Before go-live, check firewalls, SELinux, time zones, disks, system services, and health checks.
- Before go-live, check images, namespaces, probes, resource limits, Services/Ingresses, and configuration sources.
Common Pitfalls
- Using latest or otherwise unpinned versions, making environments irreproducible.
- Verifying only that things start, without verifying persistence, auto-start on boot, or failure recovery.
- Changing configuration first instead of reading logs and tracing the dependency chain.
- Knowing only how to apply YAML without understanding the dependencies between objects.
Next Steps
- Fill in systemd, performance monitoring, security hardening, and backup/restore skills.
- Move from single-machine operations to Docker, Kubernetes, or IaC approaches.
- Build a standardized operations handbook covering inspections, scaling, rollback, and disaster-recovery drills.
- Continue with scheduling, network policy, storage, GitOps, and platform-engineering capabilities.
When to Apply
- When bringing these techniques into a real project, it is best to first validate the critical path in an isolated module or a minimal example.
- Suitable for initializing single machines, quickly standing up middleware, validating test environments, and preparing for production deployment.
- When service stability depends on ports, permissions, directories, networking, and system parameters, these topics directly determine success or failure.
Implementation Tips
- Pin version numbers and image tags; avoid the unpredictable changes that "latest" brings.
- Manage configuration, data, and log directories separately, and document the recovery steps.
- Before go-live, confirm ports, firewall rules, SELinux, time zone, and disk space.
Troubleshooting Checklist
- Check systemctl, container logs, and application logs first to pinpoint which layer failed.
- Check port conflicts, directory permissions, mount paths, and network connectivity.
- For problems in a new environment, compare against a known-good environment first.
Review Questions
- If you brought these techniques into your current project, which inputs, outputs, and failure paths would you validate first?
- At what scale and under which edge conditions are they most likely to break? Which metrics or logs would confirm it?
- Compared with the default implementation or alternatives, what are the biggest benefits and costs of adopting them?
