Istio Service Mesh

SunnyFan


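Introduction

Istio is one of the most widely used Service Mesh implementations on Kubernetes. Using sidecar proxies (Envoy), it gives microservices three core capabilities (traffic management, secure communication, and observability) without any changes to application code. It separates network-level concerns from business code and is key infrastructure for governing microservices at scale.

Istio's core design idea is infrastructure-layer abstraction: network governance logic that used to be scattered across individual microservices (load balancing, circuit breaking, retries, authentication, monitoring, and so on) is consolidated into the sidecar proxy and managed centrally by the control plane. Application developers can focus on business logic, while the operations team manages traffic policy, security policy, and observability configuration uniformly through Istio's CRDs (Custom Resource Definitions).

As a microservice architecture grows, traditional service governance runs into several problems: call relationships between services become complex and hard to trace, security policies are hard to enforce consistently, and traffic management (canary releases, fault injection) has to be built into business code. Istio addresses these problems systematically with a layered architecture made of a data plane (Envoy sidecars) and a control plane (istiod).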

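Features

1. Traffic management: fine-grained traffic control such as canary releases, A/B testing, traffic mirroring, timeouts and retries, and fault injection
2. mTLS secure communication: automatic mutual TLS encryption and identity authentication for service-to-service traffic
3. Observability: automatic collection of distributed traces (Zipkin/Jaeger), metrics (Prometheus), and access logs; end-to-end traces still require applications to propagate trace headers
4. Sidecar injection: a mutating webhook automatically injects the Envoy proxy into pods, transparently to the application
5. Policy enforcement: service-level security policies such as rate limiting, allow/deny lists, and JWT validation
6. Multi-cluster support: manage one service mesh across multiple Kubernetes clusters
7. Gateway management: Istio Gateway replaces a traditional Ingress Controller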

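Istio Architecture in Detail

Control Plane

The Istio control plane is provided by a single component, istiod, which merges the functions of the following former components:

Pilot: service discovery and traffic management; translates routing rules into Envoy configuration
Citadel: certificate management; automatically issues and rotates mTLS certificates
Galley: configuration validation and distribution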

# istiod is Istio's core control plane component
# Check that istiod is running
kubectl get pods -n istio-system
kubectl logs -n istio-system -l app=istiod

# View the mesh configuration (the istio ConfigMap that istiod reads)
kubectl get configmap istio -n istio-system -o yaml

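Data Plane

The data plane consists of the Envoy sidecar proxies deployed in every pod:

Inbound traffic: requests arriving at the pod pass through the sidecar, which forwards them to the application container
Outbound traffic: requests sent by the application container pass through the sidecar, which forwards them to the target service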

# Dump a pod's full sidecar proxy configuration
kubectl exec -n production web-api-xxx -c istio-proxy -- pilot-agent request GET config_dump

# List the sidecar's listeners (the admin API returns plain text unless ?format=json is given)
kubectl exec -n production web-api-xxx -c istio-proxy -- curl -s "localhost:15000/listeners?format=json" | python3 -m json.tool

# View the sidecar's clusters (upstream services)
kubectl exec -n production web-api-xxx -c istio-proxy -- curl -s localhost:15000/clusters | head -50

# View the sidecar's route tables
istioctl proxy-config routes web-api-xxx -n production
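The inbound/outbound interception described above is done with iptables rules that the istio-init init container (or the Istio CNI plugin) installs in the pod's network namespace. The sketch below is a simplified Python model, not Istio's actual rule set. It assumes Istio's default capture ports (15001 for outbound, 15006 for inbound), the proxy's default UID 1337 (whose own traffic is exempt so Envoy does not loop back into itself), and a default set of the proxy's own inbound ports that are not intercepted.

```python
from dataclasses import dataclass

# Assumed Istio defaults (simplified): outbound capture port, inbound capture port,
# the UID Envoy runs as, and the proxy's own inbound ports that are not intercepted.
OUTBOUND_CAPTURE_PORT = 15001
INBOUND_CAPTURE_PORT = 15006
PROXY_UID = 1337
EXCLUDED_INBOUND_PORTS = {15020, 15021, 15090}


@dataclass
class Packet:
    direction: str       # "inbound" (to the pod) or "outbound" (from the pod)
    dst_port: int
    sender_uid: int = 0  # UID of the local process that sent an outbound packet


def redirect_target(pkt: Packet):
    """Return the Envoy port the packet is redirected to, or None if it bypasses Envoy."""
    if pkt.direction == "outbound":
        # Traffic Envoy itself sends upstream must not be captured again (it would loop).
        if pkt.sender_uid == PROXY_UID:
            return None
        return OUTBOUND_CAPTURE_PORT
    if pkt.direction == "inbound":
        if pkt.dst_port in EXCLUDED_INBOUND_PORTS:
            return None
        return INBOUND_CAPTURE_PORT
    raise ValueError(f"unknown direction: {pkt.direction}")


if __name__ == "__main__":
    print(redirect_target(Packet("inbound", 8080)))                       # app port -> 15006
    print(redirect_target(Packet("outbound", 80, sender_uid=1000)))       # app calls out -> 15001
    print(redirect_target(Packet("outbound", 80, sender_uid=PROXY_UID)))  # Envoy's own traffic -> None
```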

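How Sidecar Injection Works

# Istio injects the sidecar through a Mutating Admission Webhook
# When a Pod creation request reaches the API server, the webhook intercepts it and patches the Pod spec

# View the injection webhook configuration
kubectl get mutatingwebhookconfigurations istio-sidecar-injector -o yaml

# Inject the sidecar manually (without the webhook)
istioctl kube-inject -f deployment.yaml | kubectl apply -f -

# Per-pod injection control
# Newer Istio releases use the label form; the annotation form is deprecated
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  labels:
    sidecar.istio.io/inject: "true"    # inject even if the namespace is not labeled
    # sidecar.istio.io/inject: "false" # never inject
spec:
  containers:
    - name: app
      image: my-app:latest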
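To make the webhook mechanism concrete, here is a minimal Python sketch of what a mutating admission webhook returns: an AdmissionReview response whose patch is a base64-encoded JSONPatch that appends a proxy container. It is a hypothetical illustration, not istiod's injector. The real injector renders a full template (init container, volumes, environment, probe rewrites), and `should_inject` only models Istio's documented label rules (namespace `istio-injection` label and pod `sidecar.istio.io/inject` label, with an explicit opt-out always winning). The container spec is deliberately trimmed.

```python
import base64
import json


def should_inject(ns_labels: dict, pod_labels: dict) -> bool:
    """Simplified model of Istio's label rules (ignores revisions and enableNamespacesByDefault)."""
    ns = ns_labels.get("istio-injection")
    pod = pod_labels.get("sidecar.istio.io/inject")
    if ns == "disabled" or pod == "false":   # an explicit opt-out always wins
        return False
    return ns == "enabled" or pod == "true"


def mutate(review: dict, ns_labels: dict) -> dict:
    """Build an AdmissionReview response that appends a proxy container (hypothetical spec)."""
    request = review["request"]
    pod = request["object"]
    response = {"uid": request["uid"], "allowed": True}
    if should_inject(ns_labels, pod.get("metadata", {}).get("labels", {})):
        patch = [{
            "op": "add",
            "path": "/spec/containers/-",   # append to the containers array
            "value": {"name": "istio-proxy", "image": "istio/proxyv2"},  # hypothetical, heavily trimmed
        }]
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}


if __name__ == "__main__":
    review = {"request": {"uid": "abc-123",
                          "object": {"metadata": {"labels": {}},
                                     "spec": {"containers": [{"name": "app"}]}}}}
    print(mutate(review, {"istio-injection": "enabled"})["response"]["patchType"])  # JSONPatch
```

The API server applies the decoded patch before persisting the pod, which is why the application manifest never has to mention the proxy.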

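Implementation

Istio Installation and Automatic Sidecar Injection

# Download istioctl
curl -L https://istio.io/downloadIstio | sh -
cd istio-*
export PATH=$PWD/bin:$PATH

# List the available installation profiles
istioctl profile list

# Install with the demo profile (for learning and testing)
istioctl install --set profile=demo -y

# Production-oriented install. Istio ships no "production" profile; "default" is the
# profile recommended for production. autoInject="disabled" means a labeled namespace
# alone no longer injects: each pod must opt in with sidecar.istio.io/inject: "true".
istioctl install --set profile=default \
  --set values.global.proxy.autoInject="disabled" \
  --set values.global.proxy.resources.requests.cpu="100m" \
  --set values.global.proxy.resources.requests.memory="128Mi" \
  --set values.global.proxy.resources.limits.cpu="500m" \
  --set values.global.proxy.resources.limits.memory="512Mi" \
  -y

# Minimal install (control plane only, no gateways)
istioctl install --set profile=minimal -y

# Verify the installation
istioctl verify-install
kubectl get pods -n istio-system

# Enable automatic sidecar injection for a namespace
kubectl label namespace production istio-injection=enabled

# Check the label
kubectl get namespace production --show-labels

# Pods created from now on get the Envoy sidecar injected (existing pods must be restarted)
kubectl apply -f deployment.yaml -n production

# Confirm the sidecar was injected
kubectl get pods -n production -o jsonpath='{.items[*].spec.containers[*].name}'
# Expect the application container name plus istio-proxy

# Inspect the injected pod
kubectl describe pod -n production -l app=web-api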

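Custom Installation Configuration

# Custom installation with the IstioOperator API
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: istio-controlplane
spec:
  profile: default              # there is no "production" profile; default is the production baseline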
  values:
    global:
      proxy:
        autoInject: disabled        # labeled namespaces alone do not inject; pods must opt in
        logLevel: warning           # sidecar (Envoy) log level
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
      meshID: production-mesh     # mesh identifier
      multiCluster:
        clusterName: cluster-1    # cluster name
      network: network-v1         # network identifier
  components:
    pilot:
      enabled: true
      k8s:
        resources:
          requests:
            cpu: 500m
            memory: 2Gi
          limits:
            cpu: 1000m
            memory: 4Gi
        hpaSpec:
          minReplicas: 2
          maxReplicas: 5

# Install with the custom configuration
istioctl install -f istio-operator.yaml -y

VirtualService for Canary Releases

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web-api
  namespace: production
spec:
  hosts:
    - web-api
  http:
    # Header-based canary routing: requests with x-canary: true always go to v2
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: web-api
            subset: v2
          weight: 100
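    # Default split (90% v1, 10% v2) with timeout and retries.
    # Rules are evaluated top-down and a rule without a match block catches every request,
    # so timeout/retries must sit on this rule; a separate rule after it would never be reached.
    - route:
        - destination:
            host: web-api
            subset: v1
          weight: 90
        - destination:
            host: web-api
            subset: v2
          weight: 10
      timeout: 10s              # overall budget for the request, including retries
      retries:
        attempts: 3             # number of retries after the first try
        perTryTimeout: 3s
        retryOn: 5xx,reset,connect-failure,refused-stream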
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: web-api
  namespace: production
spec:
  host: web-api
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
        connectTimeout: 5s
      http:
        h2UpgradePolicy: DEFAULT
        http1MaxPendingRequests: 100
        http2MaxRequests: 100
        maxRequestsPerConnection: 2
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
      maxEjectionPercent: 50
    # Load-balancing policy
    # ROUND_ROBIN, LEAST_REQUEST, RANDOM, PASSTHROUGH (LEAST_CONN is a deprecated alias of LEAST_REQUEST)
    loadBalancer:
      simple: ROUND_ROBIN
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
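The routing semantics above can be modeled in a few lines of Python: rules are tried top-down, the first match wins, and weights split traffic only within the chosen rule. The sketch also computes the worst-case time a caller waits under the timeout and retry settings, assuming (as Istio's HTTPRetry reference describes) that `attempts` counts retries in addition to the first try, and ignoring retry back-off. It is an illustrative model with hypothetical names, not how Envoy is implemented.

```python
import random
from collections import Counter

# Hypothetical in-memory model of the web-api VirtualService (not Istio code):
# each rule is (match predicate or None, [(subset, weight), ...]).
RULES = [
    (lambda headers: headers.get("x-canary") == "true", [("v2", 100)]),
    (None, [("v1", 90), ("v2", 10)]),   # no match block: catches every remaining request
]


def pick_subset(headers: dict, rng: random.Random) -> str:
    """First matching rule wins; within it, choose a destination by weight."""
    for match, destinations in RULES:
        if match is None or match(headers):
            subsets = [s for s, _ in destinations]
            weights = [w for _, w in destinations]
            return rng.choices(subsets, weights=weights, k=1)[0]
    raise LookupError("no rule matched")   # Envoy would answer 404 here


def worst_case_seconds(route_timeout: float, attempts: int, per_try_timeout: float) -> float:
    """Upper bound on waiting time: 1 initial try + `attempts` retries, capped by the route timeout.

    Retry back-off (Envoy's default base interval is 25 ms) is ignored.
    """
    return min(route_timeout, (1 + attempts) * per_try_timeout)


if __name__ == "__main__":
    rng = random.Random(42)
    print(Counter(pick_subset({}, rng) for _ in range(10_000)))   # roughly 9000 v1 / 1000 v2
    print(pick_subset({"x-canary": "true"}, rng))                  # always v2
    print(worst_case_seconds(10, 3, 3))   # 4 tries x 3s = 12s, capped by the 10s route timeout
```

With these settings the route timeout, not the retry count, is the binding limit, so the fourth try can be cut short.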

Path-Based Routing

# Route to different versions by URL path
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-gateway
  namespace: production
spec:
  hosts:
    - api.example.com
  gateways:
    - istio-system/main-gateway
  http:
    # /api/v2/* goes to v2, with the /api/v2/ prefix rewritten to /
    - match:
        - uri:
            prefix: /api/v2/
      rewrite:
        uri: /
      route:
        - destination:
            host: api-server
            subset: v2
    # /api/v1/* goes to v1, with the /api/v1/ prefix rewritten to /
    - match:
        - uri:
            prefix: /api/v1/
      rewrite:
        uri: /
      route:
        - destination:
            host: api-server
            subset: v1
    # All other paths default to v1
    - route:
        - destination:
            host: api-server
            subset: v1
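A small Python model shows what the upstream service actually receives: a prefix rewrite replaces only the matched prefix, and the match is a plain string prefix, so `/api/v2` without a trailing slash falls through to the catch-all rule. The rule table and function name are hypothetical stand-ins for the VirtualService above.

```python
# Hypothetical model of the api-gateway VirtualService: (prefix, rewrite, subset);
# a prefix of None stands for the final catch-all rule, which has no rewrite.
RULES = [
    ("/api/v2/", "/", "v2"),
    ("/api/v1/", "/", "v1"),
    (None, None, "v1"),
]


def route(path: str):
    """Return (subset, path forwarded upstream) using first-match prefix semantics."""
    for prefix, rewrite, subset in RULES:
        if prefix is None:
            return subset, path
        if path.startswith(prefix):   # plain string prefix, not path-segment aware
            # The prefix rewrite replaces only the matched prefix.
            return subset, rewrite + path[len(prefix):]
    raise LookupError("no rule matched")


if __name__ == "__main__":
    print(route("/api/v2/users/7"))   # ('v2', '/users/7')
    print(route("/api/v1/orders"))    # ('v1', '/orders')
    print(route("/api/v2"))           # ('v1', '/api/v2'): no trailing slash, only the catch-all matches
```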

A/B Testing

# Cookie-based A/B testing
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web-ab-test
  namespace: production
spec:
  hosts:
    - web-frontend
  http:
    - match:
        - headers:
            cookie:
              regex: "^(.*?; ?)?(experiment=blue)(;.*)?$"   # "; ?" also matches when other cookies precede it
      route:
        - destination:
            host: web-frontend
            subset: blue
    - match:
        - headers:
            cookie:
              regex: "^(.*?; ?)?(experiment=green)(;.*)?$"  # "; ?" also matches when other cookies precede it
      route:
        - destination:
            host: web-frontend
            subset: green
    - route:
        - destination:
            host: web-frontend
            subset: blue
          weight: 70
        - destination:
            host: web-frontend
            subset: green
          weight: 30
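Cookie regexes like these are easy to get subtly wrong, because browsers join cookies with "; " (semicolon plus space). A quick offline check with Python's `re.fullmatch` helps. The assumption here is that Envoy applies the RE2 pattern to the whole Cookie header value; the patterns use only syntax that RE2 and Python share. `STRICT_PREFIX` and `TOLERANT_PREFIX` are hypothetical labels for the two variants being compared.

```python
import re

# Whole-value matching, as assumed for Envoy's RE2 header matcher.
STRICT_PREFIX = r"^(.*?;)?(experiment=blue)(;.*)?$"     # prefix group must end exactly at ';'
TOLERANT_PREFIX = r"^(.*?; ?)?(experiment=blue)(;.*)?$"  # also accepts the space browsers send


def matches(pattern: str, cookie_header: str) -> bool:
    return re.fullmatch(pattern, cookie_header) is not None


if __name__ == "__main__":
    for header in ["experiment=blue",
                   "session=abc; experiment=blue",
                   "session=abc; experiment=blueberry"]:
        print(f"{header!r:40} strict={matches(STRICT_PREFIX, header)} "
              f"tolerant={matches(TOLERANT_PREFIX, header)}")
```

The strict variant only matches when the experiment cookie is the first one in the header, which silently sends most real users to the weighted default route.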

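Rate Limiting and Circuit Breaking

# Local rate limiting: enforced independently by each Envoy instance, no external Redis needed
# EnvoyFilter is only served under networking.istio.io/v1alpha3
# No workloadSelector: applies to every workload in the namespace
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: rate-limit
  namespace: production
spec:
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.local_ratelimit
          typed_config:
            "@type": type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
            value:
              stat_prefix: http_local_rate_limiter
              # 100 tokens, refilled by 100 every 60s: about 100 requests per minute per pod
              token_bucket:
                max_tokens: 100
                tokens_per_fill: 100
                fill_interval: 60s
              filter_enabled:
                runtime_key: local_rate_limit_enabled
                default_value:
                  numerator: 100
                  denominator: HUNDRED
              filter_enforced:
                runtime_key: local_rate_limit_enforced
                default_value:
                  numerator: 100
                  denominator: HUNDRED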
---
# Circuit breaking with a DestinationRule
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: api-circuit-breaker
  namespace: production
spec:
  host: api-server
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1000
        connectTimeout: 5s
      http:
        http1MaxPendingRequests: 1024
        http2MaxRequests: 1024
        maxRequestsPerConnection: 100
        maxRetries: 3
    outlierDetection:
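      consecutive5xxErrors: 5        # eject a host after 5 consecutive 5xx errors
      interval: 30s                  # ejection analysis interval
      baseEjectionTime: 60s          # base ejection time, multiplied by the number of ejections
      maxEjectionPercent: 50         # eject at most 50% of the hosts
      consecutiveGatewayFailure: 3   # eject after 3 consecutive gateway errors (502/503/504)
      minHealthPercent: 50           # outlier detection is suspended when fewer than 50% of hosts are healthy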
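The two mechanisms above behave quite differently, which a small Python model makes visible. The local rate limit is a token bucket kept separately by every Envoy, so with N pods the service-wide allowance is roughly N × 100 requests per minute, and rejected requests get HTTP 429. Outlier detection ejects a host for baseEjectionTime multiplied by the number of times it has been ejected, as Istio's OutlierDetection reference describes. Class and function names are hypothetical, and this is a simplification rather than Envoy's implementation.

```python
class LocalTokenBucket:
    """Simplified model of Envoy's local_ratelimit token bucket (one instance per Envoy)."""

    def __init__(self, max_tokens: int, tokens_per_fill: int, fill_interval_s: float):
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval_s = fill_interval_s
        self.tokens = max_tokens       # the bucket starts full
        self.last_fill_s = 0.0

    def allow(self, now_s: float) -> bool:
        """Consume one token if available; False means the request would get HTTP 429."""
        fills = int((now_s - self.last_fill_s) // self.fill_interval_s)
        if fills > 0:
            self.tokens = min(self.max_tokens, self.tokens + fills * self.tokens_per_fill)
            self.last_fill_s += fills * self.fill_interval_s
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False


def ejection_seconds(base_ejection_s: float, times_ejected: int) -> float:
    """Outlier ejection duration: baseEjectionTime times the number of ejections so far.

    Newer Envoy versions additionally cap this (max_ejection_time); the cap is not modeled.
    """
    return base_ejection_s * times_ejected


if __name__ == "__main__":
    bucket = LocalTokenBucket(max_tokens=100, tokens_per_fill=100, fill_interval_s=60)
    print(sum(bucket.allow(now_s=1.0) for _ in range(150)))   # 100 pass, 50 would get 429
    print(bucket.allow(now_s=61.0))                           # True: one fill interval elapsed
    print([ejection_seconds(60, n) for n in (1, 2, 3)])       # [60, 120, 180]
```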

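Global Rate Limiting Backed by Redis

# Global rate limiting (requires Redis)
# 1. Deploy Redis as the counter store. The rate limit service itself (for example
#    envoyproxy/ratelimit) and its rule configuration must also be deployed; they are not shown here.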
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: istio-system
spec:
  ports:
    - port: 6379
      targetPort: 6379
  selector:
    app: redis
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: istio-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          ports:
            - containerPort: 6379

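# 2. Insert Envoy's ratelimit filter at the ingress gateway
# Assumes the rate limit service is reachable as ratelimit.istio-system.svc.cluster.local:8081;
# adjust to your deployment. Route-level rate_limits actions (a second EnvoyFilter) are also required.
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: global-rate-limit
  namespace: istio-system          # same namespace as the ingress gateway pods
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: GATEWAY
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
              subFilter:
                name: envoy.filters.http.router
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.ratelimit
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
            domain: production-ratelimit   # example name; must match the rate limit service config
            failure_mode_deny: false       # fail open if the rate limit service is unreachable
            rate_limit_service:
              grpc_service:
                envoy_grpc:
                  cluster_name: outbound|8081||ratelimit.istio-system.svc.cluster.local
              transport_api_version: V3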
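The point of the Redis-backed setup is that counters are shared: every gateway replica asks the same rate limit service, which keeps one counter per descriptor and time window. The sketch below models that with a plain dict standing in for Redis and a fixed one-minute window. The fixed-window design follows the general approach of envoyproxy/ratelimit, but the key format, descriptor value, and class name are hypothetical.

```python
import time
from typing import Dict, Optional


class FixedWindowLimiter:
    """Shared fixed-window counter; `store` stands in for Redis (INCR + EXPIRE in practice)."""

    def __init__(self, store: Dict[str, int], limit: int, window_s: int = 60):
        self.store = store
        self.limit = limit
        self.window_s = window_s

    def allow(self, descriptor: str, now_s: Optional[float] = None) -> bool:
        now_s = time.time() if now_s is None else now_s
        window_start = int(now_s // self.window_s) * self.window_s
        key = f"{descriptor}_{window_start}"            # hypothetical key format
        self.store[key] = self.store.get(key, 0) + 1    # Redis would do this atomically with INCR
        return self.store[key] <= self.limit


if __name__ == "__main__":
    redis_stand_in: Dict[str, int] = {}
    # Two gateway replicas share one store, so the limit applies to their combined traffic.
    replica_a = FixedWindowLimiter(redis_stand_in, limit=100)
    replica_b = FixedWindowLimiter(redis_stand_in, limit=100)
    allowed = sum(r.allow("api.example.com", now_s=5.0)
                  for r in (replica_a, replica_b) for _ in range(60))
    print(allowed)   # 100 of the 120 requests pass
```

With the local token bucket instead, each replica would admit its own 100 requests, so the effective limit would grow with the replica count.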

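Fault Injection and Traffic Mirroring

# Fault injection: test how gracefully callers degrade
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-fault-injection
  namespace: production
spec:
  hosts:
    - api-server
  http:
    # Fault injection can be limited to a specific URL path.
    # Rules are matched top-down, so this narrower rule must precede the catch-all rule below
    - match:
        - uri:
            exact: /api/v1/slow-endpoint
      fault:
        delay:
          percentage:
            value: 100
          fixedDelay: 5s
      route:
        - destination:
            host: api-server
    # All other requests
    - fault:
        # Delay injection: 50% of requests are delayed by 3 seconds
        delay:
          percentage:
            value: 50
          fixedDelay: 3s
        # Abort injection: 10% of requests get HTTP 500
        abort:
          percentage:
            value: 10
          httpStatus: 500
      route:
        - destination:
            host: api-server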
---
# Traffic mirroring: copy production traffic to a test environment for validation
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-mirror
  namespace: production
spec:
  hosts:
    - api-server
  http:
    - route:
        - destination:
            host: api-server
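      mirror:
        host: api-server-test.staging.svc.cluster.local   # FQDN: the test service runs in the staging namespace
      mirrorPercentage:
        value: 10        # mirror 10% of the traffic
      # Note: responses to mirrored requests are discarded, so the original request is unaffected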

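# Verify that mirroring works
# Follow the logs of the test-environment pods
kubectl logs -n staging -l app=api-server-test -f

# Check the Istio configuration in the namespace for problems (static analysis, not live traffic)
istioctl analyze -n production

# Inspect the routes a calling workload's sidecar received (web-api as the client)
istioctl proxy-config routes deploy/web-api -n production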
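Istio documents delay and abort faults as independent of each other, even when both are set on one rule. The catch-all fault rule should therefore produce roughly 50% delayed requests, 10% HTTP 500 responses, and about 5% of requests that get both. A hypothetical Python simulation is a cheap way to reason about those rates before running a real test; it models only the sampling, not Envoy's fault filter.

```python
import random
from collections import Counter


def inject(rng: random.Random, delay_pct: float = 50, delay_s: float = 3.0, abort_pct: float = 10):
    """Return (added_delay_s, status) for one request; the two faults are sampled independently."""
    delay = delay_s if rng.random() * 100 < delay_pct else 0.0
    status = 500 if rng.random() * 100 < abort_pct else 200   # 200 stands for "forwarded upstream"
    return delay, status


if __name__ == "__main__":
    rng = random.Random(7)
    samples = [inject(rng) for _ in range(10_000)]
    print(Counter(status for _, status in samples))                          # about 10% are 500
    print(sum(1 for d, _ in samples if d > 0) / len(samples))                # about 0.5
    print(sum(1 for d, s in samples if d > 0 and s == 500) / len(samples))   # about 0.05: both faults
```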

mTLS Secure Communication

# Enable STRICT mTLS (all service-to-service traffic must be encrypted)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT
---
# Use PERMISSIVE mode for specific workloads (during migration)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: legacy-app
  namespace: production
spec:
  selector:
    matchLabels:
      app: legacy-app
  mtls:
    mode: PERMISSIVE   # accept both plaintext and mTLS
---
# AuthorizationPolicy: service-to-service access control
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: api-authorization
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-server
  rules:
    # Allow web-frontend to call api-server
    - from:
        - source:
            principals:
              - "cluster.local/ns/production/sa/web-frontend"
      to:
        - operation:
            methods: ["GET", "POST"]
            paths: ["/api/*"]
    # Any request to api-server that matches no rule of this ALLOW policy is denied
  action: ALLOW
---
# Stricter AuthorizationPolicy (allow-list mode)
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all-by-default
  namespace: production
spec:
  {}    # empty spec = allow nothing, i.e. deny all requests in the namespace
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-specific
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-server
  rules:
    - from:
        - source:
            namespaces: ["production"]
            principals: ["cluster.local/ns/production/sa/web-frontend"]

# Check the effective PeerAuthentication (mTLS mode) for a pod
istioctl x describe pod web-api-xxx -n production

# Inspect the workload certificate the sidecar uses for mTLS (identity, validity period)
istioctl proxy-config secret web-api-xxx -n production
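Istio's documented evaluation order for AuthorizationPolicy is CUSTOM first, then DENY, then ALLOW: if any ALLOW policy selects a workload, a request must match at least one of its rules, otherwise the request is allowed. The Python sketch below models that order, the path wildcard rules (a trailing `*` is a prefix match, a leading `*` a suffix match), and the `cluster.local/ns/<ns>/sa/<sa>` principal format used above. Names are hypothetical, and CUSTOM (external authorizer) policies are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    principals: List[str] = field(default_factory=list)   # empty list = any source
    methods: List[str] = field(default_factory=list)      # empty list = any method
    paths: List[str] = field(default_factory=list)        # empty list = any path


@dataclass
class Policy:
    action: str          # "ALLOW" or "DENY"
    rules: List[Rule]    # an empty list matches nothing (the allow-nothing policy)


def path_matches(pattern: str, path: str) -> bool:
    if pattern == "*":
        return True
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    if pattern.startswith("*"):
        return path.endswith(pattern[1:])
    return path == pattern


def rule_matches(rule: Rule, principal: str, method: str, path: str) -> bool:
    return ((not rule.principals or principal in rule.principals)
            and (not rule.methods or method in rule.methods)
            and (not rule.paths or any(path_matches(p, path) for p in rule.paths)))


def authorize(policies: List[Policy], principal: str, method: str, path: str) -> bool:
    def hit(p: Policy) -> bool:
        return any(rule_matches(r, principal, method, path) for r in p.rules)

    if any(hit(p) for p in policies if p.action == "DENY"):
        return False                      # DENY is evaluated before ALLOW
    allow = [p for p in policies if p.action == "ALLOW"]
    if not allow:
        return True                       # no ALLOW policy selects the workload
    return any(hit(p) for p in allow)     # otherwise some ALLOW rule must match


if __name__ == "__main__":
    frontend = "cluster.local/ns/production/sa/web-frontend"
    policies = [Policy("ALLOW", [Rule([frontend], ["GET", "POST"], ["/api/*"])])]
    print(authorize(policies, frontend, "GET", "/api/users"))                          # True
    print(authorize(policies, frontend, "DELETE", "/api/users"))                       # False
    print(authorize(policies, "cluster.local/ns/staging/sa/tool", "GET", "/api/users"))  # False
```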

JWT Authentication

# JWT authentication policy
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-server
  jwtRules:
    - issuer: "https://auth.example.com/realms/production"
      jwksUri: "https://auth.example.com/realms/production/protocol/openid-connect/certs"
      forwardOriginalToken: true
      # Extract the token from a header
      fromHeaders:
        - name: Authorization
          prefix: "Bearer "
---
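# Combine with an AuthorizationPolicy to require authentication.
# RequestAuthentication alone only rejects invalid tokens; requests without a token pass unless a policy like this requires one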
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-server
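  rules:
    # Every path except /health and /metrics requires a valid JWT
    - from:
        - source:
            requestPrincipals: ["*"]
      to:
        - operation:
            notPaths: ["/health", "/metrics"]
    # Without this rule /health and /metrics would be denied outright,
    # because an ALLOW policy denies every request that matches none of its rules
    - to:
        - operation:
            paths: ["/health", "/metrics"]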
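Once a token validates, Istio exposes the request principal as `<iss>/<sub>`, which is what `requestPrincipals` in an AuthorizationPolicy is matched against. The sketch below decodes an unsigned demo token with the Python standard library to show the claims involved (`iss`, `sub`, `exp`) and how that principal string is formed. It deliberately skips signature verification against the jwksUri, which the real proxy performs, so it must not be used to authenticate anything; the token and claim values are made up.

```python
import base64
import json
import time


def b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore the padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def request_principal(token: str, expected_issuer: str, now_s: float) -> str:
    """Check iss/exp of an UNVERIFIED JWT and return the `<iss>/<sub>` principal."""
    _header, payload, _signature = token.split(".")
    claims = json.loads(b64url_decode(payload))
    if claims.get("iss") != expected_issuer:
        raise ValueError("issuer does not match jwtRules.issuer")
    if claims.get("exp", 0) <= now_s:
        raise ValueError("token expired")
    return f"{claims['iss']}/{claims['sub']}"


if __name__ == "__main__":
    issuer = "https://auth.example.com/realms/production"
    claims = {"iss": issuer, "sub": "user-42", "exp": int(time.time()) + 3600}  # made-up demo claims
    token = ".".join([b64url_encode(b'{"alg":"none"}'),
                      b64url_encode(json.dumps(claims).encode()),
                      ""])   # empty signature: demo only
    print(request_principal(token, issuer, time.time()))
    # https://auth.example.com/realms/production/user-42
```

An AuthorizationPolicy could then narrow `requestPrincipals: ["*"]` to specific `<iss>/<sub>` values.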

Istio Gateway Configuration

# The Gateway defines the mesh entry point
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: main-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*.example.com"
      tls:
        httpsRedirect: true     # redirect HTTP to HTTPS (301)
    - port:
        number: 443
        name: https
        protocol: HTTPS
      hosts:
        - "*.example.com"
      tls:
        mode: SIMPLE
        credentialName: example-com-tls   # Kubernetes TLS Secret, in the same namespace as the ingress gateway
---
# Bind a VirtualService to the Gateway
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web-routing
  namespace: production
spec:
  hosts:
    - "www.example.com"
  gateways:
    - istio-system/main-gateway
  http:
    - route:
        - destination:
            host: web-frontend
            port:
              number: 8080

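Observability Configuration

# Istio exports its standard metrics to Prometheus by default;
# this Telemetry resource customizes them by dropping the principal tags from REQUEST_COUNT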
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  metrics:
    - providers:
        - name: prometheus
      overrides:
        - match:
            metric: REQUEST_COUNT
          tagOverrides:
            source_principal:
              operation: REMOVE
            destination_principal:
              operation: REMOVE
---
# Enable access logging
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: access-logging
  namespace: istio-system
spec:
  accessLogging:
    - providers:
        - name: stdout
      # Log only error responses and requests slower than 5 seconds (CEL expression)
      filter:
        expression: 'response.code >= 400 || request.duration > duration("5s")'

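# View the service topology with Kiali
# (Kiali, Jaeger, and Grafana come from Istio's samples/addons and are installed separately)
kubectl port-forward -n istio-system svc/kiali 20001:20001
# Open http://localhost:20001 in a browser

# View distributed traces with Jaeger
kubectl port-forward -n istio-system svc/tracing 16686:80
# Open http://localhost:16686 in a browser

# View the Istio dashboards in Grafana (the addon's Service listens on port 3000)
kubectl port-forward -n istio-system svc/grafana 3000:3000
# Open http://localhost:3000 in a browser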

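Sidecar Resource Optimization

# Use the Sidecar CRD to limit the services a sidecar knows about,
# cutting unnecessary config pushes and resource usage.
# With no workloadSelector, this resource applies to every workload in the namespace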
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: web-api-sidecar
  namespace: production
spec:
  egress:
    - hosts:
        - "production/*"           # every service in this namespace
        - "istio-system/*"         # services in istio-system
        - "kube-system/*"          # services in kube-system
  # Sidecars receive no routes or clusters for services outside this list
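The egress `hosts` entries use a `namespace/dnsName` form. The sketch below is a simplified Python model of which services end up in a sidecar's configuration under these rules, with `.` meaning the Sidecar resource's own namespace and `*` as a wildcard. It ignores Istio's `~` (no namespace) form, exportTo, and port-level settings; the function names and the `billing`/`payments` service are hypothetical.

```python
from typing import List


def host_matches(pattern: str, host: str) -> bool:
    """'*' matches any host; '*.suffix' matches hosts ending in .suffix; otherwise exact match."""
    if pattern == "*":
        return True
    if pattern.startswith("*."):
        return host.endswith(pattern[1:])
    return host == pattern


def visible_to_sidecar(egress_hosts: List[str], sidecar_ns: str,
                       service_ns: str, service_host: str) -> bool:
    """Would a sidecar governed by these egress hosts receive config for the service?"""
    for entry in egress_hosts:
        ns, dns_name = entry.split("/", 1)
        if ns == ".":                      # "." means the Sidecar resource's own namespace
            ns = sidecar_ns
        if ns in ("*", service_ns) and host_matches(dns_name, service_host):
            return True
    return False


if __name__ == "__main__":
    egress = ["production/*", "istio-system/*", "kube-system/*"]
    print(visible_to_sidecar(egress, "production", "production",
                             "web-api.production.svc.cluster.local"))   # True
    print(visible_to_sidecar(egress, "production", "billing",
                             "payments.billing.svc.cluster.local"))     # False
```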

# Control sidecar resources with pod annotations
apiVersion: v1
kind: Pod
metadata:
  name: web-api
  annotations:
    sidecar.istio.io/proxyCPU: "100m"
    sidecar.istio.io/proxyMemory: "128Mi"
    sidecar.istio.io/proxyCPULimit: "500m"
    sidecar.istio.io/proxyMemoryLimit: "512Mi"
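    # Sidecar log level
    sidecar.istio.io/logLevel: "warning"
    # Intercept inbound traffic only on these ports (excludeInboundPorts excludes ports instead)
    traffic.sidecar.istio.io/includeInboundPorts: "8080,8443"
    # Redirect outbound traffic through the sidecar only for these destination IP ranges
    traffic.sidecar.istio.io/includeOutboundIPRanges: "10.0.0.0/8"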
spec:
  containers:
    - name: app
      image: web-api:latest
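With `includeOutboundIPRanges: "10.0.0.0/8"`, only connections to that range are redirected to Envoy; anything else (for example a public API) leaves the pod directly and therefore gets no mTLS, retries, or telemetry from the mesh. A small Python check with the standard `ipaddress` module shows which destinations are affected; the IPs are made-up examples and the function name is hypothetical.

```python
import ipaddress
from typing import Iterable


def outbound_via_sidecar(dest_ip: str, include_ranges: Iterable[str]) -> bool:
    """True if an outbound connection to dest_ip would be redirected to Envoy."""
    ranges = list(include_ranges)
    if "*" in ranges:                    # "*" redirects all outbound traffic
        return True
    ip = ipaddress.ip_address(dest_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in ranges)


if __name__ == "__main__":
    include = ["10.0.0.0/8"]             # value of includeOutboundIPRanges in the pod above
    for dest in ["10.96.0.10", "172.20.1.5", "203.0.113.5"]:   # made-up destination IPs
        print(dest, outbound_via_sidecar(dest, include))
```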

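Pros

1. Non-intrusive: traffic management, security, and observability are handled by the sidecar, with no application code changes
2. Powerful traffic governance: canary releases, A/B testing, traffic mirroring, fault injection, and more, far beyond native Kubernetes Services
3. Zero-trust security: automatic mTLS for service-to-service traffic plus fine-grained AuthorizationPolicy
4. Observability out of the box: the proxies generate metrics and trace spans without an application SDK (applications must still propagate trace headers for end-to-end traces)
5. Unified multi-cluster management: traffic management and security policies configured once across clusters
6. Standardization: managed through CRDs as configuration as code, easy to version and audit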

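Cons

1. Performance overhead: the sidecar adds roughly 2-5 ms of latency (depending on load and enabled features) plus extra memory and CPU
2. High operational complexity: troubleshooting has to follow a long chain of components (istiod, Envoy filters, policies)
3. Steep learning curve: many concepts, such as VirtualService, DestinationRule, and AuthorizationPolicy
4. Sidecar management burden: upgrading the sidecar requires restarting every pod, which has a large blast radius in big clusters
5. Resource consumption: every pod gets an extra sidecar container, raising total cluster resource usage
6. Version compatibility: each Istio release supports only a limited range of Kubernetes versions, so upgrades must be coordinated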
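The resource cost is easy to underestimate because it scales with pod count. Using the per-sidecar requests from the installation examples (100m CPU, 128Mi memory), a quick calculation shows what the scheduler must reserve for sidecars alone. The pod counts are hypothetical, and real usage depends on traffic and configuration size.

```python
def sidecar_reservation(pods: int, cpu_request_m: int = 100, mem_request_mi: int = 128):
    """CPU cores and GiB of memory reserved cluster-wide by sidecar requests alone."""
    cores = pods * cpu_request_m / 1000
    gib = pods * mem_request_mi / 1024
    return cores, gib


if __name__ == "__main__":
    for pods in (100, 500, 2000):        # hypothetical cluster sizes
        cores, gib = sidecar_reservation(pods)
        print(f"{pods:>5} pods -> {cores:g} cores, {gib:g} GiB reserved for sidecars")
```

At 500 pods this is already 50 cores and 62.5 GiB of requests before any application container is scheduled, which is why the Sidecar CRD and right-sized requests matter.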

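Summary

Istio provides powerful traffic governance and security for large-scale microservices, but it also brings significant complexity and performance overhead. Before adopting it, assess whether you really need its advanced features; for small projects, native Kubernetes Services plus Ingress may be a better fit. A phased rollout is recommended: pilot traffic management and observability on non-critical services first, extend to core services once they prove stable, and enable mTLS and security policies last.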

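Key Points

Istio injects sidecars through a Mutating Admission Webhook, which can be switched on or off per namespace
VirtualService defines routing rules; DestinationRule defines policies for the destination (load balancing, circuit breaking, connection pools)
mTLS modes: STRICT (TLS only), PERMISSIVE (both plaintext and TLS accepted); DISABLE turns mTLS off
istiod is the control plane component that merges the functions of Pilot, Citadel, and Galley
Envoy is the core of the data plane and supports L4/L7 traffic management and observability
The Sidecar CRD can limit the set of services a sidecar knows about, shrinking its configuration
AuthorizationPolicy supports allow-list (ALLOW) and deny-list (DENY) actions, as well as CUSTOM and AUDIT
Gateway plus VirtualService can replace a traditional Ingress Controller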

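Adoption in Practice

Start with traffic management (canary releases) and observability, then gradually enable mTLS and AuthorizationPolicy
Set sensible resource requests and limits for sidecars so they do not take resources away from business containers
Set up code review for Istio configuration; a wrong VirtualService or DestinationRule can cause a mesh-wide outage
Use Kiali for routine inspection to catch abnormal call relationships early
Keep Istio configuration files in a Git repository and roll out changes through CI/CD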

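Common Pitfalls

Enabling every Istio feature at once, causing a sharp rise in cluster complexity
Mixing VirtualService and Kubernetes Ingress, causing conflicting traffic rules
Leaving sidecar resources unlimited, so Envoy consumes excessive CPU/memory under heavy traffic
Switching production straight to STRICT mTLS without a PERMISSIVE transition period
Skipping Sidecar CRD configuration, so every sidecar receives configuration for the whole mesh and config pushes bloat
Applying fault injection to production without first validating it in a test environment
Misunderstanding AuthorizationPolicy evaluation (CUSTOM, then DENY, then ALLOW, regardless of the order in which policies are written), causing unexpected denials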

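Where to Go Next

Learn Ambient Mesh (sidecar-less mode) to reduce Istio's performance overhead and operational complexity
Master Istio's Wasm plugin mechanism for custom L7 traffic processing
Look at Cilium Service Mesh as an alternative to Istio
Study configurations for Istio multi-cluster setups and cross-mesh federation
Learn to use istioctl analyze for automated configuration checks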

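When to Use It

Large microservice architectures that need fine-grained traffic governance (canary releases, traffic mirroring, circuit breaking)
High-security sectors such as finance and government that require mTLS encryption between services
Traffic management across multiple clusters, meshes, and environments
Microservice systems that need unified observability (distributed tracing, metrics, logs)
Zero-trust network architectures that need fine-grained access control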

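Rollout Recommendations

Validate Istio configuration fully in a test environment, then roll it out to production step by step
Use Kiali to visualize the service topology and understand dependencies between services
Set up dedicated monitoring and alerting for Istio, covering sidecar resource usage and proxy status
Establish a change-approval process for Istio configuration; roll out critical changes gradually
Run istioctl analyze regularly to catch configuration problems
Configure an HPA for istiod so the control plane does not become a bottleneck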

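Troubleshooting Checklist

Traffic rules not taking effect: check that the VirtualService hosts match the Service name
Sidecar not injected: check the namespace's istio-injection=enabled label and the pod's injection setting
mTLS connection failures: check the DestinationRule's trafficPolicy.tls mode against the peer's STRICT setting
HTTP 503 errors: check that each DestinationRule subset matches the labels of actual Deployment pods
Envoy crashes: check the sidecar container logs and whether the memory limit is too low
Gateway unreachable: check that the Gateway selector matches the ingressgateway pod labels
Slow configuration distribution from istiod: check istiod resource usage and xDS push latency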

# Common troubleshooting commands
# Check whether each proxy is in sync with istiod
istioctl proxy-status

# Analyze configuration
istioctl analyze -n production

# Inspect proxy configuration
istioctl pc cluster deploy/web-api -n production
istioctl pc route deploy/web-api -n production
istioctl pc listener deploy/web-api -n production

# View proxy logs
kubectl logs deploy/web-api -c istio-proxy -n production

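Retrospective Questions

How much latency did Istio add between services? Is it within an acceptable range?
Does Istio configuration management have a sound code review and gradual rollout process?
Can the team troubleshoot Istio problems on its own, or does it rely too much on outside support?
Is sidecar resource consumption within expectations? Is there room for optimization?
Has mTLS coverage reached 100%? Are any legacy services still running in PERMISSIVE mode?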