Elasticsearch

SunnyFan · about 12 min read · about 3,696 characters

Introduction

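Elasticsearch is a distributed search and analytics engine built on Lucene. It is best at full-text search, structured filtering, aggregations, and log retrieval. In business systems it usually plays one of two roles: a search engine, or a store for log analysis and observability data. Using Elasticsearch well is less about getting data into it and more about understanding its core building blocks: indices, mappings, the inverted index, analyzers, aggregations, and shards and replicas.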

Features

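1. Strong full-text search: built on the inverted index, well suited to keyword search and relevance ranking
2. Structured analytics: filtering, sorting, aggregations, and statistics are all straightforward
3. Distributed by design: shards, replicas, horizontal scaling, and high availability
4. REST-friendly API: almost every operation can be done over HTTP
5. Mature ecosystem: commonly paired with Kibana, Filebeat, and Logstash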

Implementation

Core Concepts and Basics

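A rough analogy between Elasticsearch and a relational database:
- Index: roughly a table (older material compares it to a database, but since mapping types were removed in 7.x a table is the closer match)
- Document: roughly a row, stored as a JSON document
- Field: roughly a column
- Mapping: roughly a table schema
- Shard: a partition of an index
- Replica: a copy of a shard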

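Key concepts:
- text: analyzed (split into terms) and used for full-text search
- keyword: not analyzed; suited to exact filtering, sorting, and aggregations
- _source: the original JSON document as it was indexed
- inverted index: the term-to-documents lookup structure that full-text search is built on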
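The following toy sketch in bash and awk makes the text/keyword distinction and the inverted index concrete. It assumes a crude "analyzer" that lowercases the text and splits it into words. A text field is broken into terms in roughly this way, and each term points at the documents that contain it. A keyword field would instead store the whole value as a single term. Real Lucene indexes also keep term frequencies and positions and use configurable analyzers, so treat this only as an illustration.

```shell
#!/usr/bin/env bash
# Toy inverted index (illustration only, not how Lucene stores data).
# Input (stdin): one document per line as "<docId><TAB><text>".
# Output: "<term> <docId>,<docId>,..." with terms in byte order.
build_inverted_index() {
  awk -F'\t' '{
    # crude "analyzer": lowercase, then split on anything that is not a-z or 0-9
    n = split(tolower($2), words, /[^a-z0-9]+/)
    for (i = 1; i <= n; i++)
      if (words[i] != "") print words[i], $1
  }' | LC_ALL=C sort -u -k1,1 -k2,2n | awk '
    # group consecutive "term docId" lines into one posting list per term
    $1 != term { if (term != "") print term, ids; term = $1; ids = $2; next }
    { ids = ids "," $2 }
    END { if (term != "") print term, ids }'
}

# Usage with three hypothetical documents; prints lines such as "docker 1,2"
printf '1\tDocker in Action\n2\tDocker for DevOps\n3\tKubernetes for DevOps\n' |
  build_inverted_index
```

A search for "devops" then becomes a single lookup of the devops line instead of a scan over every document.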

# Check cluster health
curl "http://localhost:9200/_cluster/health?pretty"

# List nodes
curl "http://localhost:9200/_cat/nodes?v"

# List indices
curl "http://localhost:9200/_cat/indices?v"

Installation and Basic Configuration (single-node example)

# Download (example version)
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.13.4-linux-x86_64.tar.gz

tar -xzf elasticsearch-8.13.4-linux-x86_64.tar.gz -C /usr/local
mv /usr/local/elasticsearch-8.13.4 /usr/local/elasticsearch

# /usr/local/elasticsearch/config/elasticsearch.yml
cluster.name: sunnyfan-es
node.name: node-1
path.data: /usr/local/elasticsearch/data
path.logs: /usr/local/elasticsearch/logs
network.host: 0.0.0.0
http.port: 9200

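discovery.type: single-node

# Local testing only: ES 8.x enables security (TLS + authentication) by default,
# which would make the plain-HTTP curl examples in this article fail.
# With security off, bind to 127.0.0.1 instead of 0.0.0.0; never do this in production.
xpack.security.enabled: false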

# JVM heap size (example)
# /usr/local/elasticsearch/config/jvm.options.d/custom.options
-Xms1g
-Xmx1g

# Kernel setting required before startup (run as root)
echo 'vm.max_map_count=262144' >> /etc/sysctl.conf
sysctl -p

# Elasticsearch refuses to run as root, so start it as a dedicated user
useradd esuser
chown -R esuser:esuser /usr/local/elasticsearch
su - esuser -c '/usr/local/elasticsearch/bin/elasticsearch -d'
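The two prerequisites above are the vm.max_map_count kernel setting and a non-root user, and both can be checked before the node starts. The helper below is a hypothetical sketch, not an Elasticsearch tool. It takes the values as arguments so it can be tried anywhere. The commented line shows how it might read the live values on Linux.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not shipped with Elasticsearch).
#   $1 = current vm.max_map_count value
#   $2 = numeric UID of the user that will start Elasticsearch
# Prints one line per problem; returns 0 only if both checks pass.
es_preflight() {
  local map_count=$1 uid=$2 status=0
  if [ "$map_count" -lt 262144 ]; then
    echo "vm.max_map_count is $map_count, below the required 262144"
    status=1
  fi
  if [ "$uid" -eq 0 ]; then
    echo "running as root: Elasticsearch refuses this, use a dedicated user such as esuser"
    status=1
  fi
  return "$status"
}

# On a real Linux host:
# es_preflight "$(cat /proc/sys/vm/max_map_count)" "$(id -u)" && echo "preflight OK"
```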

Indices, Mappings, and Writing Documents

# Create an index and define field types
curl -X PUT "http://localhost:9200/products" \
  -H 'Content-Type: application/json' \
  -d '
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "title":       { "type": "text" },
      "description": { "type": "text" },
      "price":       { "type": "double" },
      "category":    { "type": "keyword" },
      "tags":        { "type": "keyword" },
      "createdAt":   { "type": "date" }
    }
  }
}'

# Index a document
curl -X POST "http://localhost:9200/products/_doc/1" \
  -H 'Content-Type: application/json' \
  -d '
{
  "title": "Docker 实战指南",
  "description": "适合后端和运维人员的 Docker 入门与实践",
  "price": 69.9,
  "category": "book",
  "tags": ["docker", "devops"],
  "createdAt": "2026-04-12T09:00:00Z"
}'

# Get a single document
curl "http://localhost:9200/products/_doc/1?pretty"

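Choosing field types:
- Full-text search: text
- Exact filtering, aggregations, or sorting on strings: keyword
- Numbers (range queries, sorting, statistics): integer / long / double
- Timestamps and time series: date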

Queries, Filters, and Aggregations

# Full-text search
curl -X GET "http://localhost:9200/products/_search" \
  -H 'Content-Type: application/json' \
  -d '
{
  "query": {
    "match": {
      "title": "Docker"
    }
  }
}'

# Exact filtering + sorting
curl -X GET "http://localhost:9200/products/_search" \
  -H 'Content-Type: application/json' \
  -d '
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "category": "book" } },
        { "range": { "price": { "lte": 100 } } }
      ]
    }
  },
  "sort": [
    { "price": "asc" }
  ]
}'

# Aggregation: count documents per category
curl -X GET "http://localhost:9200/products/_search" \
  -H 'Content-Type: application/json' \
  -d '
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": { "field": "category" }
    }
  }
}'

# Highlighting
curl -X GET "http://localhost:9200/products/_search" \
  -H 'Content-Type: application/json' \
  -d '
{
  "query": {
    "match": {
      "description": "运维"
    }
  },
  "highlight": {
    "fields": {
      "description": {}
    }
  }
}'

Advanced Query Techniques

# Multi-field query (multi_match)
curl -X GET "http://localhost:9200/products/_search" \
  -H 'Content-Type: application/json' \
  -d '
{
  "query": {
    "multi_match": {
      "query": "Docker 运维",
      "fields": ["title^3", "description"],
      "type": "best_fields"
    }
  }
}'
# title^3 multiplies the score of matches in title by 3 relative to description
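For best_fields, Elasticsearch runs a match query per field and combines the results like a dis_max query. The score is the best-scoring field's score plus tie_breaker (0 by default) times the scores of the other matching fields. A boost such as title^3 scales that field's score before the combination. The sketch below applies this arithmetic to made-up per-field scores. Real scores come from BM25, so only the combination step is modeled, and the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Models how best_fields combines per-field scores (dis_max semantics):
# best boosted field score + tie_breaker * sum of the other field scores.
#   $1 = title score, $2 = description score, $3 = title boost, $4 = tie_breaker
best_fields_score() {
  awk -v t="$1" -v d="$2" -v boost="$3" -v tb="$4" 'BEGIN {
    bt = t * boost                         # title^boost scales the title clause
    if (bt >= d) { best = bt; rest = d } else { best = d; rest = bt }
    printf "%.2f\n", best + tb * rest
  }'
}

best_fields_score 1.2 2.0 3 0     # boosted title wins: 3.60
best_fields_score 1.2 2.0 3 0.3   # 3.60 + 0.3 * 2.00 = 4.20
best_fields_score 1.2 2.0 1 0     # without the boost, description wins: 2.00
```

The example shows why a boost can flip which field decides the ranking, and how tie_breaker rewards documents that match in several fields.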

# Compound boolean query (bool)
curl -X GET "http://localhost:9200/products/_search" \
  -H 'Content-Type: application/json' \
  -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Docker" } }
      ],
      "should": [
        { "match": { "description": "运维" } },
        { "term": { "tags": "devops" } }
      ],
      "must_not": [
        { "range": { "price": { "lt": 10 } } }
      ],
      "filter": [
        { "term": { "category": "book" } },
        { "range": { "createdAt": { "gte": "2026-01-01" } } }
      ],
      "minimum_should_match": 1
    }
  }
}'

# Nested query (nested)
# Use for object arrays when all conditions must match within the same array element
curl -X PUT "http://localhost:9200/orders" \
  -H 'Content-Type: application/json' \
  -d '
{
  "mappings": {
    "properties": {
      "orderId": { "type": "keyword" },
      "items": {
        "type": "nested",
        "properties": {
          "productId": { "type": "keyword" },
          "quantity":  { "type": "integer" },
          "price":     { "type": "double" }
        }
      }
    }
  }
}'

# Query inside the nested field
curl -X GET "http://localhost:9200/orders/_search" \
  -H 'Content-Type: application/json' \
  -d '
{
  "query": {
    "nested": {
      "path": "items",
      "query": {
        "bool": {
          "must": [
            { "match": { "items.productId": "P001" } },
            { "range": { "items.price": { "gte": 100 } } }
          ]
        }
      }
    }
  }
}'
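The nested type exists because a plain object array is flattened. With a plain object mapping, items.productId and items.price become independent value lists. A query can then match P001 from one item and a price of 100 or more from a different item. The sketch below uses awk to model both behaviours on a hypothetical order. It illustrates the matching semantics only and does not reflect how Elasticsearch evaluates queries internally.

```shell
#!/usr/bin/env bash
# Input: the items of one order, one per line as "<productId> <price>".
# Conditions mirror the query above: productId == P001 AND price >= 100.

# Plain "object" mapping: arrays are flattened, so each condition may be
# satisfied by a different item
match_flattened() {
  awk '$1 == "P001" { id = 1 } $2 >= 100 { price = 1 } END { exit !(id && price) }'
}

# "nested" mapping: both conditions must hold for the same item
match_nested() {
  awk '$1 == "P001" && $2 >= 100 { found = 1 } END { exit !found }'
}

# Hypothetical order: P001 is cheap, P002 is expensive
order='P001 50
P002 150'

printf '%s\n' "$order" | match_flattened && echo "object mapping: matches (false positive)"
printf '%s\n' "$order" | match_nested    || echo "nested mapping: no match"
```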

Advanced Aggregations

# Multi-level aggregation: group by category, then compute the average price, maximum price, and price bands per group
curl -X GET "http://localhost:9200/products/_search" \
  -H 'Content-Type: application/json' \
  -d '
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": { "field": "category" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } },
        "max_price": { "max": { "field": "price" } },
        "price_range": {
          "range": {
            "field": "price",
            "ranges": [
              { "key": "cheap", "to": 50 },
              { "key": "mid", "from": 50, "to": 200 },
              { "key": "expensive", "from": 200 }
            ]
          }
        }
      }
    }
  }
}'
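The range aggregation includes the from value and excludes the to value of each range. A price that sits exactly on a boundary therefore lands in the higher bucket. Here is a small bash sketch of the price_range buckets above; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Mirrors the price_range buckets: "from" is inclusive, "to" is exclusive,
# so a price of exactly 50 lands in "mid", not "cheap".
price_bucket() {
  awk -v p="$1" 'BEGIN {
    if (p < 50)       print "cheap"        # { "to": 50 }
    else if (p < 200) print "mid"          # { "from": 50, "to": 200 }
    else              print "expensive"    # { "from": 200 }
  }'
}

for price in 49.9 50 69.9 200; do
  printf '%s -> %s\n' "$price" "$(price_bucket "$price")"
done
```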

# Date histogram aggregation (common in log analysis)
curl -X GET "http://localhost:9200/products/_search" \
  -H 'Content-Type: application/json' \
  -d '
{
  "size": 0,
  "aggs": {
    "by_month": {
      "date_histogram": {
        "field": "createdAt",
        "calendar_interval": "month",
        "format": "yyyy-MM"
      },
      "aggs": {
        "count_per_month": { "value_count": { "field": "price" } }
      }
    }
  }
}'

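# Search suggestions (completion suggester)
# Suited to autocomplete. The products index already exists, so add the
# completion field to its mapping instead of re-creating the index
# (a second PUT /products would fail with resource_already_exists_exception)
curl -X PUT "http://localhost:9200/products/_mapping" \
  -H 'Content-Type: application/json' \
  -d '
{
  "properties": {
    "title_suggest": {
      "type": "completion",
      "analyzer": "standard"
    }
  }
}'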

# Index a document with the suggest field (this replaces document 1 entirely)
curl -X POST "http://localhost:9200/products/_doc/1" \
  -H 'Content-Type: application/json' \
  -d '{
    "title": "Docker 实战指南",
    "title_suggest": {
      "input": ["docker", "docker实战", "docker指南"],
      "weight": 10
    }
  }'

# Query suggestions
curl -X POST "http://localhost:9200/products/_search" \
  -H 'Content-Type: application/json' \
  -d '{
    "suggest": {
      "title-suggest": {
        "prefix": "dock",
        "completion": {
          "field": "title_suggest",
          "size": 5
        }
      }
    }
  }'
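Roughly, the completion suggester returns entries that have an input starting with the prefix. It orders them by weight and limits the list to size. The sketch below models only that ranking rule in bash. The real suggester is backed by an in-memory FST, applies the field's analyzer, and supports options such as fuzzy matching, none of which are modeled here. The tab-separated input layout and the function name are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Rough model of completion suggestions: prefix match on any input,
# highest weight first, at most $2 results.
# stdin: "<weight><TAB><suggestion><TAB><input1>|<input2>|..."
suggest() {
  local prefix=$1 size=$2
  awk -F'\t' -v p="$prefix" '{
    n = split($3, inputs, "|")
    for (i = 1; i <= n; i++)
      if (index(tolower(inputs[i]), tolower(p)) == 1) {
        print $1, $2          # "<weight> <suggestion>"
        break                 # one hit per entry is enough
      }
  }' | sort -k1,1nr | head -n "$size" | cut -d' ' -f2-
}

# Hypothetical entries, queried with the prefix "dock"
printf '%s\t%s\t%s\n' \
  10 "Docker in Action"     "docker|docker in action" \
  5  "Docker Compose Guide" "docker compose|compose" \
  8  "Kubernetes Basics"    "kubernetes|k8s" |
  suggest dock 5
```

Because weight decides the order, listing several inputs per document (as the title_suggest example does) widens what a prefix can match without changing how results are ranked.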

Index Templates and Dynamic Mapping

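# Create an index template (applied automatically to new indices whose names match index_patterns)
# The ik_max_word tokenizer requires the IK plugin from the IK section below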
curl -X PUT "http://localhost:9200/_index_template/product-template" \
  -H 'Content-Type: application/json' \
  -d '
{
  "index_patterns": ["product-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "refresh_interval": "5s",
      "analysis": {
        "analyzer": {
          "ik_max_word_analyzer": {
            "type": "custom",
            "tokenizer": "ik_max_word"
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "title":       { "type": "text", "analyzer": "ik_max_word_analyzer" },
        "description": { "type": "text", "analyzer": "ik_max_word_analyzer" },
        "price":       { "type": "double" },
        "category":    { "type": "keyword" },
        "tags":        { "type": "keyword" },
        "status":      { "type": "keyword" },
        "createdAt":   { "type": "date", "format": "yyyy-MM-dd HH:mm:ss||epoch_millis" }
      }
    }
  }
}'

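# Controlling dynamic mapping
# dynamic_templates only apply to fields that are mapped dynamically, so they
# need "dynamic": true; under "strict" or false they never take effect
curl -X PUT "http://localhost:9200/_index_template/dynamic-template" \
  -H 'Content-Type: application/json' \
  -d '
{
  "index_patterns": ["dynamic-*"],
  "template": {
    "mappings": {
      "dynamic": true,
      "dynamic_templates": [
        {
          "strings_as_keyword": {
            "match_mapping_type": "string",
            "mapping": { "type": "keyword" }
          }
        }
      ]
    }
  }
}'
# dynamic: strict  rejects documents that contain fields not in the mapping
# dynamic: false   accepts unknown fields and keeps them in _source, but does not index them
# dynamic: true    maps new fields automatically (the default)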

# View an existing template
curl "http://localhost:9200/_index_template/product-template?pretty"

Index Management

# Close / reopen an index (a closed index releases most of its memory but cannot be read or written)
curl -X POST "http://localhost:9200/products/_close"
curl -X POST "http://localhost:9200/products/_open"

# Block writes to an index (reads still work)
curl -X PUT "http://localhost:9200/products/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "index.blocks.write": true }'

# Remove the write block
curl -X PUT "http://localhost:9200/products/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "index.blocks.write": null }'

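# Manual refresh (makes recent writes searchable immediately; costly, so use it for debugging or tests rather than after every write)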
curl -X POST "http://localhost:9200/products/_refresh"

# Clear caches
curl -X POST "http://localhost:9200/products/_cache/clear"

# Inspect segments
curl "http://localhost:9200/products/_segments?pretty"

# Force merge (fewer segments and faster searches, but I/O heavy; run it only on indices that are no longer written to)
curl -X POST "http://localhost:9200/products/_forcemerge?max_num_segments=1"

# Delete an index
curl -X DELETE "http://localhost:9200/products"

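# Delete indices by wildcard (use with care; 8.x rejects this unless action.destructive_requires_name is set to false)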
curl -X DELETE "http://localhost:9200/log-2025-*"

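# Reindex (for data migration or mapping changes; create products-v2 with the new mapping first, otherwise it gets a dynamic mapping)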
curl -X POST "http://localhost:9200/_reindex" \
  -H 'Content-Type: application/json' \
  -d '
{
  "source": { "index": "products" },
  "dest": { "index": "products-v2" }
}'

Chinese Analyzer Setup (IK)

# Install the IK analysis plugin (the plugin version must match the ES version exactly)
cd /usr/local/elasticsearch
./bin/elasticsearch-plugin install https://get.infini.cloud/elasticsearch/analysis-ik/8.13.4

# Restart ES, then verify the installation
curl -X POST "http://localhost:9200/_analyze" \
  -H 'Content-Type: application/json' \
  -d '{
    "analyzer": "ik_max_word",
    "text": "中华人民共和国国歌"
  }'

# ik_max_word: finest-grained segmentation (suited to indexing)
# ik_smart:    coarsest-grained segmentation (suited to search)

# Use the IK analyzers in an index mapping
curl -X PUT "http://localhost:9200/articles" \
  -H 'Content-Type: application/json' \
  -d '
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "content": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}'

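# Custom IK dictionaries
# Configure a remote dictionary in config/analysis-ik/IKAnalyzer.cfg.xml,
# or create a custom.dic file in config/analysis-ik/ (one word per line, UTF-8)
# and list it in the ext_dict entry of IKAnalyzer.cfg.xml so that IK loads it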
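The custom dictionary steps above can be scripted. This sketch rests on several assumptions. IK reads the ext_dict entry from IKAnalyzer.cfg.xml, which is a Java properties XML file. The config directory is the one named above. Overwriting the whole file is acceptable. On a real node, edit the existing file instead if it already has other entries. A restart is typically needed before local dictionary changes take effect. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Assumed IK config directory; override IK_CONF_DIR for other layouts.
IK_CONF_DIR=${IK_CONF_DIR:-/usr/local/elasticsearch/config/analysis-ik}

# Hypothetical helper: writes custom.dic (one word per line) and an
# IKAnalyzer.cfg.xml that registers it via ext_dict. Overwrites both files.
#   $1 = config directory, remaining args = words to add
write_ik_custom_dict() {
  local dir=$1; shift
  mkdir -p "$dir"
  # one word per line; the shell passes UTF-8 bytes through unchanged
  printf '%s\n' "$@" > "$dir/custom.dic"
  cat > "$dir/IKAnalyzer.cfg.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer extension config</comment>
  <entry key="ext_dict">custom.dic</entry>
  <entry key="ext_stopwords"></entry>
</properties>
EOF
}

# Example (run as the ES user, then restart ES):
# write_ik_custom_dict "$IK_CONF_DIR" 云原生 可观测性
```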

Common Operational Concerns: Analysis, Logs, Lifecycle

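Common problems with Chinese search:
- The default standard analyzer splits Chinese text into single characters, which gives poor relevance
- Chinese text usually needs an extra analyzer such as IK or SmartCN (the analysis-smartcn plugin)
- When results look wrong, inspect the analyzer output with _analyze before assuming the data was never written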

# Common troubleshooting APIs
curl "http://localhost:9200/_cluster/health?pretty"
curl "http://localhost:9200/_cat/shards?v"
curl "http://localhost:9200/_cat/allocation?v"

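# Minimal ILM policy body (send it with PUT _ilm/policy/<policy-name>; the ILM section below shows a complete request)
{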
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "30gb",
            "max_age": "1d"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
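In the hot phase, rollover happens as soon as any one condition is met. max_size refers to the total size of the index's primary shards. ILM evaluates the conditions periodically (indices.lifecycle.poll_interval, 10m by default), so a rollover can lag slightly behind the moment a threshold is crossed. Below is a hypothetical helper that mirrors the two conditions above, with sizes in GB and ages in hours.

```shell
#!/usr/bin/env bash
# Hypothetical check mirroring the hot-phase rollover conditions:
# roll over when primary size >= 30 GB OR age >= 1 day (24 hours).
#   $1 = total primary shard size in GB, $2 = index age in hours
should_rollover() {
  local size_gb=$1 age_hours=$2
  awk -v s="$size_gb" -v a="$age_hours" 'BEGIN { exit !(s >= 30 || a >= 24) }'
}

should_rollover 12 25 && echo "roll over (age reached)"
should_rollover 31 2  && echo "roll over (size reached)"
should_rollover 5 3   || echo "keep writing to the current index"
```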

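In logging setups, ES is usually combined with:
- Filebeat: collection
- Logstash: parsing and transformation (optional)
- Elasticsearch: storage and search
- Kibana: querying and visualization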

Cluster Deployment and Shard Management

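# Cluster configuration (three-node example)
# elasticsearch.yml for node-1 (node-2 and node-3 use the same settings with their own node.name)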
cluster.name: sunnyfan-es
node.name: node-1
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["192.168.1.101", "192.168.1.102", "192.168.1.103"]
# Only for bootstrapping a brand-new cluster; remove it after the cluster has formed
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

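# Node roles
node.roles: [ data, master ]
# data_hot:  hot data nodes (typically SSD)
# data_warm: warm data nodes (typically HDD)
# data_cold: cold data nodes (rarely accessed data)
# master:    master-eligible node (manages cluster state; holds no data unless it also has a data role)
# ingest:    ingest node (runs pre-processing pipelines)
# ml:        machine learning node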

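# JVM heap size guidelines
# ES 7.11+ sizes the heap automatically from node roles and total memory; set it manually only when needed
# Keep the heap at or below 50% of physical RAM
# Keep it below about 30GB so the JVM can use compressed object pointers
# Leave the remaining RAM to the OS filesystem cache, which Lucene relies on
# Example: on a 32GB server, use -Xms16g -Xmx16g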
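The sizing rules above reduce to heap = min(RAM / 2, 30GB). Below is a hypothetical helper that applies them using this article's conservative 30GB cap. To confirm that compressed pointers stay enabled at a given size, `java -Xmx30g -XX:+PrintFlagsFinal -version | grep UseCompressedOops` prints the flag's effective value for that JVM.

```shell
#!/usr/bin/env bash
# Hypothetical helper: heap = min(RAM / 2, 30) in whole GB.
#   $1 = physical RAM in GB
recommend_heap_gb() {
  local ram_gb=$1 half
  half=$(( ram_gb / 2 ))
  if [ "$half" -gt 30 ]; then half=30; fi   # stay under the compressed-oops threshold
  echo "$half"
}

heap=$(recommend_heap_gb 32)
echo "-Xms${heap}g"
echo "-Xmx${heap}g"
```

On a 32GB machine this reproduces the -Xms16g -Xmx16g example. On large machines the cap applies, and the extra memory is left to the filesystem cache.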

# Shard management
# View shard distribution
curl "http://localhost:9200/_cat/shards?v"
curl "http://localhost:9200/_cat/shards/products?v&h=index,shard,prirep,state,docs,store,node"

# Manually move a shard
curl -X POST "http://localhost:9200/_cluster/reroute" \
  -H 'Content-Type: application/json' \
  -d '
{
  "commands": [
    {
      "move": {
        "index": "products",
        "shard": 0,
        "from_node": "node-1",
        "to_node": "node-2"
      }
    }
  ]
}'

# Shard allocation filtering (move shards off a node before maintenance)
# transient settings are deprecated since 7.16, so use persistent
curl -X PUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{
    "persistent": {
      "cluster.routing.allocation.exclude._name": "node-3"
    }
  }'
# Re-enable allocation to the node
curl -X PUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{
    "persistent": {
      "cluster.routing.allocation.exclude._name": null
    }
  }'

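# Shard count planning
# Target roughly 10-50GB per shard
# Cluster capacity: total shards ≈ shards each node can carry × number of nodes
# Per-node limit: the older rule of thumb is at most 20 shards per GB of heap,
# and cluster.max_shards_per_node (default 1000) caps open shards per non-frozen data node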
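To turn the shard-size guideline into a number, divide the expected primary data size by a target shard size and round up. Below is a sketch with hypothetical inputs. The 30GB default target is simply a midpoint of the 10-50GB range.

```shell
#!/usr/bin/env bash
# Hypothetical planner: primaries = ceil(primary data GB / target GB per shard).
#   $1 = expected primary data size in GB, $2 = target shard size in GB (default 30)
plan_primary_shards() {
  local data_gb=$1 target_gb=${2:-30}
  local shards=$(( (data_gb + target_gb - 1) / target_gb ))   # integer ceiling
  if [ "$shards" -lt 1 ]; then shards=1; fi
  echo "$shards"
}

# 200 GB of primary data at ~30 GB per shard; each replica adds one copy per primary
p=$(plan_primary_shards 200 30)
echo "primaries=$p total_with_1_replica=$(( p * 2 ))"
```

The total shard count, including replicas, is what counts against the per-node limits above.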

Performance Tuning

# 1. Write optimization
# Bulk writes (bulk API): the body is NDJSON, one JSON object per line, ending with a newline
curl -X POST "http://localhost:9200/_bulk" \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary @- <<'EOF'
{"index": {"_index": "products", "_id": "1"}}
{"title": "商品1", "price": 99.9}
{"index": {"_index": "products", "_id": "2"}}
{"title": "商品2", "price": 199.9}
EOF
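Bulk bodies are usually generated rather than typed by hand. The sketch below converts tab-separated rows into the NDJSON action/source format shown above. The id/title/price column layout and the function name are assumptions, and only double quotes and backslashes are escaped. Every document takes two lines. Splitting the output with `split -l` on an even line count therefore keeps each action/source pair together; for example, `split -l 2000 bulk.ndjson batch-` gives about 1,000 documents per file.

```shell
#!/usr/bin/env bash
# Hypothetical generator: "<id><TAB><title><TAB><price>" rows -> _bulk NDJSON.
#   $1 = target index name
make_bulk_body() {
  local index=$1
  awk -F'\t' -v idx="$index" '
    # JSON-escape double quotes and backslashes, one character at a time
    function esc(s,    out, i, c) {
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\\" || c == "\"") out = out "\\"
        out = out c
      }
      return out
    }
    NF >= 3 && $3 != "" {
      printf "{\"index\":{\"_index\":\"%s\",\"_id\":\"%s\"}}\n", idx, esc($1)
      printf "{\"title\":\"%s\",\"price\":%s}\n", esc($2), $3
    }'
}

printf '1\tBook A\t99.9\n2\tBook B\t199.9\n' | make_bulk_body products
# Then, for example:
#   ... | make_bulk_body products > bulk.ndjson
#   curl -X POST "http://localhost:9200/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary @bulk.ndjson
```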

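# Batch size: start with 1,000-5,000 documents or 5-15MB per request, then benchmark
# Disable replicas during a large initial load (restore them afterwards)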
curl -X PUT "http://localhost:9200/products/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "number_of_replicas": 0 }'

# Raise refresh_interval so fewer small segments are created (less merge overhead); -1 disables refresh during the load
curl -X PUT "http://localhost:9200/products/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "refresh_interval": "30s" }'

# Restore the settings after the load
curl -X PUT "http://localhost:9200/products/_settings" \
  -H 'Content-Type: application/json' \
  -d '{
    "number_of_replicas": 1,
    "refresh_interval": "1s"
  }'

# 2. Query optimization
# Use _source filtering to return less data
curl -X GET "http://localhost:9200/products/_search" \
  -H 'Content-Type: application/json' \
  -d '
{
  "_source": ["title", "price"],
  "query": { "match_all": {} },
  "size": 20
}'

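# Use filter instead of query for exact matches (filters are not scored and can be cached)
# Use a custom preference string (for example a user session ID) so repeated requests
# hit the same shard copies and paged results don't bounce between replicas
curl -X GET "http://localhost:9200/products/_search?preference=session-abc123"
# session-abc123 is a hypothetical value; any string that does not start with _ works
# _local:      prefer shard copies on the coordinating node
# _only_local: use only shard copies on the coordinating node
# (_primary and _replica were removed in 7.0)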
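# 3. Disk and I/O optimization
# Use SSDs
# Give the ES data directory its own disk (don't share it with logs)
# Set disk watermarks (the values below are the defaults; tune them to your disk sizes)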
curl -X PUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%"
  }
}'
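Each watermark has a different effect once crossed. Above low, no new shards are allocated to the node, although primaries of newly created indices are exempt. Above high, ES tries to move shards away from the node. Above flood_stage, indices with a shard on that node get the index.blocks.read_only_allow_delete block. Below is a hypothetical helper that classifies a node's disk usage percentage (for example the disk.percent column of _cat/allocation) against the thresholds set above.

```shell
#!/usr/bin/env bash
# Hypothetical classifier for the 85% / 90% / 95% watermarks configured above.
#   $1 = disk usage percentage (may be fractional)
watermark_state() {
  awk -v used="$1" 'BEGIN {
    if (used > 95)      print "flood_stage"   # indices on the node become read-only (delete allowed)
    else if (used > 90) print "high"          # shards are relocated away
    else if (used > 85) print "low"           # no new shards allocated here
    else                print "ok"
  }'
}

for pct in 80 88 92.5 97; do
  printf '%s%% -> %s\n' "$pct" "$(watermark_state "$pct")"
done
```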

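# 4. JVM optimization
# Set -Xms and -Xmx to the same value so the heap is never resized at runtime
# Stay below about 30GB (the compressed object pointer threshold is just under 32GB)
# ES 8.x uses the G1GC garbage collector by default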

ILM (Index Lifecycle Management)

# Create an ILM policy
curl -X PUT "http://localhost:9200/_ilm/policy/logs-policy" \
  -H 'Content-Type: application/json' \
  -d '
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "30gb",
            "max_age": "1d",
            "max_docs": 1000000
          },
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 },
          "set_priority": { "priority": 50 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": { "priority": 0 }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'
# The freeze action was deprecated in 7.x and does nothing in 8.x, so it is left out of the cold phase
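Here is a rough model of where an index sits in logs-policy by age. ILM measures min_age from the rollover time, or from index creation if the index never rolled over. Phase changes happen at the next ILM check (indices.lifecycle.poll_interval, 10m by default), so real transitions lag slightly behind these boundaries. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical mapping from index age (days since rollover) to the
# logs-policy phase: hot < 7d <= warm < 30d <= cold < 90d <= delete.
ilm_phase_for_age() {
  awk -v d="$1" 'BEGIN {
    if (d >= 90)      print "delete"
    else if (d >= 30) print "cold"
    else if (d >= 7)  print "warm"
    else              print "hot"
  }'
}

for age in 0 7 45 90; do
  printf 'day %s -> %s\n' "$age" "$(ilm_phase_for_age "$age")"
done
```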

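# Attach the ILM policy through an index template
# Rollover also needs an initial write index, e.g. logs-000001 with the logs alias and is_write_index: true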
curl -X PUT "http://localhost:9200/_index_template/logs-template" \
  -H 'Content-Type: application/json' \
  -d '
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs-policy",
      "index.lifecycle.rollover_alias": "logs"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "level":      { "type": "keyword" },
        "message":    { "type": "text" },
        "service":    { "type": "keyword" }
      }
    }
  }
}'

# Check the ILM policy and per-index lifecycle status
curl "http://localhost:9200/_ilm/policy/logs-policy?pretty"
curl "http://localhost:9200/_ilm/explain/logs-*?only_managed=true&pretty"

Common Monitoring and Diagnostic Commands

# Cluster health
curl "http://localhost:9200/_cluster/health?pretty"
curl "http://localhost:9200/_cluster/health?level=shards&pretty"

# Node status
curl "http://localhost:9200/_cat/nodes?v&h=name,ip,heap.percent,ram.percent,disk.used_percent,node.role,master"
curl "http://localhost:9200/_nodes/stats?pretty"

# Index statistics
curl "http://localhost:9200/_cat/indices?v&s=store.size:desc"
curl "http://localhost:9200/_stats/store,docs?pretty"

# Pending cluster tasks
curl "http://localhost:9200/_cluster/pending_tasks?pretty"

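# Slow logs: the thresholds are dynamic index settings, set per index through _settings
# (index-level settings can no longer be placed in elasticsearch.yml since 5.x)
curl -X PUT "http://localhost:9200/products/_settings" \
  -H 'Content-Type: application/json' \
  -d '{
    "index.search.slowlog.threshold.query.warn": "10s",
    "index.search.slowlog.threshold.query.info": "5s",
    "index.indexing.slowlog.threshold.index.warn": "10s"
  }'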

# Thread pool status
curl "http://localhost:9200/_cat/thread_pool?v&h=name,active,queue,rejected,completed"

# Explain why a shard is unassigned
curl "http://localhost:9200/_cluster/allocation/explain?pretty"

# Recovery progress (e.g. after a node restart)
curl "http://localhost:9200/_cat/recovery?v&active_only=true"
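Non-zero rejected counts in the thread pool output usually mean the node receives more requests of that type than it can queue. The awk sketch below filters the _cat/thread_pool output requested above. It assumes exactly the h=name,active,queue,rejected,completed column order. rejected is cumulative since node start, so compare two samples to see whether rejections are still happening. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads _cat/thread_pool output (with the ?v header row)
# and prints pools whose cumulative "rejected" counter is non-zero.
pools_with_rejections() {
  awk 'NR > 1 && $4 > 0 { print $1, "rejected=" $4, "queue=" $3 }'
}

# Live usage:
# curl -s "http://localhost:9200/_cat/thread_pool?v&h=name,active,queue,rejected,completed" | pools_with_rejections
```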