Elasticsearch
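Introduction
Elasticsearch is a distributed search and analytics engine built on Lucene. It is strongest at full-text search, structured filtering, aggregations, and log retrieval. In business systems it usually plays one of two roles: a search engine, or the storage layer for log analysis and observability. Using it well is less about loading data in and more about understanding indices, mappings, the inverted index, analyzers, aggregations, and shards and replicas.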
Implementation
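Core concepts and basics
A rough analogy between Elasticsearch and a relational database:
- Index: roughly a database (since mapping types were removed and each index has a single mapping, in practice it behaves more like a table)
- Document: roughly a row, but usually a JSON document
- Field: roughly a column
- Mapping: roughly a table schema definition
- Shard: a partition of an index
- Replica: a copy of a shard
A few key concepts:
- text: analyzed (tokenized) and used for full-text search
- keyword: not analyzed; suited to exact filtering and aggregations
- _source: the original document content
- inverted index: the term-to-document index that full-text search is built on
# Check ES cluster health
curl "http://localhost:9200/_cluster/health?pretty"
# List nodes
curl "http://localhost:9200/_cat/nodes?v"
# List indices
curl "http://localhost:9200/_cat/indices?v"
Installation and basic configuration (single-node sketch)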
# Download (example version)
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.13.4-linux-x86_64.tar.gz
tar -xzf elasticsearch-8.13.4-linux-x86_64.tar.gz -C /usr/local
mv /usr/local/elasticsearch-8.13.4 /usr/local/elasticsearch
# /usr/local/elasticsearch/config/elasticsearch.yml
cluster.name: sunnyfan-es
node.name: node-1
path.data: /usr/local/elasticsearch/data
path.logs: /usr/local/elasticsearch/logs
network.host: 0.0.0.0
http.port: 9200
discovery.type: single-node
# Elasticsearch 8.x enables security (TLS and authentication) by default.
# The plain-HTTP, unauthenticated curl examples in this article assume a local test node with it turned off:
xpack.security.enabled: false
# JVM heap size (example)
# /usr/local/elasticsearch/config/jvm.options.d/custom.options
-Xms1g
-Xmx1g
# System parameter required before startup
echo 'vm.max_map_count=262144' >> /etc/sysctl.conf
sysctl -p
# Elasticsearch refuses to run as root
useradd esuser
chown -R esuser:esuser /usr/local/elasticsearch
su - esuser
/usr/local/elasticsearch/bin/elasticsearch -d
Indices, mappings, and writing documents
# Create an index and define field types
curl -X PUT "http://localhost:9200/products" \
-H 'Content-Type: application/json' \
-d '
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"title": { "type": "text" },
"description": { "type": "text" },
"price": { "type": "double" },
"category": { "type": "keyword" },
"tags": { "type": "keyword" },
"createdAt": { "type": "date" }
}
}
}'
# Index a document
curl -X POST "http://localhost:9200/products/_doc/1" \
-H 'Content-Type: application/json' \
-d '
{
"title": "Docker 实战指南",
"description": "适合后端和运维人员的 Docker 入门与实践",
"price": 69.9,
"category": "book",
"tags": ["docker", "devops"],
"createdAt": "2026-04-12T09:00:00Z"
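}'
# Fetch a single document
curl "http://localhost:9200/products/_doc/1?pretty"
Field type guidance:
- Full-text search: text
- Filtering, aggregation, sorting: keyword
- Numeric statistics: integer / long / double
- Time series: date
Queries, filters, and aggregations
# Full-text search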
curl -X GET "http://localhost:9200/products/_search" \
-H 'Content-Type: application/json' \
-d '
{
"query": {
"match": {
"title": "Docker"
}
}
}'
# Exact filtering plus sorting
curl -X GET "http://localhost:9200/products/_search" \
-H 'Content-Type: application/json' \
-d '
{
"query": {
"bool": {
"filter": [
{ "term": { "category": "book" } },
{ "range": { "price": { "lte": 100 } } }
]
}
},
"sort": [
{ "price": "asc" }
]
}'
# Aggregation: count documents grouped by category
curl -X GET "http://localhost:9200/products/_search" \
-H 'Content-Type: application/json' \
-d '
{
"size": 0,
"aggs": {
"by_category": {
"terms": { "field": "category" }
}
}
}'
# Query with highlighting
curl -X GET "http://localhost:9200/products/_search" \
-H 'Content-Type: application/json' \
-d '
{
"query": {
"match": {
"description": "运维"
}
},
"highlight": {
"fields": {
"description": {}
}
}
}'
Advanced query techniques
# Multi-field query (multi_match)
curl -X GET "http://localhost:9200/products/_search" \
-H 'Content-Type: application/json' \
-d '
{
"query": {
"multi_match": {
"query": "Docker 运维",
"fields": ["title^3", "description"],
"type": "best_fields"
}
}
}'
# title^3 boosts matches in title by a factor of 3 relative to description
# Boolean combination (bool query)
curl -X GET "http://localhost:9200/products/_search" \
-H 'Content-Type: application/json' \
-d '
{
"query": {
"bool": {
"must": [
{ "match": { "title": "Docker" } }
],
"should": [
{ "match": { "description": "运维" } },
{ "term": { "tags": "devops" } }
],
"must_not": [
{ "range": { "price": { "lt": 10 } } }
],
"filter": [
{ "term": { "category": "book" } },
{ "range": { "createdAt": { "gte": "2026-01-01" } } }
],
"minimum_should_match": 1
}
}
}'
# Nested query (nested)
# Use it when several conditions must match within the same element of an array of objects
curl -X PUT "http://localhost:9200/orders" \
-H 'Content-Type: application/json' \
-d '
{
"mappings": {
"properties": {
"orderId": { "type": "keyword" },
"items": {
"type": "nested",
"properties": {
"productId": { "type": "keyword" },
"quantity": { "type": "integer" },
"price": { "type": "double" }
}
}
}
}
}'
# Query inside the nested field
curl -X GET "http://localhost:9200/orders/_search" \
-H 'Content-Type: application/json' \
-d '
{
"query": {
"nested": {
"path": "items",
"query": {
"bool": {
"must": [
{ "term": { "items.productId": "P001" } },
{ "range": { "items.price": { "gte": 100 } } }
]
}
}
}
}
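}'
Advanced aggregations
# Multi-level aggregation: group by category, then compute price statistics (average, maximum, price ranges) per group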
curl -X GET "http://localhost:9200/products/_search" \
-H 'Content-Type: application/json' \
-d '
{
"size": 0,
"aggs": {
"by_category": {
"terms": { "field": "category" },
"aggs": {
"avg_price": { "avg": { "field": "price" } },
"max_price": { "max": { "field": "price" } },
"price_range": {
"range": {
"field": "price",
"ranges": [
{ "key": "cheap", "to": 50 },
{ "key": "mid", "from": 50, "to": 200 },
{ "key": "expensive", "from": 200 }
]
}
}
}
}
}
}'
# Date histogram aggregation (common in logging scenarios)
curl -X GET "http://localhost:9200/products/_search" \
-H 'Content-Type: application/json' \
-d '
{
"size": 0,
"aggs": {
"by_month": {
"date_histogram": {
"field": "createdAt",
"calendar_interval": "month",
"format": "yyyy-MM"
},
"aggs": {
"count_per_month": { "value_count": { "field": "price" } }
}
}
}
}'
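# Search suggestions (completion suggester)
# Useful for autocomplete
# The products index was already created above, so add the completion field through the mapping API
# (a second PUT on /products would fail because the index exists)
curl -X PUT "http://localhost:9200/products/_mapping" \
-H 'Content-Type: application/json' \
-d '
{
"properties": {
"title_suggest": {
"type": "completion",
"analyzer": "standard"
}
}
}'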
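# Index a document with the suggest field (this replaces document 1 as a whole, so earlier fields not repeated here are dropped)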
curl -X POST "http://localhost:9200/products/_doc/1" \
-H 'Content-Type: application/json' \
-d '{
"title": "Docker 实战指南",
"title_suggest": {
"input": ["docker", "docker实战", "docker指南"],
"weight": 10
}
}'
# Query for suggestions
curl -X POST "http://localhost:9200/products/_search" \
-H 'Content-Type: application/json' \
-d '{
"suggest": {
"title-suggest": {
"prefix": "dock",
"completion": {
"field": "title_suggest",
"size": 5
}
}
}
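}'
Index templates and dynamic mapping
# Create an index template (applied automatically to new indices that match the pattern)
# This template uses the ik_max_word tokenizer, so the IK plugin described in the IK section must be installed first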
curl -X PUT "http://localhost:9200/_index_template/product-template" \
-H 'Content-Type: application/json' \
-d '
{
"index_patterns": ["product-*"],
"template": {
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"refresh_interval": "5s",
"analysis": {
"analyzer": {
"ik_max_word_analyzer": {
"type": "custom",
"tokenizer": "ik_max_word"
}
}
}
},
"mappings": {
"properties": {
"title": { "type": "text", "analyzer": "ik_max_word_analyzer" },
"description": { "type": "text", "analyzer": "ik_max_word_analyzer" },
"price": { "type": "double" },
"category": { "type": "keyword" },
"tags": { "type": "keyword" },
"status": { "type": "keyword" },
"createdAt": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss||epoch_millis" }
}
}
}
}'
# Dynamic mapping control
curl -X PUT "http://localhost:9200/_index_template/strict-template" \
-H 'Content-Type: application/json' \
-d '
{
"index_patterns": ["strict-*"],
"template": {
"mappings": {
"dynamic": "strict",
"dynamic_templates": [
{
"strings_as_keyword": {
"match_mapping_type": "string",
"mapping": { "type": "keyword" }
}
}
]
}
}
}'
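# dynamic: strict (writing a field that is not in the mapping fails with an error)
# dynamic: false (unmapped fields are accepted and kept in _source, but are not indexed)
# dynamic: true (field types are inferred automatically; this is the default)
# Note: dynamic_templates only apply when dynamic is true or runtime, so under strict the strings_as_keyword template above never fires
# View an existing template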
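curl "http://localhost:9200/_index_template/product-template?pretty"
Index management operations
# Close / open an index (a closed index frees most of its memory overhead but cannot be read or written; it still occupies disk)
curl -X POST "http://localhost:9200/products/_close"
curl -X POST "http://localhost:9200/products/_open"
# Block writes to an index (reads are still allowed)
curl -X PUT "http://localhost:9200/products/_settings" \
-H 'Content-Type: application/json' \
-d '{ "index.blocks.write": true }'
# Remove the write block
curl -X PUT "http://localhost:9200/products/_settings" \
-H 'Content-Type: application/json' \
-d '{ "index.blocks.write": null }'
# Force a refresh (makes recent writes searchable immediately; costly when called often, so keep it for debugging)
curl -X POST "http://localhost:9200/products/_refresh"
# Clear caches
curl -X POST "http://localhost:9200/products/_cache/clear"
# Inspect index segments
curl "http://localhost:9200/products/_segments?pretty"
# Force-merge segments (fewer segments can speed up searches, but the merge is I/O heavy)
curl -X POST "http://localhost:9200/products/_forcemerge?max_num_segments=1"
# Delete an index
curl -X DELETE "http://localhost:9200/products"
# Delete indices by wildcard (use with care; ES 8.x rejects this by default because action.destructive_requires_name is true)
curl -X DELETE "http://localhost:9200/log-2025-*"
# Reindex (for data migration or mapping changes; create the destination index with the new mapping first)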
curl -X POST "http://localhost:9200/_reindex" \
-H 'Content-Type: application/json' \
-d '
{
"source": { "index": "products" },
"dest": { "index": "products-v2" }
}'
Chinese analyzer configuration (IK)
# Install the IK analyzer (the plugin version must match the ES version)
cd /usr/local/elasticsearch
./bin/elasticsearch-plugin install https://get.infini.cloud/elasticsearch/analysis-ik/8.13.4
# Restart ES so the plugin is loaded
# Verify the installation
curl -X POST "http://localhost:9200/_analyze" \
-H 'Content-Type: application/json' \
-d '{
"analyzer": "ik_max_word",
"text": "中华人民共和国国歌"
}'
# ik_max_word: finest-grained segmentation (suited to index time)
# ik_smart: coarsest-grained segmentation (suited to search time)
# Use the IK analyzers in an index mapping
curl -X PUT "http://localhost:9200/articles" \
-H 'Content-Type: application/json' \
-d '
{
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"content": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
}
}
}
}'
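# Custom IK dictionary
# Configure a remote dictionary in config/analysis-ik/IKAnalyzer.cfg.xml
# or create a custom.dic file under config/analysis-ik/ (and register it through ext_dict in IKAnalyzer.cfg.xml)
# One word per line, UTF-8 encoded
Common governance points: analysis, logs, lifecycle
Common problems with Chinese search:
- The default standard analyzer splits Chinese text into single characters, which is a poor fit for Chinese search
- Chinese usually needs an additional analyzer such as IK or SmartCN
- When search quality is poor, check the analyzer output first rather than suspecting that the data was never written
# Common endpoints for troubleshooting a logging cluster
curl "http://localhost:9200/_cluster/health?pretty"
curl "http://localhost:9200/_cat/shards?v"
curl "http://localhost:9200/_cat/allocation?v"
# Example ILM policy body for logs (roll over in the hot phase, delete after 30 days)
{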
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_size": "30gb",
"max_age": "1d"
}
}
},
"delete": {
"min_age": "30d",
"actions": {
"delete": {}
}
}
}
}
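}
In logging setups, ES is usually combined with:
- Filebeat: collection
- Logstash: parsing and transformation (optional)
- Elasticsearch: storage and retrieval
- Kibana: querying and visualization
Cluster deployment and shard management
# Cluster configuration (three-node example)
# elasticsearch.yml for node-1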
cluster.name: sunnyfan-es
node.name: node-1
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["192.168.1.101", "192.168.1.102", "192.168.1.103"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
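# Assign node roles
node.roles: [ data, master ]
# data_hot: hot-tier data node (SSD)
# data_warm: warm-tier data node (HDD)
# data_cold: cold-tier data node (infrequent access)
# master: master-eligible node (a dedicated master holds no data and only manages cluster state)
# ingest: pre-processing (ingest pipeline) node
# ml: machine learning node
# JVM heap sizing guidance
# Keep the heap at or below 50% of physical memory
# Keep the heap below about 30 GB so the JVM can still use compressed object pointers
# Leave the rest of the memory to the operating system's filesystem cache, which Lucene relies on
# Example: on a server with 32 GB of RAM, use -Xms16g -Xmx16g
# Shard management
# View shard distribution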
curl "http://localhost:9200/_cat/shards?v"
curl "http://localhost:9200/_cat/shards/products?v&h=index,shard,prirep,state,docs,store,node"
# Move a shard manually
curl -X POST "http://localhost:9200/_cluster/reroute" \
-H 'Content-Type: application/json' \
-d '
{
"commands": [
{
"move": {
"index": "products",
"shard": 0,
"from_node": "node-1",
"to_node": "node-2"
}
}
]
}'
# Shard allocation filtering (drain a node before maintenance)
curl -X PUT "http://localhost:9200/_cluster/settings" \
-H 'Content-Type: application/json' \
-d '{
"transient": {
"cluster.routing.allocation.exclude._name": "node-3"
}
}'
# Restore allocation to the node
curl -X PUT "http://localhost:9200/_cluster/settings" \
-H 'Content-Type: application/json' \
-d '{
"transient": {
"cluster.routing.allocation.exclude._name": null
}
}'
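# Shard count planning guidance
# Suggested size per shard: 10-50 GB
# Upper bound on total shards = shards each node can carry × number of nodes
# A commonly cited rule of thumb from older Elastic guidance is to stay under about 20 shards per GB of JVM heap on a node
Performance optimization
# 1. Write optimization
# Batch writes (bulk API)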
# The body is newline-delimited JSON: one action line, then one source line, and a final newline
curl -X POST "http://localhost:9200/_bulk" \
-H 'Content-Type: application/x-ndjson' \
--data-binary '{"index": {"_index": "products", "_id": "1"}}
{"title": "商品1", "price": 99.9}
{"index": {"_index": "products", "_id": "2"}}
{"title": "商品2", "price": 199.9}
'
# Batch sizing guidance: 1000-5000 documents per batch, about 5-15 MB per request
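To make the batching guidance concrete, here is a minimal sketch of building a bulk request body in the shell. The function name `build_bulk_body` and the `id|title|price` input format are assumptions made for this illustration. No JSON escaping is done, so it only suits values without double quotes or backslashes.

```shell
# Build an NDJSON bulk body from "id|title|price" lines on stdin.
# Each document becomes two lines: an action line and a source line.
# Hypothetical helper for illustration; it performs no JSON escaping.
build_bulk_body() {
  index="$1"
  while IFS='|' read -r id title price; do
    printf '{"index":{"_index":"%s","_id":"%s"}}\n' "$index" "$id"
    printf '{"title":"%s","price":%s}\n' "$title" "$price"
  done
}

# Print the body for two sample documents.
printf '1|Item 1|99.9\n2|Item 2|199.9\n' | build_bulk_body products

# To send it, the same pipeline could end in (requires a running node):
#   ... | curl -X POST "http://localhost:9200/_bulk" \
#         -H 'Content-Type: application/x-ndjson' --data-binary @-
```

Generating the body from a stream keeps batch size under your control: feed the function 1000-5000 lines at a time and send each chunk as one request.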
# Disable replicas during the initial load (restore them once indexing is done)
curl -X PUT "http://localhost:9200/products/_settings" \
-H 'Content-Type: application/json' \
-d '{ "number_of_replicas": 0 }'
# Increase refresh_interval so fewer small segments are created (less refresh and merge overhead)
curl -X PUT "http://localhost:9200/products/_settings" \
-H 'Content-Type: application/json' \
-d '{ "refresh_interval": "30s" }'
# Restore the settings once the load is finished
curl -X PUT "http://localhost:9200/products/_settings" \
-H 'Content-Type: application/json' \
-d '{
"number_of_replicas": 1,
"refresh_interval": "1s"
}'
# 2. Query optimization
# Use _source filtering to reduce the size of the response
curl -X GET "http://localhost:9200/products/_search" \
-H 'Content-Type: application/json' \
-d '
{
"_source": ["title", "price"],
"query": { "match_all": {} },
"size": 20
}'
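# Use filter context instead of query context for exact matching (filters skip scoring and can be cached)
# Use the preference parameter so paging does not jump between shard copies
curl -X GET "http://localhost:9200/products/_search?preference=user-session-123"
# A custom string (for example a user or session ID; "user-session-123" is a placeholder) sends repeated searches to the same shard copies
# _local: prefer shard copies on the local node
# _primary and _replica were removed in Elasticsearch 7.0 and are rejected by 8.x
# 3. Disk and I/O optimization
# Use SSD storage
# Give the ES data directory its own disk (do not share it with log files)
# Set appropriate disk watermarks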
curl -X PUT "http://localhost:9200/_cluster/settings" \
-H 'Content-Type: application/json' \
-d '{
"persistent": {
"cluster.routing.allocation.disk.watermark.low": "85%",
"cluster.routing.allocation.disk.watermark.high": "90%",
"cluster.routing.allocation.disk.watermark.flood_stage": "95%"
}
}'
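The three watermarks are easier to reason about as a small decision table. The sketch below classifies a disk usage percentage against them. The function name `watermark_state` is hypothetical, the thresholds default to the example values above, and the "at or above" comparison is a simplification of how Elasticsearch evaluates them.

```shell
# Classify a node's disk usage (whole percent) against the three watermarks.
# Usage: watermark_state USED [LOW] [HIGH] [FLOOD]
watermark_state() {
  used="$1"; low="${2:-85}"; high="${3:-90}"; flood="${4:-95}"
  if [ "$used" -ge "$flood" ]; then
    echo "flood_stage"   # indices with a shard on this node get a read-only-allow-delete block
  elif [ "$used" -ge "$high" ]; then
    echo "high"          # Elasticsearch tries to relocate shards away from this node
  elif [ "$used" -ge "$low" ]; then
    echo "low"           # new shards stop being allocated to this node (primaries of newly created indices excepted)
  else
    echo "ok"
  fi
}

for pct in 70 87 92 97; do
  echo "$pct% -> $(watermark_state "$pct")"
done
```

In practice the flood stage is the one to alert on early, since it is the stage at which writes start failing.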
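# 4. JVM tuning
# Set -Xms and -Xmx to the same value to avoid heap resizing at runtime
# Stay below about 30 GB (the JVM compressed-pointer threshold)
# Use the G1GC garbage collector (the default in ES 8.x)
ILM index lifecycle management
# Create an ILM policy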
curl -X PUT "http://localhost:9200/_ilm/policy/logs-policy" \
-H 'Content-Type: application/json' \
-d '
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"rollover": {
"max_size": "30gb",
"max_age": "1d",
"max_docs": 1000000
},
"set_priority": { "priority": 100 }
}
},
"warm": {
"min_age": "7d",
"actions": {
"shrink": { "number_of_shards": 1 },
"forcemerge": { "max_num_segments": 1 },
"set_priority": { "priority": 50 }
}
},
"cold": {
"min_age": "30d",
"actions": {
"set_priority": { "priority": 0 }
}
},
"delete": {
"min_age": "90d",
"actions": {
"delete": {}
}
}
}
}
}'
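The `min_age` values in the policy above form a simple timeline. As a rough sketch, the function below maps an index age in days to the phase that policy would target. The name `ilm_phase` is hypothetical and the thresholds are hard-coded to the example (7d, 30d, 90d). For indices managed by rollover, age is counted from the rollover time, and ILM only checks conditions periodically, so real transitions lag slightly.

```shell
# Map an index age (whole days since rollover) to the phase the example policy targets.
# Thresholds mirror the example policy: warm at 7d, cold at 30d, delete at 90d.
ilm_phase() {
  age_days="$1"
  if [ "$age_days" -ge 90 ]; then
    echo "delete"
  elif [ "$age_days" -ge 30 ]; then
    echo "cold"
  elif [ "$age_days" -ge 7 ]; then
    echo "warm"
  else
    echo "hot"
  fi
}

for d in 0 7 45 120; do
  echo "day $d: $(ilm_phase "$d")"
done
```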
# Bind the ILM policy to an index template
curl -X PUT "http://localhost:9200/_index_template/logs-template" \
-H 'Content-Type: application/json' \
-d '
{
"index_patterns": ["logs-*"],
"template": {
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"index.lifecycle.name": "logs-policy",
"index.lifecycle.rollover_alias": "logs"
},
"mappings": {
"properties": {
"@timestamp": { "type": "date" },
"level": { "type": "keyword" },
"message": { "type": "text" },
"service": { "type": "keyword" }
}
}
}
}'
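# Note: rollover through the "logs" alias needs a first index (for example logs-000001) created with that alias marked is_write_index: true
# Check the ILM policy and its execution status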
curl "http://localhost:9200/_ilm/policy/logs-policy?pretty"
curl "http://localhost:9200/_ilm/explain/logs-*?only_managed=true&pretty"
Common monitoring and diagnostic commands
# Cluster health
curl "http://localhost:9200/_cluster/health?pretty"
curl "http://localhost:9200/_cluster/health?level=shards&pretty"
# Node status
curl "http://localhost:9200/_cat/nodes?v&h=name,ip,heap.percent,ram.percent,disk.used_percent,node.role,master"
curl "http://localhost:9200/_nodes/stats?pretty"
# Index statistics
curl "http://localhost:9200/_cat/indices?v&s=store.size:desc"
curl "http://localhost:9200/_stats/store,docs?pretty"
# Pending cluster tasks
curl "http://localhost:9200/_cluster/pending_tasks?pretty"
# Slow logs (these are index-level settings; set them per index through the _settings API rather than in elasticsearch.yml)
# index.search.slowlog.threshold.query.warn: 10s
# index.search.slowlog.threshold.query.info: 5s
# index.indexing.slowlog.threshold.index.warn: 10s
# Thread pool status
curl "http://localhost:9200/_cat/thread_pool?v&h=name,active,queue,rejected,completed"
# Explain why a shard is unassigned
curl "http://localhost:9200/_cluster/allocation/explain?pretty"
# Recovery status (after a node restart)
curl "http://localhost:9200/_cat/recovery?v&active_only=true"
Summary
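Elasticsearch's real strength is search and analytics, not replacing a database. In practice, what matters most is not getting it installed but index design, field type choices, query style, lifecycle management, and resource planning. These decide whether it ends up as a high-value search platform or a high-cost source of incidents.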
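Key takeaways
- The choice between text and keyword directly affects query and aggregation results.
- A badly designed mapping is expensive to fix later, because it means migrating data and rebuilding the index.
- ES suits search and analytics, not strongly consistent transaction processing.
- More shards and replicas is not automatically better; size them against data volume and node count.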
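The last point can be turned into simple arithmetic. As a rough sketch, the function below estimates a primary shard count from the expected data volume and a target shard size. The name `primary_shards` and the 30 GB default are assumptions for illustration, picked from within the 10-50 GB per-shard range mentioned earlier.

```shell
# Estimate how many primary shards keep each shard near a target size.
# Usage: primary_shards DATA_GB [TARGET_GB_PER_SHARD]
primary_shards() {
  data_gb="$1"; target_gb="${2:-30}"
  # Integer ceiling division: round up so no shard exceeds the target.
  count=$(( (data_gb + target_gb - 1) / target_gb ))
  if [ "$count" -lt 1 ]; then
    count=1
  fi
  echo "$count"
}

echo "200 GB at 30 GB per shard: $(primary_shards 200) primaries"
echo "200 GB at 50 GB per shard: $(primary_shards 200 50) primaries"
```

Each replica multiplies the total, so 7 primaries with 1 replica means 14 shards for the cluster to place.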
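Project perspective
- Product search, document search, and site search are the most typical business search scenarios.
- Log analysis platforms care more about field governance, index lifecycle, and cost control.
- Chinese analysis, field mappings, and the query DSL are the three places where most implementation work lands.
- A small team that only needs simple keyword search does not necessarily need the full ELK stack from day one.
Common pitfalls
- Syncing every field from a relational database into ES as-is, with no index design.
- Skipping mapping design and ending up with text and keyword used inconsistently.
- Treating ES as the primary database for strongly transactional workloads.
- Adding machines as soon as the cluster has problems, without first analyzing shards and query structure.
Where to go next
- Study IK analysis, Analyzer, Tokenizer, and Search Analyzer in depth.
- Explore nested, bool query, function score, suggest, and aggregation tuning.
- Learn ILM, rollover, hot/warm/cold tiering, and index template governance.
- Combine Kibana, Filebeat, and Logstash into a complete logging platform.
Where it fits
- Full-text search for business data.
- Log retrieval and visual analysis.
- Risk analysis, operations analytics, and audit search.
- Queries that need high-dimensional filtering and complex aggregations.
Rollout advice
- Before creating an index, decide what each field is for: search, filtering, sorting, or aggregation.
- Agree on index naming, mapping templates, and lifecycle policies up front.
- Keep business search and log search separate where possible; do not govern them as one kind of index.
- Set up slow logs and resource monitoring for key queries; do not only check functional correctness.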
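Troubleshooting checklist
- When data cannot be found, first check that the write succeeded, when the refresh happened, and the mapping.
- When aggregations look wrong, first check whether the field is a keyword.
- When search quality is poor, first check the Analyzer and its tokenization output.
- When the cluster turns yellow or red, look at shard allocation and disk watermarks first instead of restarting blindly.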
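For the last item, it helps to be precise about what each color means. The sketch below extracts the `status` field from `_cluster/health` output and prints a one-line interpretation. The function names and the sample JSON are made up for illustration, and the `sed` extraction assumes a single `status` key in the response.

```shell
# Extract the "status" value from _cluster/health JSON read on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Translate a status color into what it says about shard assignment.
explain_status() {
  case "$1" in
    green)  echo "all primary and replica shards are assigned" ;;
    yellow) echo "all primaries are assigned, at least one replica is not" ;;
    red)    echo "at least one primary shard is unassigned" ;;
    *)      echo "unknown status" ;;
  esac
}

# Sample response (illustrative; a live call would pipe curl output instead).
sample='{"cluster_name":"sunnyfan-es","status":"yellow","number_of_nodes":1}'
status="$(printf '%s\n' "$sample" | health_status)"
echo "$status: $(explain_status "$status")"
```

A single-node cluster holding an index with `number_of_replicas` of 1 or more stays yellow, because a replica can never be placed on the same node as its primary.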
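Review questions
- Is your main goal with ES search, logging, or analytics?
- Is the current index structure designed around queries, or copied over from database columns?
- Which fields should be text, and which must be keyword?
- If data volume grows 10x, will the current sharding and lifecycle policy still hold up?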
