MongoDB Index Performance

SunnyFan

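Introduction

Most MongoDB performance problems can ultimately be traced back to whether the index design matches the query patterns. Because MongoDB's document structure is flexible, it is tempting during development to "store it first and think later". But once filter conditions, sort fields, and aggregation pipelines drift away from the index design, slow queries, collection scans, and in-memory sorts show up quickly.

Indexes are the single most important tool for improving query performance in MongoDB. A query with no supporting index triggers a full collection scan (COLLSCAN), which on a collection with millions of documents can take seconds or even tens of seconds, whereas the same query typically finishes in milliseconds once it hits an index. Understanding index structure, design principles, and management is therefore a core skill for MongoDB developers and DBAs.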

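What a MongoDB Index Really Is

MongoDB's documentation describes its indexes as B-tree structures: keys are kept in sorted order, so both point lookups and range scans can walk the tree instead of the whole collection. Each index entry holds the index key value and a reference to the matching document (an internal record identifier, or the clustered key in a clustered collection).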

Sketch of the index structure:

Compound index { customerId: 1, status: 1, createdAt: -1 }

         customerId=1001
        /               \
   status='paid'     status='pending'
   /          \       /          \
createdAt↓  createdAt↓  ...
  2024-01-20  2024-01-15
  → _id: 1001  → _id: 1005

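Query matching rules (left to right):
- WHERE customerId=1001 → uses the first level
- WHERE customerId=1001 AND status='paid' → uses the first two levels
- WHERE status='paid' → cannot seek into the index (the first level is skipped)
- WHERE customerId=1001 AND createdAt > '2024-01-01' → seeks on the first level; createdAt is checked against the index keys, but every status value under that customer still has to be visited

Each B-tree node stores index key values plus pointers to the next level or to documents. MongoDB starts at the root, compares key values level by level, and finally locates the target documents. A compound index is sorted by its fields in the order they were defined, so a query has to match fields from left to right to use the index fully.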
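
To make the ordering concrete, here is a minimal sketch in plain JavaScript (no database needed) that sorts a few hand-written entries the way a compound key pattern would. The helper name compareByKeyPattern and the sample entries are assumptions made for illustration; this only models the sort order, not MongoDB's actual storage.

```javascript
// Build a comparator for a compound key pattern such as
// { customerId: 1, status: 1, createdAt: -1 } (1 = ascending, -1 = descending).
function compareByKeyPattern(keyPattern) {
  const fields = Object.entries(keyPattern); // [[field, direction], ...]
  return (a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] < b[field]) return -1 * direction;
      if (a[field] > b[field]) return 1 * direction;
    }
    return 0;
  };
}

// Hypothetical index entries (dates kept as ISO strings for simplicity).
const entries = [
  { customerId: 1002, status: "paid", createdAt: "2024-01-12" },
  { customerId: 1001, status: "pending", createdAt: "2024-01-11" },
  { customerId: 1001, status: "paid", createdAt: "2024-01-10" },
  { customerId: 1001, status: "paid", createdAt: "2024-01-20" },
];

const sorted = [...entries].sort(
  compareByKeyPattern({ customerId: 1, status: 1, createdAt: -1 })
);
const order = sorted.map((e) => `${e.customerId}/${e.status}/${e.createdAt}`);
console.log(order);
// [ '1001/paid/2024-01-20', '1001/paid/2024-01-10',
//   '1001/pending/2024-01-11', '1002/paid/2024-01-12' ]
```

In this ordering all entries for customerId 1001 sit next to each other, while the "paid" entries are split across customers. That is why a filter on status alone cannot seek to a single contiguous run of keys.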

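Index Types at a Glance

MongoDB supports several index types, each aimed at a different query scenario:

Index type	Purpose	Example
Single-field index	Single-condition queries	`{ customerId: 1 }`
Compound index	Multi-condition queries	`{ customerId: 1, status: 1 }`
Multikey index	Queries on array fields	`{ tags: 1 }`
Text index	Full-text search	`{ title: "text" }`
Geospatial index	Location queries	`{ location: "2dsphere" }`
Hashed index	Hashed shard key	`{ userId: "hashed" }`
TTL index	Automatic expiry cleanup	`{ expireAt: 1 }, { expireAfterSeconds: 0 }`
Partial index	Index on a filtered subset	`partialFilterExpression`
Unique index	Uniqueness constraint	`{ email: 1 }, { unique: true }`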

Hands-On Examples

Single-Field and Compound Indexes

// Sample orders collection
db.orders.insertMany([
  {
    orderNo: "O1001", customerId: 1001, status: "paid",
    amount: 199, createdAt: ISODate("2024-01-10T10:00:00Z"),
    items: [{productId: "P001", qty: 2}, {productId: "P002", qty: 1}]
  },
  {
    orderNo: "O1002", customerId: 1001, status: "pending",
    amount: 299, createdAt: ISODate("2024-01-11T11:00:00Z"),
    items: [{productId: "P003", qty: 1}]
  },
  {
    orderNo: "O1003", customerId: 1002, status: "paid",
    amount: 99, createdAt: ISODate("2024-01-12T12:00:00Z"),
    items: [{productId: "P001", qty: 1}]
  }
]);

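// Single-field index
db.orders.createIndex({ customerId: 1 });
// 1 means ascending, -1 means descending
// For a single-field index the direction does not affect query performance,
// because MongoDB can traverse the index in either direction

// Compound index: customer + status + newest first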
db.orders.createIndex({
    customerId: 1,
    status: 1,
    createdAt: -1
});
// Inside the index, entries are ordered by customerId ascending → status ascending → createdAt descending

// A query that hits the compound index
db.orders.find(
    { customerId: 1001, status: "paid" },
    { orderNo: 1, amount: 1, createdAt: 1 }
).sort({ createdAt: -1 }).explain("executionStats");

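// Key explain fields:
// queryPlanner.winningPlan contains an "IXSCAN" stage → index scan (good)
// queryPlanner.winningPlan contains a "COLLSCAN" stage → collection scan (bad)
// executionStats.totalDocsExamined: 1 → only 1 document examined (good)
// executionStats.totalKeysExamined: 1 → only 1 index key examined (good)
// executionStats.executionTimeMillis: 0 → execution time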

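Prefix Rule for Compound Indexes

Compound index matching follows the "leftmost prefix" rule. An index { A: 1, B: 1, C: 1 } can support the following query patterns:

// Assume the index is { customerId: 1, status: 1, createdAt: -1 }

// ✅ Uses the index (first field)
db.orders.find({ customerId: 1001 });

// ✅ Uses the index (first two fields)
db.orders.find({ customerId: 1001, status: "paid" });

// ✅ Uses the index (all fields)
db.orders.find({ customerId: 1001, status: "paid", createdAt: { $gt: ISODate("2024-01-01") } });

// ✅ Equality on the first field; the sort follows the remaining index order, so no in-memory sort is needed
db.orders.find({ customerId: 1001 })
    .sort({ status: 1, createdAt: -1 });

// ❌ Skips the first field (does not use the index)
db.orders.find({ status: "paid" });

// ❌ Skips the first and second fields (does not use the index)
db.orders.find({ createdAt: { $gt: ISODate("2024-01-01") } });

// ⚠️ Partial use: equality on the first field, second field skipped, range on the third
// The scan is bounded by customerId=1001; createdAt is checked against index keys, but every status value is still visited
db.orders.find({ customerId: 1001, createdAt: { $gt: ISODate("2024-01-01") } });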
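
The leftmost-prefix rule can be sketched as a small function. This is a deliberately simplified model: the hypothetical helper usablePrefixLength only counts how many leading index fields appear in the filter, and ignores sorts, operators, and everything else the real query planner considers.

```javascript
// Simplified model: how many leading fields of the key pattern does the
// filter constrain? The first missing field ends the usable prefix.
function usablePrefixLength(keyPattern, filter) {
  let length = 0;
  for (const field of Object.keys(keyPattern)) {
    if (!(field in filter)) break;
    length += 1;
  }
  return length;
}

const keyPattern = { customerId: 1, status: 1, createdAt: -1 };

console.log(usablePrefixLength(keyPattern, { customerId: 1001 })); // 1
console.log(usablePrefixLength(keyPattern, { customerId: 1001, status: "paid" })); // 2
console.log(usablePrefixLength(keyPattern, { status: "paid" })); // 0
console.log(usablePrefixLength(keyPattern, { customerId: 1001, createdAt: { $gt: "2024-01-01" } })); // 1
```

A result of 0 corresponds to the ❌ cases above, and the last call corresponds to the ⚠️ case where only the first field narrows the scan.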

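The ESR Rule for Index Design

ESR (Equality, Sort, Range) is the core guideline for designing compound indexes in MongoDB:

The ESR rule:

Compound index order: equality (E) → sort (S) → range (R)

Example: fetch the paid orders of customer 1001, newest first
db.orders.find({
    customerId: 1001,          // E: equality
    status: "paid",             // E: equality
    createdAt: { $gte: ISODate("2024-01-01") }  // R: range
}).sort({ createdAt: -1 });     // S: sort

Index design:
{ customerId: 1, status: 1, createdAt: -1 }
// E(1) → E(1) → S+R(-1)
// createdAt is both the sort field and the range field, so it goes last

Poor design:
{ customerId: 1, createdAt: -1, status: 1 }
// An equality field sits after the range field: the status condition cannot tighten
// the index bounds and is only applied as a filter over the scanned keys

The logic behind ESR: equality matches narrow the index to one contiguous run of keys, the sort is satisfied by walking that run in order, and the range predicate then trims the boundaries. If a range field comes before the sort field, the keys inside the range are no longer ordered by the sort field, so the sort has to be done in memory.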
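
The ordering itself is mechanical, so it can be sketched as a helper that assembles a key pattern from the three groups. The function name buildEsrKeyPattern and its input shape are assumptions for illustration; range fields default to ascending here, which is a choice of this sketch rather than a MongoDB requirement.

```javascript
// Assemble a compound key pattern in Equality → Sort → Range order.
// A field that already appears earlier (for example a field that is both
// sorted and range-filtered) is not added twice.
function buildEsrKeyPattern({ equality = [], sort = {}, range = [] }) {
  const keyPattern = {};
  for (const field of equality) keyPattern[field] = 1;
  for (const [field, direction] of Object.entries(sort)) {
    if (!(field in keyPattern)) keyPattern[field] = direction;
  }
  for (const field of range) {
    if (!(field in keyPattern)) keyPattern[field] = 1;
  }
  return keyPattern;
}

const ordersIndex = buildEsrKeyPattern({
  equality: ["customerId", "status"],
  sort: { createdAt: -1 },
  range: ["createdAt"],
});
console.log(JSON.stringify(ordersIndex));
// {"customerId":1,"status":1,"createdAt":-1}
```

The result can be passed to createIndex as the key pattern. Treat it as a starting point and confirm with explain that the plan has no SORT stage.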

// ESR case 1: user message list
// Query: read messages of user 1001, newest first
db.messages.find({
    userId: 1001,        // E
    isRead: true          // E
}).sort({ createdAt: -1 });  // S

// Best index
db.messages.createIndex({ userId: 1, isRead: 1, createdAt: -1 });

// ESR case 2: product search
// Query: electronics priced between 100 and 500, sorted by sales
db.products.find({
    category: "electronics",           // E
    price: { $gte: 100, $lte: 500 }    // R
}).sort({ salesCount: -1 });           // S

// ⚠️ Here the range field (R) and the sort field (S) are different fields
// ESR order: the index returns documents already sorted by salesCount,
// and price is filtered against index keys within the category
db.products.createIndex({ category: 1, salesCount: -1, price: 1 });
// Alternative when the price range is narrow and matches few documents:
// { category: 1, price: 1, salesCount: -1 } gives tight bounds on price,
// but the sort on salesCount then happens in memory
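
The trade-off in case 2 can be checked with a small in-memory simulation. The sample products and the helper scanInIndexOrder are made up for illustration: the helper sorts documents in index order, applies the price range, and returns the salesCount values in the order an index scan would produce them.

```javascript
// Hypothetical sample data, all in the same category.
const products = [
  { category: "electronics", price: 120, salesCount: 50 },
  { category: "electronics", price: 450, salesCount: 90 },
  { category: "electronics", price: 300, salesCount: 10 },
  { category: "electronics", price: 800, salesCount: 99 },
];

// Walk the documents in the order of the given key pattern and keep those
// inside the price range, as a simplified stand-in for an index scan.
function scanInIndexOrder(docs, keyPattern, minPrice, maxPrice) {
  const fields = Object.entries(keyPattern);
  const ordered = [...docs].sort((a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] !== b[field]) return (a[field] < b[field] ? -1 : 1) * direction;
    }
    return 0;
  });
  return ordered
    .filter((d) => d.price >= minPrice && d.price <= maxPrice)
    .map((d) => d.salesCount);
}

const esrOrder = scanInIndexOrder(products, { category: 1, salesCount: -1, price: 1 }, 100, 500);
const ersOrder = scanInIndexOrder(products, { category: 1, price: 1, salesCount: -1 }, 100, 500);

console.log(esrOrder); // [ 90, 50, 10 ]  already in salesCount descending order
console.log(ersOrder); // [ 50, 10, 90 ]  needs an extra sort
```

With the sort field ahead of the range field the scan output is already ordered. With the range field first the output follows price order, so a separate sort step is required.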

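Covered Queries, TTL Indexes, and Partial Indexes

// Covered query: every field the query needs is in the index, so no document fetch is required
// The index contains customerId, status, createdAt, orderNo, amount
db.orders.createIndex(
    { customerId: 1, status: 1, createdAt: -1, orderNo: 1, amount: 1 },
    { name: "idx_order_cover" }
);

// Project only fields that are in the index
db.orders.find(
    { customerId: 1001, status: "paid" },
    { _id: 0, orderNo: 1, amount: 1, createdAt: 1 }
).sort({ createdAt: -1 }).explain("executionStats");

// How to recognize a covered query:
// totalDocsExamined = 0 → no documents were fetched (all data came from the index)
// Requirement: all fields in the filter and in the projection are part of the index
// Note: _id is returned by default; if it is not in the index, the query is no longer covered
// Fix: exclude _id explicitly in the projection: { _id: 0, ... }

// TTL index: automatically removes expired data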
db.login_tokens.createIndex(
    { expireAt: 1 },
    { expireAfterSeconds: 0 }
);

// Sample document
db.login_tokens.insertOne({
    userId: 1001,
    token: "abc123def456",
    expireAt: ISODate("2026-04-12T12:00:00Z")
});
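// A MongoDB background task scans TTL indexes every 60 seconds
// and deletes documents whose expireAt is earlier than the current time

// TTL index notes:
// 1. expireAfterSeconds is an offset added to the date value of the indexed field
// 2. Deletion runs asynchronously in the background and is not accurate to the second
// 3. A TTL index must be a single-field index; compound indexes do not support TTL
// 4. The _id field does not support TTL

// Partial index: index only the frequently queried subset
// Reduces index size and maintenance cost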
db.orders.createIndex(
    { createdAt: -1 },
    {
        partialFilterExpression: { status: "paid" },
        name: "idx_paid_createdAt"
    }
);
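// Only documents with status="paid" are indexed
// A query must include the partialFilterExpression condition to use this index
db.orders.find({ status: "paid" }).sort({ createdAt: -1 });
// Uses the partial index

db.orders.find({ status: "pending" }).sort({ createdAt: -1 });
// Does not use the partial index (the filter does not match partialFilterExpression)

// Advanced use of partial indexes: combine with unique for a conditional uniqueness constraint
// Scenario: each customer may have at most one pending order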
db.orders.createIndex(
    { customerId: 1 },
    {
        unique: true,
        partialFilterExpression: { status: "pending" },
        name: "idx_one_pending_per_customer"
    }
);
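// Only documents with status="pending" take part in the uniqueness constraint
// A customer can have many paid orders but only one pending order

// Another practical scenario: index only active users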
db.users.createIndex(
    { email: 1 },
    {
        unique: true,
        partialFilterExpression: { isActive: true },
        name: "idx_active_user_email"
    }
);
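// Emails of deactivated users are outside the uniqueness constraint, so the same email can register again

// Sparse index
// Indexes only documents that contain the indexed field
db.users.createIndex(
    { phone: 1 },
    { sparse: true }
);
// Documents without a phone field do not appear in the index
// MongoDB therefore skips this index for a query such as { phone: null }
// (the result would be incomplete) unless it is forced with hint()

// sparse vs partial:
// sparse: based on whether the field exists
// partial: based on a filter expression (a restricted set of operators)
// Recommendation: prefer partial (more flexible)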
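
The difference between the two options comes down to which documents receive an index entry. The following sketch models that decision in plain JavaScript. The helper names and the sample users are hypothetical, and the partial filter is represented as an ordinary predicate function rather than a real partialFilterExpression.

```javascript
// sparse: a document is indexed when the field exists, even if its value is null.
function indexedBySparse(doc, field) {
  return Object.prototype.hasOwnProperty.call(doc, field);
}

// partial: a document is indexed when it satisfies the filter.
function indexedByPartial(doc, predicate) {
  return predicate(doc);
}

// Hypothetical users.
const users = [
  { name: "a", phone: "555-0100", isActive: true },
  { name: "b", phone: null, isActive: false },
  { name: "c", isActive: true },
];

const inSparse = users.filter((u) => indexedBySparse(u, "phone")).map((u) => u.name);
const inPartial = users
  .filter((u) => indexedByPartial(u, (d) => d.isActive === true))
  .map((u) => u.name);

console.log(inSparse); // [ 'a', 'b' ]  "b" has the field, although it is null
console.log(inPartial); // [ 'a', 'c' ]
```

A sparse index can only express "field is present", while a partial filter can target any subset the supported operators can describe, which is why partial is usually the more flexible choice.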

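Multikey Indexes (Array Indexes)

// An index on an array field automatically becomes a multikey index
db.products.insertMany([
    { name: "Laptop", tags: ["electronics", "office", "portable"] },
    { name: "Mechanical keyboard", tags: ["electronics", "gaming", "office"] },
    { name: "Running shoes", tags: ["sports", "outdoor"] }
]);

// Create the index (it becomes multikey automatically)
db.products.createIndex({ tags: 1 });

// Query by array element
db.products.find({ tags: "electronics" });
// Uses the index; returns the laptop and the mechanical keyboard

db.products.find({ tags: { $all: ["electronics", "office"] } });
// Uses the index; returns products tagged with both "electronics" and "office"

// Note: in any single document, at most one field of a compound index may hold an array
db.articles.createIndex({ tags: 1, categories: 1 });
// Fails if an existing document has arrays in both tags and categories;
// once the index exists, inserting such a document fails as well

// Indexing documents nested in an array
db.stores.insertOne({
    name: "Flagship store",
    locations: [
        { city: "Beijing", district: "Chaoyang" },
        { city: "Shanghai", district: "Pudong" }
    ]
});
db.stores.createIndex({ "locations.city": 1 });
db.stores.find({ "locations.city": "Beijing" });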
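
Conceptually, a multikey index stores one entry per distinct array element, all pointing back to the same document. The sketch below models that idea and the "no two array fields in one document" restriction. Both helpers (multikeyEntries, hasParallelArrays) are hypothetical illustrations, not MongoDB APIs.

```javascript
// One index entry per distinct array element; a scalar value yields one entry.
function multikeyEntries(doc, field) {
  const value = doc[field];
  const keys = Array.isArray(value) ? [...new Set(value)] : [value];
  return keys.map((key) => ({ key, docId: doc._id }));
}

// A compound index rejects documents in which more than one indexed field is an array.
function hasParallelArrays(doc, indexedFields) {
  return indexedFields.filter((field) => Array.isArray(doc[field])).length > 1;
}

const laptop = { _id: 1, name: "Laptop", tags: ["electronics", "office", "portable"] };
console.log(multikeyEntries(laptop, "tags").map((e) => e.key));
// [ 'electronics', 'office', 'portable' ]

const article = { _id: 2, tags: ["db"], categories: ["tech", "howto"] };
console.log(hasParallelArrays(article, ["tags", "categories"])); // true
```

Because entries multiply with array length, large arrays make a multikey index proportionally larger and more expensive to update.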

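explain Analysis and Aggregation Optimization

// Key steps when reading explain output
// 1. Check winningPlan to confirm whether an index is used
// 2. Compare totalDocsExamined with the number of documents returned (nReturned)
// 3. Check executionTimeMillis for the execution time

// The three explain modes:
// db.orders.find(...).explain("queryPlanner");    // plan only (default)
// db.orders.find(...).explain("executionStats");  // execution statistics
// db.orders.find(...).explain("allPlansExecution"); // all candidate plans

// Slow query analysis
db.orders.find({ status: "paid" }).sort({ createdAt: -1 }).explain("executionStats");
// If winningPlan contains a "COLLSCAN" stage → add an index
// If totalDocsExamined is much larger than the number returned → the index is not precise enough

// Aggregation pipeline guidelines:
// 1. $match early: lets indexes work as early as possible
// 2. Lean $project: reduce the data passed to later stages
// 3. $sort on an index: the sort fields should be in an index
// 4. $limit early: reduce the data that later stages process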

db.orders.aggregate([
    { $match: { status: "paid", createdAt: { $gte: ISODate("2024-01-01") } } },
    { $project: { customerId: 1, amount: 1, createdAt: 1 } },
    { $sort: { createdAt: -1 } },
    { $limit: 100 }
]).explain("executionStats");

// Common aggregation performance problems
// Bad: $group over everything, with filtering only afterwards
db.orders.aggregate([
    { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
    { $match: { total: { $gt: 10000 } } }
]);
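// Every document takes part in $group before any filtering happens

// Better: filter on stored fields with $match before $group (the $match on the computed total must stay after $group)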
db.orders.aggregate([
    { $match: { status: "paid" } },
    { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
    { $match: { total: { $gt: 10000 } } }
]);
// $match filters first, so $group only processes the matching documents
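
The effect of an early $match can be illustrated without a database by counting how many documents reach the grouping step. This is a sketch on made-up sample orders; groupTotals is a hypothetical stand-in for a $group that sums amount per customerId.

```javascript
// Hypothetical sample orders.
const orders = [
  { customerId: 1001, status: "paid", amount: 199 },
  { customerId: 1001, status: "pending", amount: 299 },
  { customerId: 1002, status: "paid", amount: 99 },
  { customerId: 1002, status: "cancelled", amount: 500 },
];

// Sum amount per customer and report how many documents were processed.
function groupTotals(docs) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.customerId, (totals.get(doc.customerId) ?? 0) + doc.amount);
  }
  return { processed: docs.length, totals };
}

const withoutMatch = groupTotals(orders);
const withEarlyMatch = groupTotals(orders.filter((o) => o.status === "paid"));

console.log(withoutMatch.processed); // 4
console.log(withEarlyMatch.processed); // 2
console.log(withEarlyMatch.totals.get(1001)); // 199
```

On a real collection the early filter also gives the planner a chance to use an index on status, which the in-memory model does not show.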

Aggregation Pipeline Index Optimization in Detail

// A $match stage in a pipeline can use an index
// Requirement: $match is the first stage (or the optimizer can move it to the front)
db.orders.aggregate([
    { $match: { customerId: 1001, createdAt: { $gte: ISODate("2024-01-01") } } },
    { $group: { _id: "$status", total: { $sum: "$amount" }, count: { $sum: 1 } } }
]).explain("executionStats");
// With an index { customerId: 1, createdAt: -1 }, the $match stage uses IXSCAN

// $sort can use an index (when the $match and $sort fields line up with the index)
db.orders.aggregate([
    { $match: { customerId: 1001 } },
    { $sort: { createdAt: -1 } },
    { $limit: 20 }
]);
// With an index { customerId: 1, createdAt: -1 }, $sort is satisfied by index order
// and no in-memory sort takes place

// $lookup optimization: index the joined collection
db.orders.aggregate([
    { $match: { status: "paid" } },
    { $lookup: {
        from: "users",
        localField: "customerId",
        foreignField: "_id",
        as: "customer"
    }}
]);
// users._id is always indexed (the default _id index)
// If foreignField is not _id, create an index on it yourself

// A common alternative to $unwind + $group
// When only the number of array elements matters, compute it with $size in $project and then $match
db.orders.aggregate([
    { $project: { orderNo: 1, itemCount: { $size: "$items" } } },
    { $match: { itemCount: { $gte: 3 } } }
]);
// This avoids the many intermediate documents that $unwind would produce

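Index Management

// List all indexes on the collection
db.orders.getIndexes();

// Index usage statistics (MongoDB 3.2+)
db.orders.aggregate([
    { $indexStats: {} }
]);
// Returns, per index, the number of uses (accesses.ops) and when counting started (accesses.since)

// Drop an index
db.orders.dropIndex("idx_order_cover");

// Drop all indexes (except _id)
db.orders.dropIndexes();

// Rebuild indexes (to reclaim fragmentation)
db.orders.reIndex();
// Note: reIndex locks the collection, so run it only in a maintenance window;
// it is deprecated since MongoDB 6.0 and runs only on standalone instances in recent versions

// Create a unique index
db.orders.createIndex({ orderNo: 1 }, { unique: true });

// Create a case-insensitive index (MongoDB 3.4+)
db.users.createIndex(
    { email: 1 },
    { collation: { locale: "en", strength: 2 } }
);
// strength: 2 ignores case but still distinguishes accents (strength: 1 ignores both)
// Queries must specify the same collation to use this index
db.users.find({ email: "USER@EXAMPLE.COM" }).collation({ locale: "en", strength: 2 });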
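
The strength levels are easier to remember with a quick experiment. JavaScript's built-in Intl.Collator offers a comparable notion through its sensitivity option. The mapping used here (sensitivity "accent" behaving like strength 2, "base" like strength 1) is an analogy assumed for illustration, not MongoDB's own implementation.

```javascript
// "accent": case differences are ignored, accent differences still count
// (comparable to collation strength 2).
const ignoreCase = new Intl.Collator("en", { sensitivity: "accent" });

// "base": both case and accent differences are ignored
// (comparable to collation strength 1).
const ignoreCaseAndAccents = new Intl.Collator("en", { sensitivity: "base" });

const sameIgnoringCase = ignoreCase.compare("USER@EXAMPLE.COM", "user@example.com") === 0;
const accentStillDiffers = ignoreCase.compare("resume", "résumé") !== 0;
const accentIgnored = ignoreCaseAndAccents.compare("resume", "résumé") === 0;

console.log(sameIgnoringCase); // true
console.log(accentStillDiffers); // true
console.log(accentIgnored); // true
```

For email lookups the case-only behaviour of strength 2 is usually what is wanted, since addresses that differ by an accent are different addresses.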

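Index Build Strategy

// Since MongoDB 4.2 every index build uses an optimized process that locks the
// collection only briefly at the start and end; the background option is ignored
// Older versions need background: true to avoid blocking the collection
db.orders.createIndex(
    { customerId: 1, status: 1 },
    { background: true, name: "idx_customer_status_bg" }
);

// Index size
db.orders.totalIndexSize();                  // total index size in bytes
db.orders.totalIndexSize() / (1024 * 1024);  // converted to MB

// Size of each individual index
db.orders.stats().indexSizes;
// Maps each index name to its size in bytes

// Index cost analysis
// Every index adds overhead to write operations
// Rough estimate: write amplification = (1 + number of indexes) × one write
// Example: with 5 indexes, an insert produces about 6 writes (1 for the document + 5 for the indexes)

// Suggested index counts:
// - Read-heavy workloads: more indexes are acceptable (5-10)
// - Write-heavy workloads: keep indexes minimal (2-3)
// - Balanced workloads: 3-5 indexes is reasonable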
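
The rule of thumb above is simple arithmetic, sketched here so the numbers can be compared across scenarios. Both helpers are hypothetical, and the estimate is the article's rough model for inserts only; it says nothing about real I/O, caching, or journaling.

```javascript
// Rough model from the text: one write for the document plus one per index.
function estimateWritesPerInsert(indexCount) {
  return 1 + indexCount;
}

// Convert a byte count (for example from totalIndexSize()) to MiB.
function bytesToMiB(bytes) {
  return bytes / (1024 * 1024);
}

console.log(estimateWritesPerInsert(5)); // 6
console.log(estimateWritesPerInsert(2)); // 3
console.log(bytesToMiB(52428800)); // 50
```

Going from 2 to 5 indexes doubles the estimated writes per insert in this model, which is the reasoning behind keeping write-heavy collections lean.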

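Text Indexes and Geospatial Indexes

// Text index
db.articles.createIndex({ title: "text", content: "text" });
db.articles.find({ $text: { $search: "MongoDB index" } });
// Chinese is not among the built-in text search languages, so proper Chinese word segmentation needs an external solution
// A collection can have at most one text index

// Text index weights (drop the text index above first, since only one is allowed)
db.articles.createIndex(
    { title: "text", content: "text", tags: "text" },
    { weights: { title: 10, content: 5, tags: 3 }, name: "idx_text_weighted" }
);
// A match in title carries twice the weight of a match in content

// Advanced text search
db.articles.find(
    { $text: { $search: "MongoDB performance optimization" } },
    { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } });
// Returns the relevance score and sorts by it

// Excluding a term
db.articles.find({ $text: { $search: "MongoDB -install" } });
// Finds articles that contain "MongoDB" but not "install"

// Geospatial index
db.stores.createIndex({ location: "2dsphere" });

// Nearby search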
db.stores.find({
    location: {
        $near: {
            $geometry: { type: "Point", coordinates: [116.4, 39.9] },
            $maxDistance: 5000  // within 5 km (the value is in meters)
        }
    }
});

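// Search within a rectangle
// Note: $box uses planar geometry and is supported only by 2d indexes, not by 2dsphere
db.stores.find({
    location: {
        $geoWithin: {
            $box: [
                [116.3, 39.8],  // bottom-left corner
                [116.5, 40.0]   // top-right corner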
            ]
        }
    }
});

// Search within a polygon
db.stores.find({
    location: {
        $geoWithin: {
            $geometry: {
                type: "Polygon",
                coordinates: [[
                    [116.3, 39.8], [116.5, 39.8],
                    [116.5, 40.0], [116.3, 40.0],
                    [116.3, 39.8]
                ]]
            }
        }
    }
});
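
To get a feel for what a 5000 meter $maxDistance means in coordinates, the great-circle distance between two [longitude, latitude] points can be approximated with the haversine formula. This is a sketch under assumptions: a spherical Earth with a mean radius of 6371 km, a hypothetical helper name, and made-up store positions. MongoDB's own calculation may differ slightly.

```javascript
const EARTH_RADIUS_METERS = 6371000; // mean Earth radius, an approximation

// Haversine distance between two [longitude, latitude] pairs, in meters.
function haversineMeters([lng1, lat1], [lng2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
}

const center = [116.4, 39.9];     // same point as the $near query above
const nearStore = [116.4, 39.91]; // hypothetical store 0.01° further north
const farStore = [116.4, 40.0];   // hypothetical store 0.1° further north

const nearDistance = haversineMeters(center, nearStore);
const farDistance = haversineMeters(center, farStore);

console.log(nearDistance <= 5000); // true  (about 1.1 km)
console.log(farDistance <= 5000);  // false (about 11 km)
```

As a rule of thumb from this model, 0.01° of latitude is a little over 1 km, so a 5 km radius spans roughly 0.045° north to south.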

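Slow Query Diagnosis and the Index Optimization Workflow

Enabling the Slow Query Log

// Show the current profiling level and slow-query threshold (ms)
db.getProfilingStatus();

// Set the slow-query threshold to 100 ms
db.setProfilingLevel(1, 100);
// 0 = off, 1 = record slow operations, 2 = record all operations

// Show recently profiled operations
db.system.profile.find().sort({ ts: -1 }).limit(10);

// Show details of the 5 most recent operations slower than 100 ms
db.system.profile.find({
    millis: { $gt: 100 }
}).sort({ ts: -1 }).limit(5).pretty();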

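Index Optimization Decision Flow

1. Find a slow query
   ↓
2. Analyze it with explain("executionStats")
   ↓
3. Inspect the stages in winningPlan
   ├── COLLSCAN → an index is needed → design a compound index from the query conditions
   ├── IXSCAN → check scan efficiency
   │   ├── totalDocsExamined ≈ documents returned → the index is good
   │   ├── totalDocsExamined >> documents returned → the index is not precise enough
   │   │   └── add the filter fields to the index prefix
   │   └── totalKeysExamined >> totalDocsExamined → loose index bounds or poor selectivity
   └── SORT → the sort is done in memory
       └── add the sort field to the index (ESR rule)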
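
The decision flow can be turned into a small checker over explain output. The sketch below assumes the classic explain shape (queryPlanner.winningPlan with nested inputStage, and executionStats with nReturned, totalKeysExamined, totalDocsExamined). The function names, the ratio threshold of 10, and the sample objects are all assumptions for illustration; plans from newer execution engines may be shaped differently.

```javascript
// Collect every stage name in a winningPlan tree.
function collectStages(plan, stages = []) {
  if (!plan) return stages;
  stages.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, stages);
  for (const child of plan.inputStages ?? []) collectStages(child, stages);
  return stages;
}

// Apply the decision flow; ratioThreshold is an arbitrary cut-off for ">>".
function diagnose(explainOutput, ratioThreshold = 10) {
  const stages = collectStages(explainOutput.queryPlanner.winningPlan);
  const { nReturned, totalKeysExamined, totalDocsExamined } = explainOutput.executionStats;
  const findings = [];
  if (stages.includes("COLLSCAN")) findings.push("COLLSCAN: create an index for this query");
  if (stages.includes("SORT")) findings.push("SORT: add the sort field to the index (ESR)");
  if (stages.includes("IXSCAN")) {
    if (totalDocsExamined / Math.max(nReturned, 1) > ratioThreshold) {
      findings.push("docs examined >> returned: add filter fields to the index");
    }
    if (totalKeysExamined / Math.max(totalDocsExamined, 1) > ratioThreshold) {
      findings.push("keys examined >> docs examined: index bounds are loose");
    }
  }
  return findings.length > 0 ? findings : ["index use looks healthy"];
}

// Hand-written sample outputs, not captured from a real server.
const unindexed = {
  queryPlanner: { winningPlan: { stage: "SORT", inputStage: { stage: "COLLSCAN" } } },
  executionStats: { nReturned: 10, totalKeysExamined: 0, totalDocsExamined: 100000 },
};
const indexed = {
  queryPlanner: { winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN" } } },
  executionStats: { nReturned: 5, totalKeysExamined: 5, totalDocsExamined: 5 },
};

console.log(diagnose(unindexed).length); // 2
console.log(diagnose(indexed)); // [ 'index use looks healthy' ]
```

In mongosh the argument would be the object returned by explain("executionStats"). The threshold is worth tuning to the workload rather than taking 10 as given.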

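Index Selectivity Analysis

// Selectivity of a field = number of distinct values / total number of documents
// The closer to 1, the more effective an index on that field

db.orders.distinct("status").length;  // e.g. 3 statuses
db.orders.countDocuments();           // total number of documents

// status selectivity = 3 / total (low; a poor candidate for a standalone index)
// customerId selectivity = number of customers / total (usually higher; a good index candidate)

// Collection statistics via $collStats
db.orders.aggregate([{ $collStats: { storageStats: {} } }]);

// Estimate the value distribution from a $sample
db.orders.aggregate([
    { $sample: { size: 10000 } },
    { $group: { _id: "$status", count: { $sum: 1 } } }
]);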
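
The selectivity ratio defined above is easy to compute on a sample once the documents are in memory. The helper selectivity and the sample documents below are hypothetical; on a real collection the input would come from a $sample stage like the one above.

```javascript
// Selectivity = distinct values / total documents (0 for an empty sample).
function selectivity(docs, field) {
  if (docs.length === 0) return 0;
  return new Set(docs.map((doc) => doc[field])).size / docs.length;
}

// Hypothetical sample of orders.
const sample = [
  { orderNo: "O1001", customerId: 1001, status: "paid" },
  { orderNo: "O1002", customerId: 1001, status: "paid" },
  { orderNo: "O1003", customerId: 1002, status: "pending" },
  { orderNo: "O1004", customerId: 1003, status: "paid" },
];

console.log(selectivity(sample, "status")); // 0.5
console.log(selectivity(sample, "customerId")); // 0.75
console.log(selectivity(sample, "orderNo")); // 1
```

On such a tiny sample the numbers are only illustrative. With real data, status would fall close to 0 and a unique field such as orderNo would stay at 1.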

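Pros

1. Clear query speedups: a suitable index sharply reduces the number of documents scanned, taking queries over millions of documents from seconds to milliseconds
2. Rich set of scenarios: text, TTL, partial, and compound indexes are all practical and cover the vast majority of query needs
3. Fits a flexible schema: index design can evolve step by step around the queries, without fixing every field up front
4. Direct troubleshooting: explain quickly shows whether an index is used, so diagnosis is efficient

Cons

1. Higher write cost: every additional index adds write amplification, since each insert, update, and delete must also update the indexes
2. Design depends on query patterns: when queries change, old indexes can quickly stop being useful and need ongoing maintenance
3. Index sprawl: too many indexes slow down updates and increase storage cost; a collection usually should not exceed 5-10 indexes
4. Document flexibility has a price: inconsistent fields make index design messier, and the multikey index restrictions need attention

Summary

The core of MongoDB index optimization is not "build more indexes" but making indexes truly serve the high-frequency queries and sorts. In practice, start from the slow query log and explain output, and build a small number of well-chosen compound indexes around the query patterns instead of indexing every field individually. The ESR rule is the golden rule for compound index design, covered queries are an effective way to avoid document fetches, and partial indexes and TTL indexes are powerful tools for controlling index size and cleaning up data automatically.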