Optimizing MongoDB Indexes

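Good indexes are essential for an application to perform well on MongoDB, and MongoDB performs best when it can keep your indexes in RAM. Reducing the size of your indexes also gives you faster queries and lets you manage more data with less memory. Here are some tips for reducing the size of your MongoDB indexes: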

1) Check index sizes

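The first thing to do is find out how large your indexes are. You want a baseline before you change anything, so that you can check whether a change actually reduces index size. Ideally, you are already graphing index size with your monitoring tools. From the mongo shell, the db.stats() command returns index statistics: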

> db.stats()
{
    "db" : "examples1",
    "collections" : 6,
    "objects" : 403787,
    "avgObjSize" : 121.9966467469235,
    "dataSize" : 49260660,
    "storageSize" : 66695168,
    "numExtents" : 20,
    "indexes" : 9,
    "indexSize" : 48524560,
    "fileSize" : 520093696,
    "nsSizeMB" : 16,
    "ok" : 1
}

indexes: the number of indexes in the examples1 database;
indexSize: the total size of the indexes in the examples1 database

Because every collection has its own indexes, you can also inspect them per collection with db.collection.stats():

> db.address.stats()
{
    "ns" : "examples1.address",
    "count" : 3,
    "size" : 276,
    "avgObjSize" : 92,
    "storageSize" : 8192,
    "numExtents" : 1,
    "nindexes" : 2,
    "lastExtentSize" : 8192,
    "paddingFactor" : 1,
    "flags" : 1,
    "totalIndexSize" : 16352,
    "indexSizes" : {
        "_id_" : 8176,
        "_types_1" : 8176
    },
    "ok" : 1
}

totalIndexSize: the size of all indexes on the collection;
indexSizes: a dictionary mapping each index name to its size
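
As a rough illustration of reading these fields, the sketch below takes a stats document shaped like the db.address.stats() output above and lists each index with its size in human-readable units, largest first. The helper names format_bytes and summarize_index_sizes are made up for this example and are not part of MongoDB or mongodb-tools. The input is a plain dictionary, so no database connection is assumed.

```python
def format_bytes(num_bytes):
    """Convert a byte count to a short human-readable string (K, M, G)."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024 or unit == "G":
            break
        size /= 1024
    if unit == "B":
        return "%dB" % size
    return "%.2f%s" % (size, unit)


def summarize_index_sizes(coll_stats):
    """Return (index name, size in bytes, share of total) tuples, largest first."""
    total = coll_stats["totalIndexSize"]
    rows = []
    for name, size in coll_stats["indexSizes"].items():
        share = 100.0 * size / total if total else 0.0
        rows.append((name, size, share))
    # Sort by size descending, then by name so ties are ordered predictably.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Values copied from the db.address.stats() output shown above.
address_stats = {
    "ns": "examples1.address",
    "totalIndexSize": 16352,
    "indexSizes": {"_id_": 8176, "_types_1": 8176},
}

for name, size, share in summarize_index_sizes(address_stats):
    print("%-12s %8s %5.1f%%" % (name, format_bytes(size), share))
```

With a live connection, the same dictionary could come from the collection stats command of whichever driver you use; that step is left out here on purpose.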

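Note: all sizes returned by these commands are in bytes. The commands are useful, but running them by hand is tedious. To make things easier, I wrote a tool, index-stats.py, that generates a report of index statistics. You can find it in the mongodb-tools project on GitHub.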

(virtualenv) mongodb-tools$ ./index-stats.py
Checking DB: examples2.system.indexes
Checking DB: examples2.things
Checking DB: examples1.system.indexes
Checking DB: examples1.address
Checking DB: examples1.typeless_address
Checking DB: examples1.user
Checking DB: examples1.typeless_user

Index Overview
+----------------------------+--------------------------------+----------+------------+
| Collection                 | Index                          | % Size   | Index Size |
+----------------------------+--------------------------------+----------+------------+
| examples1.address          | _id_                           |     0.0% |      7.98K |
| examples1.address          | _types_1                       |     0.0% |      7.98K |
| examples1.typeless_address | _id_                           |     0.0% |      7.98K |
| examples1.typeless_user    | _id_                           |    10.1% |      6.21M |
| examples1.typeless_user    | address_id_1                   |    10.1% |      6.21M |
| examples1.typeless_user    | typeless_address_ref_1         |     5.9% |      3.62M |
| examples1.user             | _id_                           |    10.1% |      6.21M |
| examples1.user             | _types_1                       |     6.9% |      4.24M |
| examples1.user             | _types_1_address_id_1          |    12.2% |      7.51M |
| examples1.user             | _types_1_address_ref_1         |    26.2% |     16.09M |
| examples2.things           | _id_                           |    10.1% |      6.21M |
| examples2.things           | _types_1                       |     8.4% |      5.13M |
+----------------------------+--------------------------------+----------+------------+

Top 5 Largest Indexes
+----------------------------+--------------------------------+----------+------------+
| Collection                 | Index                          | % Size   | Index Size |
+----------------------------+--------------------------------+----------+------------+
| examples1.user             | _types_1_address_ref_1         |    26.2% |     16.09M |
| examples1.user             | _types_1_address_id_1          |    12.2% |      7.51M |
| examples1.typeless_user    | _id_                           |    10.1% |      6.21M |
| examples2.things           | _types_1                       |     8.4% |      5.13M |
| examples1.user             | _types_1                       |     6.9% |      4.24M |
+----------------------------+--------------------------------+----------+------------+

Total Documents: 600016
Total Data Size: 74.77M
Total Index Size: 61.43M
RAM Headroom: 2.84G
Available RAM Headroom: 1.04G

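The output shows the total index size, the size of each index, and their relative sizes. The report also lists the five largest indexes across all of your collections. This makes it easy to spot the largest indexes and to find the one that would contribute most to reducing the overall size.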

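RAM Headroom is your physical memory minus the index size. A positive value means you have enough RAM for the indexes to fit in memory.
Available RAM Headroom is your free memory minus the index size. Because other processes on this system are also consuming memory, I do not have the full RAM Headroom available.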
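
Both headroom figures are simple subtractions. A minimal sketch, assuming the memory totals are already known in bytes; the function name ram_headroom and the sample machine below are illustrative and are not taken from index-stats.py:

```python
def ram_headroom(memory_bytes, total_index_bytes):
    """Memory left over after the indexes are held in RAM.

    Pass physical memory to get RAM Headroom, or currently free memory
    to get Available RAM Headroom. A negative result means the indexes
    do not fit.
    """
    return memory_bytes - total_index_bytes


GIB = 1024 ** 3
MIB = 1024 ** 2

# Hypothetical machine: 4 GiB physical RAM, 1.5 GiB currently free,
# and 512 MiB of indexes in total.
physical = 4 * GIB
free = int(1.5 * GIB)
index_size = 512 * MIB

print(ram_headroom(physical, index_size) / GIB)  # RAM Headroom in GiB
print(ram_headroom(free, index_size) / GIB)      # Available RAM Headroom in GiB
```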

2) Remove redundant indexes

The mongodb-tools project also includes redundant-indexes.py, which reports indexes that may be redundant:

(virtualenv) mongodb-tools$ ./redundant-indexes.py
Checking DB: examples2
Checking DB: examples1
Index examples1.user[_types_1] may be redundant with examples1.user[_types_1_address_ref_1]
Index examples1.user[_types_1] may be redundant with examples1.user[_types_1_address_id_1]
Checking DB: local
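
One common reason an index gets flagged this way is that its keys are a leading prefix of a compound index, which can usually serve the same queries. The sketch below shows that prefix check on plain Python data. It illustrates the idea only and is not the actual logic of redundant-indexes.py; the function name find_prefix_redundant and the input format (index name mapped to an ordered list of (field, direction) pairs) are assumptions made for this example.

```python
def find_prefix_redundant(indexes):
    """Return (shorter, longer) name pairs where the shorter index's keys
    are a leading prefix of the longer index's keys."""
    pairs = []
    for short_name, short_keys in indexes.items():
        for long_name, long_keys in indexes.items():
            if short_name == long_name:
                continue
            # Strictly shorter, and identical to the start of the longer key list.
            if (len(short_keys) < len(long_keys)
                    and long_keys[:len(short_keys)] == short_keys):
                pairs.append((short_name, long_name))
    return sorted(pairs)


# Key lists for examples1.user, reconstructed from the index names in the report.
user_indexes = {
    "_id_": [("_id", 1)],
    "_types_1": [("_types", 1)],
    "_types_1_address_id_1": [("_types", 1), ("address_id", 1)],
    "_types_1_address_ref_1": [("_types", 1), ("address_ref", 1)],
}

for short_name, long_name in find_prefix_redundant(user_indexes):
    print("Index %s may be redundant with %s" % (short_name, long_name))
```

Indexes with special options such as unique or sparse are not interchangeable in this way, so treat the result as a list of candidates to review rather than indexes that are safe to drop.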

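3) Run the compact command

If you are running MongoDB 2.0 or later, you can run the compact command to defragment your collections and rebuild their indexes. compact blocks the database while it runs, so make sure you know exactly where you are running it. In a replica set, the simplest approach is to run it on the secondaries one at a time, then fail the primary over to one of the secondaries and compact the old primary.

4) MongoDB 2.0 index improvements

If you are not yet running MongoDB 2.0 or later, upgrading and rebuilding your indexes should save about 25% of the index space. See Index Performance Enhancements.

5) Check your index schema

Next, check your index schema. You want the indexed values to be small and selective. Indexing values that do not help MongoDB find your data faster only slows queries down and makes the index larger. If your application uses a mapping framework that supports defining indexes in code, check what indexes it actually creates. For example, MongoEngine for Python uses a "_types" field to distinguish subclasses stored in the same collection. This can make indexes take up a lot of space without adding any selectivity. In my test data, my largest index is:

| examples1.user | _types_1_address_ref_1 | 26.2%

Looking at its data: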

> db.user.findOne()
{
    "_id" : ObjectId("4f2ef95c89a40a11c5000002"),
    "_types" : [
        "User"
    ],
    "address_id" : ObjectId("4f2ef95c89a40a11c5000000"),
    "address_ref" : {
        "$ref" : "address",
        "$id" : ObjectId("4f2ef95c89a40a11c5000000")
    },
    "_cls" : "User"
}

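You can see that _types is an array containing the class name User. Since my code has no subclasses of User, indexing this value does nothing for the selectivity of the index. On top of that, every value in the related indexes is prefixed with "User", which adds extra bytes to each value without improving selectivity. Remove it with the following code: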

from mongoengine import Document

class User(Document):
    meta = {'index_types': False}

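The index becomes:

| examples1.user | address_ref_1 | 16.8% |

That saves 23% of the space. Digging deeper, address_ref_1 is a ReferenceProperty pointing to an Address object. The findOne() output above shows that it is stored as a dictionary containing the name of the referenced collection and the id it points to. If we change this ReferenceProperty to an ObjectIdProperty (address_id), we get additional savings: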

| examples1.user | address_id_1 | 9.5% | 6.21M |
| examples1.user | address_ref_1 | 20.9% |

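That is a 53% saving, because the indexed value changes from a serialized dictionary to an ObjectId, which MongoDB is highly optimized for. Changing the property type does require code changes, and you lose the automatic de-referencing that ReferenceProperty provides, but it can save a lot of memory. All in all, we cut index storage by 61% by adjusting the index schema and changing a small amount of code.

6) Remove or archive old data

In many applications, some data is accessed far more often than the rest. If you have old data that your users no longer access, move it to another collection that is not indexed, or store it somewhere outside the database. Ideally, your database holds and indexes only the working set of your data.

There are other good optimization resources as well. You can find some of them here: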

- MongoDB Performance Tuning
- Optimizing MongoDB: Lessons Learned at Localytics

How do you optimize your indexes?