MongoDB pitfall checklist

Query Issues

Concerns while interacting with MongoDB:

Queries are case sensitive

Either store data in a known case or match the case exactly in queries.
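
For example, with a hypothetical customers collection, an exact-case query misses a mixed-case value, while a case-insensitive regex finds it (at the cost of not using an index efficiently):

> db.customers.insert({name: "Alice"})
> db.customers.find({name: "alice"}).count()
0
> db.customers.find({name: /^alice$/i}).count()
1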

Mongo doesn't enforce data types

Make sure queries use the same data type as the stored value (string, int, etc.).
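
A quick illustration: the string "21" and the integer 21 are different values, so a query for one will not match the other:

> db.customers.insert({name: "Bob", age: "21"})
> db.customers.find({age: 21}).count()
0
> db.customers.find({age: "21"}).count()
1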

Updates are limited to one document by default

Set the 'multi' option to true to update multiple documents.

db.collection.update(  
   <query>,
   <update>,
   {
     upsert: <boolean>,
     multi: <boolean>,
     writeConcern: <document>
   }
)

Example of updating two documents at once:

> db.customers.update({age: {$gte: 21}}, {$set: {can_buy_alcohol: true}}, {multi: true})

Result:

WriteResult({ "nMatched" : 2, "nUpserted" : 0, "nModified" : 2 })

Zero Joins

If more than one collection needs to be accessed, more than one query is needed; redesign the schema to avoid this where possible.
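
One common redesign, sketched here with a hypothetical orders example, is to embed related data in the parent document so a single query returns everything:

// Two collections means two round trips:
> var customer = db.customers.findOne({_id: 123})
> var orders = db.orders.find({customer_id: 123}).toArray()

// Embedding the orders makes it a single query:
> db.customers.insert({_id: 123, name: "Carol", orders: [{sku: "a1", qty: 2}]})
> db.customers.findOne({_id: 123})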

Locking

Be aware that Mongo uses collection-level locking, so a write lock blocks all other operations on the same collection.
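
You can spot contention from the shell; the exact field names vary by version, but db.currentOp() reports which operations hold or wait on locks:

// Look for operations with waitingForLock set to true
> db.currentOp()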

Transactions

Mongo only guarantees atomicity for operations on a single document; there are no multi-document transactions.
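
The practical consequence is to keep data that must change together in one document: an update touching several fields of a single document is atomic, while the same change spread across two documents is not:

// Atomic: both changes land together or not at all (one document).
> db.accounts.update({_id: 1}, {$inc: {balance: -50}, $set: {updated: new Date()}})

// Not atomic: a crash between these two updates leaves the data inconsistent.
> db.accounts.update({_id: 1}, {$inc: {balance: -50}})
> db.accounts.update({_id: 2}, {$inc: {balance: 50}})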

Scaling Issues

Concerns while scaling MongoDB:

Mongo will not shard a collection over 256 GB

Shard before reaching this threshold.

Sharding data

Sharding is done to increase capacity when a replica set is too slow. Shard early, preferably before reaching 80% of capacity: stopping and resizing machines is MUCH quicker than migrating thousands of chunks of data.
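
Enabling sharding itself is quick once a cluster is running; a sketch against a hypothetical customers collection, run from a mongos:

> sh.enableSharding("mydb")
> sh.shardCollection("mydb.customers", {customer_id: 1})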

Shard keys can't be updated in a document

The only solution is to remove the document and re-insert it.
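
A sketch of the workaround, assuming a hypothetical users collection sharded on region. Note the document briefly does not exist, so guard this in application code:

> var doc = db.users.findOne({_id: 42})
> db.users.remove({_id: 42})
> doc.region = "eu"   // the shard-key field being changed
> db.users.insert(doc)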

Unique indexes and sharding

If you need to ensure that a field is always unique in a sharded collection, there are three options:

  • Enforce uniqueness of the shard key. MongoDB can enforce uniqueness for the shard key. For compound shard keys, MongoDB will enforce uniqueness on the entire key combination, and not for a specific component of the shard key. You cannot specify a unique constraint on a hashed index.

  • Use a secondary collection to enforce uniqueness. Create a minimal collection that only contains the unique field and a reference to a document in the main collection. If you always insert into the secondary collection before inserting into the main collection, MongoDB will produce an error if you attempt to use a duplicate key (see the sketch after this list). If you have a small data set, you may not need to shard this collection and you can create multiple unique indexes. Otherwise you can shard on a single unique key.

  • Use guaranteed unique identifiers. Universally unique identifiers (i.e. UUID) like the ObjectId are guaranteed to be unique.
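
A minimal sketch of the secondary-collection option, assuming a hypothetical users collection with an email field that must stay unique:

// The proxy collection holds only the unique field, guarded by a unique index.
> db.email_index.ensureIndex({email: 1}, {unique: true})

// Insert into the proxy first; a duplicate key error aborts the flow
// before the main document is ever written.
> db.email_index.insert({email: "dave@example.com", user_id: 42})
> db.users.insert({_id: 42, email: "dave@example.com", name: "Dave"})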

Choose the correct shard key

A few topics to review before picking a shard key:

  • Immutable Shard Keys
  • Hashed Shard Keys
  • Impacts of Shard Keys on Cluster Operations, including Write Scaling, Querying, and Indivisible Chunks

Systemic Issues

Network, server, and operational concerns:

Available RAM

Make sure your working set fits in RAM.
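
A rough check from the shell: compare data and index sizes against the RAM on the box. At a minimum the indexes, and ideally the frequently accessed data, should fit:

> db.stats().dataSize
> db.stats().indexSize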

Process limits in Linux

If Mongo segfaults under load on Linux, permanently raise the hard and soft limits for open files and user processes above 4k.
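
Assuming mongod runs as the mongod user, the usual place is /etc/security/limits.conf; the values below follow the commonly recommended 64000:

mongod  soft  nofile  64000
mongod  hard  nofile  64000
mongod  soft  nproc   64000
mongod  hard  nproc   64000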

Document Sizes

Keep documents under 16 MB each. If you need to go above 16 MB, look at GridFS.
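
The shell can report a document's BSON size, which helps spot documents creeping toward the limit (the query is illustrative):

> Object.bsonsize(db.customers.findOne({_id: 123}))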

No Authentication by default

Secure it with a firewall and by binding it to the correct interface, or enable authentication.
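
A minimal sketch of both mitigations at startup, where 10.0.0.5 stands in for your private interface:

mongod --bind_ip 127.0.0.1,10.0.0.5 --auth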

Traffic to and from Mongo is unencrypted

If traffic must cross a public network, compile a version of Mongo with SSL support enabled.
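
With an SSL-enabled build, the startup flags in older versions look roughly like this (the PEM path is a placeholder):

mongod --sslOnNormalPorts --sslPEMKeyFile /etc/ssl/mongodb.pem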

Disable NUMA

NUMA doesn't play well with Mongo (or MySQL, or probably any other database either).
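
The standard Linux mitigation is to disable zone reclaim and start mongod with interleaved memory allocation (the config path is illustrative):

echo 0 > /proc/sys/vm/zone_reclaim_mode
numactl --interleave=all mongod --config /etc/mongodb.conf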

When using Replica Sets, use an odd number of nodes

With an even number of nodes, a single failure can leave the survivors without enough votes to reach a quorum, and the set goes read-only with no primary. If a full extra data node is too expensive, add an arbiter: it votes as normal but stores no user data, so it can run on a much smaller server.
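
Adding an arbiter from the shell is one line (host and port are placeholders):

> rs.addArb("arbiter.example.com:27017")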

Data Integrity Issues

Concerns about data integrity when using MongoDB:

Data loss with Replica Sets on failure

On a node failure or failover, writes that were not replicated to the new primary are rolled back and saved to the rollback directory. Always check it after a failure scenario.
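
The rolled-back writes are saved as BSON files inside the data directory; bsondump lets you inspect them, and mongorestore can re-apply what you want to keep. The path pattern below is illustrative:

bsondump /data/db/rollback/<db.collection>.<timestamp>.bson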

Write failure

Use safe writes, or call getLastError to confirm that writes succeeded.
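
From the shell, a write followed by the getLastError command confirms it, here requiring acknowledgement from a majority of the replica set:

> db.customers.insert({name: "Eve"})
> db.runCommand({getLastError: 1, w: "majority", wtimeout: 5000})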

Journaling

Journaling shortens the data-loss window: writes are committed to the journal every 100 ms instead of waiting for the data-file flush that happens every 60 seconds. Do not disable journaling.
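
Journaling is on by default in 64-bit builds; make sure your config does not turn it off. The commit interval is also tunable (old ini-style config shown):

# mongodb.conf
journal = true
# how often the journal is flushed to disk, in ms (default 100)
journalCommitInterval = 100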

Journal allocation times

Use the --nopreallocj flag to disable journal pre-allocation if your disk or file system is very slow.