Complex batch updates or migration on your distant MongoDB base with a script and mongo shell

in MongoDB

If you work with a hosted MongoDB database on a DAAS (Database As A Service) like MongoDB Atlas, mLab or Compose, you often need scripting for complex operations. Like when you need to update one field in all the documents of a collection with a transformation from other fields in this document or even from different documents in other collections. A simple update is not enough.

There are powerful query and update commands in MongoDB that came with the new aggregation pipeline. They are super efficient but sometimes hard to process. Try to use them when you need performance. Looping on all your document is not efficient, and this is not the NoSQL way of doing things. We are talking here about the times when the MongoDB way fails, and you need something of tracks.

Have your Mongo shell ready

Requirements: have the mongo command line installed

All the details and troubleshooting are available in the MongoDB documentation here. For the lazy ones here is the TL;DR:

You can now use mongo in your terminal.

Connect to your base

To connect to your base, you can use the mongo command.

mongo server_url:server_port/collection_name -u **username** -p **password**

Check your DAAS provider for more details. Here are the links for MongoDB Atlas, mLab and Compose.

You have your prompt ready, and you can start typing commands.

The language

MongoDB script is essentially vanilla JavaScript with a few twists.

How to log

When experimenting and building your script, you need to log a lot. You don’t have a console, but you can print strings. So go to your prompt and type print( 'Hello world' ).

Like in node, f you try to print an object like {a: 1}, you will get what you have if you String({a: 1}): [object Object]. To print an object, don‘t forget to JSON.stringify( {a: 1} ) your object.

This is JavaScript

You can store cursors, arrays, strings, numbers… In variables and manipulate them as you would do in node or in the browser. Try to paste this in your prompt:

var greetings = 'hello world!';
greetings = greetings.toUpperCase();
print( greetings );
var letters = greetings.split( '' ).reverse();
letters.forEach(
    function( letter ) { print( letter ); }
);

Everything is asynchronous!

Asynchronous execution is a crucial point to understand when you write scripts for the mongo shell. When you call mongo commands like collection.update( ... ) or a collection.find( ... ), the shell waits for the command to return before doing anything else.

The sequential nature of your scripts can make them super long to execute on a remote database, but it is critical for security and data integrity.

Execute scripts

The easiest way to execute a script is to save it into .js files and send them to the mongo command.

mongo server_url:server_port/collection_name -u **username** -p **password** < myScript.js

This will open your prompt and start executing your script.

Example of things you can do

Let’s say that you have a products collection in your myStore database and you want to migrate the pricing data. You stored the prices as strings like this one: price: "$123,00" and you want to clean it and store them as numbers. This migration is super important because some of the prices were malformed and you have things like this one $ 42.0 or that one 1337.

Pro tip: don’t do this in one step

Especially if you are doing this on your production database (don’t do this), you want to backup your data. So, start with a script to back them up.

// The reference to your collection
// Not that the db object is exposed by default
// this is the database you are connected to
var products = db.getCollection('Products');

// The cursor
var cursor = products.find({ price: { $exists: 1 } });

// How many products do you have
var count = cursor.count();
var index = 1;

// Loop on all the products in the cursor
cursor.forEach(function(product) {

  // This line is super important for debugging.
  // Each time you will execute an update, print where
  // you are in your list.
  print(index, '/', count, ' : ', product._id);
  index++;

  // A simple update to backup the price data
  products.update(
    {
      _id: product._id,
    },
    {
      $set: {
        oldPrice: product.price,
      },
    },
  );
});

This script can be long because you execute one call to the database for each element that you want to update. There are ways to make this in one request. However, this allows you to do way more complicated updates.

Start the actual update

You can now start the transform you wanted to have.

var products = db.getCollection('Products');
var cursor = products.find({ price: { $exists: 1 } });
var count = cursor.count();
var index = 1;

cursor.forEach(function(product) {
    print(index, '/', count, ' : ', product.\_id);
    index++;

    // Transform the price
    var newPrice = parseFloat(
        product.oldPrice
        .replace('$', '')
        .replace(',', '.')
        .trim()
    );

    products.update(
        {
            \_id: product.\_id,
        },
        {
            $set: {
                price: newPrice,
            },
        },
    );
});

Clean the data

You want to remove the oldPrice key. You can use a loop again to update each document. However, this would be absurd. Make a straightforward call to update all the documents at once.

db.getCollection('Products').update(
  {
    oldPrice: { $exists: 1 }
  },
  {
    $unset: {
      oldPrice: null
    }
  },
  {
    "multi" : true,
  }
);