Introduction to MongoDB

Tanja Adžić
11 min read · Feb 27, 2024

Introduction

MongoDB is one of the leading document-oriented NoSQL databases. It speeds up data access through indexing and supports a range of data types, such as dates and numbers. As a document database, it makes it easy for developers to store structured or unstructured data in a JSON-like format. Its power lies in the user's ability to query, manipulate, and draw insights from the collected data, and that flexibility makes it suitable for a wide variety of purposes. Let’s see what MongoDB is all about.

Overview of MongoDB

MongoDB is a non-relational (NoSQL) database that stores data in documents rather than tables. Each document, akin to a JSON object or a Python dictionary, represents a record and contains fields. In an imaginary campus management system, those fields might be firstName, lastName, email, and studentId. Documents are then grouped into collections, such as Students and Employees.

document = {
    "firstName": "John",
    "lastName": "Doe",
    "email": "john.doe@example.com",
    "studentId": "123456789"
}

MongoDB supports data types such as dates, numbers, and lists, allowing for flexible data storage. Sub-documents and lists of values can also be stored inside a document, which helps organize secondary information. Unlike relational databases, MongoDB doesn’t require a predefined schema, so the structure can change on the fly.
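As a rough illustration, here is how a document mixing these data types might look as a Python dictionary. The fields credits, enrolledOn, courses, and address are made up for this example and are not part of the campus management schema described above.

```python
from datetime import datetime

# Hypothetical student record: every field beyond firstName/lastName is illustrative.
student = {
    "firstName": "John",
    "lastName": "Doe",
    "credits": 42,                            # number
    "enrolledOn": datetime(2023, 9, 1),       # date
    "courses": ["Databases", "Statistics"],   # list of values
    "address": {                              # sub-document
        "street": "Main Street",
        "city": "New York",
    },
}

# Nested values are reached by walking into the sub-document.
print(student["address"]["city"])
print(student["enrolledOn"].year, len(student["courses"]))
```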

MongoDB ensures high availability by maintaining multiple copies of data. It accommodates structured and unstructured data seamlessly, simplifying complex data structures. Additionally, MongoDB offers scalability, allowing for vertical scaling with improved hardware or horizontal scaling through data partitioning.

Whether deployed on-premises or via managed services like IBM Cloud Databases for MongoDB or MongoDB Atlas on AWS, Azure, and Google Cloud, MongoDB provides robust solutions for diverse data management needs.

Advantages of MongoDB

Flexibility with Schema

The first major benefit of using MongoDB is the flexibility with schema. For example, suppose we have two dictionaries, one with a postcode field and the other with a zipcode field:

address_with_postcode = {
    "street": "Main Street",
    "city": "New York",
    "postcode": "POST1234"
}

address_with_zipcode = {
    "street": "Oak Avenue",
    "city": "Los Angeles",
    "zipcode": "90210"
}

In a relational database, every column must exist in each row, so we would need either one generic column that covers both cases or separate postcode and zipcode columns that are mostly empty. Storing both shapes in MongoDB is not a problem, because the schema is flexible. This also lets us store unstructured data, combining data of different shapes from different sources for analysis or storage.
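A small sketch of what this means for application code: documents of both shapes can sit side by side, and the reader picks whichever field is present. The helper postal_code is a hypothetical name introduced for this example.

```python
# Two documents with different shapes, as they could sit side by side in one collection.
addresses = [
    {"street": "Main Street", "city": "New York", "postcode": "POST1234"},
    {"street": "Oak Avenue", "city": "Los Angeles", "zipcode": "90210"},
]


def postal_code(doc):
    """Return whichever postal field the document carries, or None if it has neither."""
    return doc.get("postcode") or doc.get("zipcode")


codes = [postal_code(doc) for doc in addresses]
print(codes)
```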

Code-first Approach

The second benefit is the Code-first Approach. In relational databases we have to design the tables first, and only then can we start working with the data. MongoDB, on the other hand, works with documents, which means we can access our data without any layer of complexity in between. There are no complex table definitions, and we can start writing data as soon as we connect to MongoDB. One additional benefit is that this removes the need for third-party frameworks to handle read and write operations.

Evolving Schema

The third key benefit is the Evolving Schema. It refers to the flexibility of managing data structures. Unlike traditional relational databases where a fixed schema must be defined before data insertion, MongoDB allows schemas to evolve over time. This means that new fields or data structures can be added to existing documents without the need to alter the entire schema or perform complex migrations.
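One way to picture an evolving schema is application code that tolerates both old and new document shapes. The sketch below is plain Python, not a MongoDB feature: with_defaults and the "Unknown" default are assumptions made for illustration.

```python
# Older documents were written before the "status" field existed.
old_doc = {"firstName": "John", "lastName": "Doe"}
new_doc = {"firstName": "Alice", "lastName": "Smith", "status": "Online"}

DEFAULTS = {"status": "Unknown"}  # hypothetical default for the newer field


def with_defaults(doc, defaults=DEFAULTS):
    """Return a copy of doc with any missing fields filled in from defaults."""
    return {**defaults, **doc}


for doc in (old_doc, new_doc):
    print(with_defaults(doc)["status"])
```

Both shapes remain readable, so older documents do not have to be rewritten the moment a new field is introduced.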

Querying and Analytics

MongoDB Query Language (MQL) is a specialized language used for querying MongoDB databases. Its syntax is designed around MongoDB’s document-oriented data model and lets users query, update, insert, and delete documents. MQL queries are written in a JSON-like format, making them easy to read and write, especially for developers familiar with JavaScript and JSON syntax. MQL supports a wide range of query operators and aggregation functions, enabling users to perform complex data manipulation and analysis directly within the database.
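To get a feel for how a JSON-like query reads, here is a toy evaluator for a very small subset of MQL (plain equality plus $gt, $gte, $lt, $lte, and $in). It is an illustration written for this post, not how MongoDB evaluates queries, and matches is a made-up helper name.

```python
def matches(doc, query):
    """Toy check of one document against a small subset of MQL filter syntax."""
    ops = {
        "$gt": lambda value, arg: value > arg,
        "$gte": lambda value, arg: value >= arg,
        "$lt": lambda value, arg: value < arg,
        "$lte": lambda value, arg: value <= arg,
        "$in": lambda value, arg: value in arg,
    }
    for field, condition in query.items():
        if field not in doc:
            return False
        value = doc[field]
        is_operator_form = (
            isinstance(condition, dict)
            and condition
            and all(key.startswith("$") for key in condition)
        )
        if is_operator_form:
            # Operator form, e.g. {"studentId": {"$gt": 50000}}
            if not all(ops[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain equality form, e.g. {"lastName": "Doe"}
            return False
    return True


students = [
    {"firstName": "John", "lastName": "Doe", "studentId": 12345},
    {"firstName": "Alice", "lastName": "Smith", "studentId": 54321},
    {"firstName": "Bob", "lastName": "Johnson", "studentId": 67890},
]

query = {"studentId": {"$gt": 50000}, "lastName": {"$in": ["Smith", "Doe"]}}
found = [s["firstName"] for s in students if matches(s, query)]
print(found)
```

The same query dictionary could be passed to a driver's find method; the evaluator above only mimics the semantics on an in-memory list.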

High Availability

MongoDB is natively a highly available system.

  1. Replica Sets: MongoDB uses replica sets to maintain multiple copies of data across different servers. A replica set consists of multiple MongoDB instances (nodes) where each node replicates data from a primary node. If the primary node fails, one of the secondary nodes is automatically elected as the new primary, ensuring continuous availability of data.
  2. Automatic Failover: In a replica set configuration, MongoDB provides automatic failover. If the primary node becomes unavailable, the replica set automatically detects the failure and promotes one of the secondary nodes to be the new primary. This process ensures that the database remains available even in the event of hardware failures or network issues.
  3. Write Concerns: MongoDB allows users to specify the level of acknowledgment required for write operations using write concerns. By configuring appropriate write concerns, users can ensure that data is safely replicated across multiple nodes before an operation is considered successful, thereby increasing data availability and durability.
  4. Data Center Awareness: MongoDB supports deployment across multiple data centers, allowing users to distribute replica set members across different geographical locations. This setup enhances fault tolerance and disaster recovery capabilities by reducing the impact of network failures or data center outages on overall system availability.
  5. Monitoring and Management: MongoDB offers monitoring and management tools, such as Cloud Manager (formerly MongoDB Management Service, MMS) and Ops Manager, to help administrators monitor the health and performance of MongoDB deployments. These tools enable proactive detection and resolution of issues, minimizing downtime and ensuring continuous availability of data.
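From the application side, replica sets and write concerns usually show up in the connection string. The sketch below builds such a string using the standard replicaSet and w options; the host names and the replica set name rs0 are placeholders, and replica_set_uri is a helper invented for this example.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, write_concern="majority"):
    """Build a MongoDB connection string that lists every replica set member."""
    options = urlencode({"replicaSet": replica_set, "w": write_concern})
    return f"mongodb://{','.join(hosts)}/?{options}"


# Placeholder hosts: replace with the members of a real deployment.
uri = replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
    replica_set="rs0",
)
print(uri)
```

Listing all members lets the driver find the current primary after a failover, and w=majority asks for acknowledgment from a majority of members before a write counts as successful.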

MongoDB Use Cases

Here are some of the most common use cases for MongoDB:

Many Sources-One View Use Case

This use case involves integrating data from multiple sources into a single view for analysis or reporting purposes. MongoDB’s flexible data model and ability to handle diverse data types make it suitable for consolidating data from various sources, such as databases, APIs, and external systems, into a unified view.

IoT Use Case

MongoDB is well-suited for IoT (Internet of Things) applications due to its ability to store and process large volumes of sensor data in real-time. It can handle the high velocity and variety of data generated by IoT devices, enabling organizations to collect, analyze, and act on data from sensors, devices, and machines efficiently.

E-commerce Use Case

MongoDB is widely used in e-commerce applications for product catalog management, order processing, and personalized recommendations. Its flexible schema allows e-commerce platforms to adapt to changing product attributes and customer preferences easily. Additionally, MongoDB’s scalability and high availability ensure reliable performance during peak traffic periods.

Real-time Analytics Use Case

MongoDB is commonly used for real-time analytics applications, where fast and efficient data processing is essential. It enables organizations to perform real-time aggregation, analysis, and visualization of data streams, allowing for timely insights and decision-making. MongoDB’s support for aggregation pipelines and sharding enhances its capabilities in handling large volumes of data for analytics.

Gaming Use Case

MongoDB is a popular choice for gaming applications that require high performance, scalability, and flexibility. It can store player profiles, game states, and other game-related data efficiently, supporting features such as leaderboards, player rankings, and in-game transactions. MongoDB’s ability to handle dynamic schemas and scale horizontally makes it suitable for multiplayer games with large user bases.

Finance Use Case

MongoDB is utilized in finance applications for various purposes, including customer relationship management (CRM), fraud detection, risk management, and compliance reporting. Its ability to handle complex data structures and perform real-time analytics makes it valuable for processing financial transactions, analyzing market data, and managing portfolios. MongoDB’s security features, such as encryption and access controls, also meet the stringent security requirements of the finance industry.

CRUD Operations

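The MongoDB shell, a command-line tool provided by MongoDB, offers an interactive JavaScript interface for managing databases. Recent MongoDB releases ship it as mongosh, which replaces the legacy mongo command; the commands shown in this section are available in both.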

We start by connecting to the cluster by providing a connection string. Once connected, users gain access to the database cluster and can proceed with operations.

mongo "connection_string"

To work with a specific database, such as the campus management database, we run use campusManagementDB. All subsequent operations are then directed at that database.

use campusManagementDB

Within the campus management database, users can explore available collections using the command show collections. For instance, collections like "Staff" and "Students" provide insights into the database's organizational structure.

show collections

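CRUD operations, short for Create, Read, Update, and Delete, are the basic operations for working with data in MongoDB. Let’s begin with a Create operation, where a new document is inserted into the students collection (collection names are case-sensitive, so db.students and db.Students refer to different collections).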

db.students.insertOne({ "firstName": "John", "lastName": "Doe", "email": "john.doe@example.com", "studentId": 12345 })

MongoDB automatically generates a unique _id for each document if not provided explicitly.

The Mongo shell supports JavaScript interpretation, facilitating variable definitions and other functions. For instance, a variable students_list containing multiple documents can be created and passed as an argument to the insertMany function.

var students_list = [
{ "firstName": "Alice", "lastName": "Smith", "email": "alice.smith@example.com", "studentId": 54321 },
{ "firstName": "Bob", "lastName": "Johnson", "email": "bob.johnson@example.com", "studentId": 67890 }
];
db.students.insertMany(students_list)

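Moving on to Read operations, the findOne function retrieves the first document that matches (or simply the first document when no filter is given), while find returns all documents matching the filter criteria. To count the matches, use countDocuments; the older cursor count() method is deprecated in current shells.

db.students.findOne()
db.students.find({ "email": "john.doe@example.com" })
db.students.countDocuments({ "lastName": "Doe" })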

In a Replace operation, the replaceOne function takes filter criteria and a replacement document. The matching document is replaced in full (only its _id is kept), so any field missing from the replacement, such as firstName or email here, is dropped.

db.students.replaceOne({ "lastName": "Doe" }, { "lastName": "Doe", "status": "Online" })

Alternatively, Update operations can be executed using the updateOne function.

db.students.updateOne({ "lastName": "Doe" }, { $set: { "status": "Online" } })

For bulk updates, the updateMany function is called with an empty filter ({}), which matches every document in the collection, along with a change document.

db.students.updateMany({}, { $set: { "status": "Online" } })

To perform Delete operations, the deleteOne function is used with filter criteria; it removes the first document that matches.

db.students.deleteOne({ "status": "Online" })

Similarly, the deleteMany function is available for deleting multiple documents based on specified criteria.

db.students.deleteMany({ "status": "Online" })

Indexing

Indexing in MongoDB is a way to optimize database searches, making them faster and more efficient. It’s like creating an index for a book that lists all the important topics along with the page numbers where they can be found.

Indexes in MongoDB, similar to indexes in other database systems, are data structures that store a small subset of the collection’s data in an easily traversable format. They allow MongoDB to quickly locate documents based on the indexed fields.

By creating indexes on fields commonly used in queries, MongoDB can significantly accelerate query execution. When a query specifies criteria that match the indexed fields, MongoDB utilizes the index to locate the relevant documents, reducing the number of documents scanned and improving query response times.

For example, let’s consider our “Students” collection in the campus management database. Suppose we frequently search for students based on their “studentId”. By creating an index on the studentId field, MongoDB builds a sorted list of all student IDs. So, when we search for a specific student ID, MongoDB can quickly locate the corresponding document without scanning through every document in the collection.

Here’s how we can create an index on the studentId field:

db.students.createIndex({ studentId: 1 })

This command tells MongoDB to create an index on the “studentId” field in the students collection (the same lowercase collection used in the CRUD examples), sorting the IDs in ascending order (1). Now, searches based on student IDs will be much faster, similar to quickly finding a topic in a book using an index.
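To see why a sorted structure helps, here is a plain-Python sketch that keeps student IDs in order and uses binary search instead of scanning every document. MongoDB's own indexes are B-tree structures; the sorted list and the find_by_student_id helper below are simplified stand-ins for illustration.

```python
from bisect import bisect_left

students = [
    {"studentId": 67890, "lastName": "Johnson"},
    {"studentId": 12345, "lastName": "Doe"},
    {"studentId": 54321, "lastName": "Smith"},
]

# "Index": (studentId, position in the collection) pairs kept in ascending order.
index = sorted((doc["studentId"], pos) for pos, doc in enumerate(students))
keys = [key for key, _ in index]


def find_by_student_id(student_id):
    """Binary-search the index instead of scanning every document."""
    i = bisect_left(keys, student_id)
    if i < len(keys) and keys[i] == student_id:
        return students[index[i][1]]
    return None


print(find_by_student_id(54321))
print(find_by_student_id(11111))
```

The lookup touches only a handful of index entries, while the extra index list is the storage cost that has to be kept up to date on every write.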

MongoDB supports various types of indexes, including single field indexes, compound indexes (indexes on multiple fields), multikey indexes (indexes on arrays), text indexes (for full-text search), geospatial indexes (for geospatial queries), and hashed indexes (for hash-based equality matches).

Creating and maintaining indexes comes with a trade-off in terms of storage space and performance impact on write operations. While indexes improve query performance, they require additional disk space to store and can impact the performance of insert, update, and delete operations due to index maintenance overhead.

Aggregation Framework

The Aggregation Framework, sometimes referred to as the Aggregation Pipeline, is a powerful tool for processing data in MongoDB. It allows you to apply a series of operations to your data to achieve specific outcomes.

For instance, suppose you want to analyze the academic progress of students by calculating the average scores per course for the year 2020. This involves filtering documents for the year 2020, grouping them by course, and then calculating the average score for each course.

This example calls for two key stages in the aggregation pipeline: $match, to filter documents for the year 2020, and $group, to group documents by course and calculate each course's average score, stored in an avgScore field. The aggregation pipeline acts like a series of interconnected stages, where documents flow through each stage, undergoing transformations until the desired output is obtained.
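The two stages described above can be sketched as follows. The pipeline is written as a Python list, the form a driver's aggregate method accepts; the field names year, course, and score are assumptions, since the underlying collection is not shown here. The function below it is a pure-Python equivalent meant only to show what each stage does.

```python
from collections import defaultdict

# Pipeline definition; year, course, and score are assumed field names.
pipeline = [
    {"$match": {"year": 2020}},
    {"$group": {"_id": "$course", "avgScore": {"$avg": "$score"}}},
]


def average_score_per_course(results, year):
    """Pure-Python equivalent of the two stages, for illustration only."""
    scores = defaultdict(list)
    for doc in results:
        if doc["year"] == year:                          # $match
            scores[doc["course"]].append(doc["score"])   # $group by course
    return {course: sum(vals) / len(vals) for course, vals in scores.items()}


# Made-up sample data.
results = [
    {"course": "Math", "year": 2020, "score": 80},
    {"course": "Math", "year": 2020, "score": 90},
    {"course": "History", "year": 2020, "score": 70},
    {"course": "Math", "year": 2019, "score": 50},
]

averages = average_score_per_course(results, 2020)
print(averages)
```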

Each stage in the pipeline can be repeated or combined to perform more complex operations. Common stages include $project, for reshaping documents or selecting specific fields, $sort, for sorting documents, $count, for counting documents and assigning the result to a field, and $merge, for storing the output into another collection.

The Aggregation Framework finds applications in various scenarios. For instance, in an e-commerce application, it can be used to calculate the average sales of a product per country. These capabilities enable the creation of insightful reports, ranging from simple data grouping to complex analytical calculations.

Replication and Sharding

In a typical MongoDB cluster, you’ll find three data-bearing nodes forming what we call a ‘replica set’. Each of these nodes holds an identical copy of the data. Writes go to the Primary node, and the Secondary nodes then replicate those changes.

This replication process creates redundancy, ensuring that even if one server fails, multiple data copies remain intact. This redundancy offers high availability, both during unplanned failures and scheduled maintenance. Speaking of maintenance, it’s best done in a rolling fashion, taking one node offline at a time for updates or upgrades.
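The idea can be pictured with a toy model. The class below is a deliberately simplified sketch written for this post: real replica sets replicate asynchronously and hold majority-based elections, whereas ToyReplicaSet copies writes immediately and just picks a surviving node. All names in it are hypothetical.

```python
class ToyReplicaSet:
    """Toy model: one primary, the rest secondaries, writes copied to all live nodes."""

    def __init__(self, node_names):
        self.data = {name: [] for name in node_names}  # per-node copy of the data
        self.alive = set(node_names)
        self.primary = node_names[0]

    def write(self, doc):
        # Writes are accepted by the primary and replicated to live secondaries.
        for name in self.alive:
            self.data[name].append(doc)

    def fail(self, name):
        # Take a node offline; if it was the primary, promote a survivor.
        self.alive.discard(name)
        if name == self.primary:
            # Stand-in for an election: pick any surviving node (None if none left).
            self.primary = min(self.alive) if self.alive else None


rs = ToyReplicaSet(["node-a", "node-b", "node-c"])
rs.write({"studentId": 12345})
rs.fail("node-a")                 # the primary goes down
rs.write({"studentId": 54321})    # the new primary keeps accepting writes
print(rs.primary, len(rs.data[rs.primary]))
```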

It’s important to note that while replication provides fault tolerance, it won’t save you from accidental data deletion. For disaster recovery, backups and restores are crucial.

As your MongoDB usage grows, you might need to scale up your hardware for increased capacity or better performance. If upgrading hardware isn’t feasible, horizontal scaling via Sharding becomes a viable option. Sharding involves partitioning your largest collections across multiple servers, or shards.

Sharding offers several advantages. It boosts query performance by directing queries only to relevant shards, increases data storage capacity beyond a single node’s limit, and allows for geographical data partitioning. This means you can store data closer to your users, which can reduce latency and help meet data-residency requirements.
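A minimal sketch of the partitioning idea, assuming a hashed shard key: each document is routed to a shard based on a stable hash of its key. This is a simplification, since MongoDB assigns ranges of key values to chunks rather than taking a plain modulo; the shard names and shard_for are made up for illustration.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def shard_for(shard_key_value, shards=SHARDS):
    """Map a shard key value to a shard using a stable hash of the value."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


# The same key always lands on the same shard, so a query on that key
# only needs to visit one shard.
placement = {student_id: shard_for(student_id) for student_id in (12345, 54321, 67890)}
print(placement)
```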

MongoDB and Python

Accessing MongoDB from Python is an easy process, thanks to the support provided by the official MongoDB Python driver, pymongo. This driver allows Python developers to interact with MongoDB databases effortlessly, performing various operations such as inserting, querying, updating, and deleting documents.

To begin using MongoDB with Python, you first need to install the pymongo package. You can do this using pip:

pip install pymongo

Once pymongo is installed, you can establish a connection to your MongoDB server using MongoClient, the main entry point for interacting with a MongoDB deployment. Here’s a basic example of how to connect to a MongoDB server:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")

In this example, we’re connecting to a MongoDB server running on the local machine at the default port 27017. You can replace “localhost” and “27017” with the appropriate hostname and port if your MongoDB server is hosted elsewhere.

Once connected, you can access databases and collections within your MongoDB server using the client object. Here’s how you can select a specific database and collection:

# select the database
db = client["mydatabase"]

# select the collection
collection = db["mycollection"]

With the database and collection objects in hand, you can perform various operations on your MongoDB data. For example, you can insert documents into a collection:

# insert a document
document = {"name": "John", "age": 30}
collection.insert_one(document)

You can also query documents using find():

# finding documents
result = collection.find({"name": "John"})
for document in result:
    print(document)

Similarly, you can update and delete documents, as well as perform more complex operations using pymongo’s comprehensive API.
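As a hedged sketch of what updating and deleting could look like, the helpers below wrap pymongo's update_one and delete_one. The function names set_status and remove_by_name are invented for this example, and they expect a collection object like the one created earlier.

```python
def set_status(collection, name, status):
    """Set the status field on the first document with the given name."""
    result = collection.update_one({"name": name}, {"$set": {"status": status}})
    return result.modified_count  # number of documents actually changed


def remove_by_name(collection, name):
    """Delete the first document with the given name."""
    result = collection.delete_one({"name": name})
    return result.deleted_count  # number of documents removed


# Usage, assuming `collection` is a pymongo collection on a running server:
# set_status(collection, "John", "Online")
# remove_by_name(collection, "John")
```

Both calls return result objects, so the counts can be used to check whether anything matched.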

Disclaimer

The notes and information presented in this blog post were compiled during the course “Introduction to NoSQL Databases” and are intended to provide an educational overview of the subject matter for personal use.

Tanja Adžić

Data Scientist and aspiring Data Engineer, skilled in Python, SQL. I love to solve problems.