MongoDB Basics
What is MongoDB?
MongoDB is a document database. What this means is that your data does not have to conform to a fixed schema. In contrast to relational databases, where a record lacking some attribute is expected to store NULL in that column, in a document database a document can simply omit the field.
How does MongoDB store data?
MongoDB stores documents in a JSON-like binary format called BSON. However, JSON-style syntax is used not only for storing and retrieving documents. It is all over the MongoDB API, so you will be using it for writing queries as well.
Differences in naming between relational and document DBs
If you are already familiar with a relational database (like Oracle), it is much easier to wrap your head around MongoDB by comparing concepts between these two types of databases.
The table below shows how concepts in an Oracle database map to their counterparts in MongoDB.
Oracle | MongoDB |
---|---|
schema | database |
table | collection |
row | document |
ROWID | _id |
Basics
When connecting to MongoDB using the mongo client the first thing you need to do is select the database with the use command:
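A minimal sketch, assuming a hypothetical database named `blog`. At the interactive prompt you would simply type `use blog`; because that helper is not JavaScript, the snippet also shows the scripted equivalent, `db.getSiblingDB()`.

```javascript
// Interactive shell helper (typed at the prompt, not valid in a script file):
//   use blog

// "blog" is a hypothetical database name chosen for illustration.
const databaseName = "blog";

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  // Scripted equivalent of `use blog`.
  db = db.getSiblingDB(databaseName);
}
```

MongoDB creates the database lazily, so selecting a name that does not exist yet is fine.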
Insert
To insert a new document into a collection you can use the insert command:
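A minimal sketch of such an insert, assuming a `posts` collection and hypothetical `title`, `content`, and `createdAt` fields. Newer shells prefer `insertOne()` and `insertMany()`, but `insert()` matches the command named above.

```javascript
// A hypothetical blog post; the field names are assumptions.
const firstPost = {
  title: "First post",
  content: "Hello, MongoDB!",
  createdAt: new Date("2016-03-25"),
};

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  // No _id is supplied, so MongoDB generates one for the document.
  db.posts.insert(firstPost);
}
```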
Supported data types
Generally, because MongoDB uses BSON under the hood, we can use JSON flavored data types:
Type | Example |
---|---|
Number (ints, longs, doubles) | 1234, 56.78 |
String | "First post" |
Object | { "name": "Sam" } |
Array | [ 1, 2, 3 ] |
ObjectId | ObjectId("…") |
Boolean | true/false |
Date | new Date("2016-03-25") |
Timestamp | new Timestamp() |
Null | null |
MongoDB validations
When changing data, MongoDB performs some simple validations:
- the _id of the document is unique (no other document in the collection has the same _id)
- there are no syntax errors
- the document does not exceed 16 MB
Find all
To find every document stored inside the posts collection simply call find without parameters:
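As a sketch, the call looks like this. An empty query document is shown next to it only to make the point that `find()` with no arguments behaves like `find({})`.

```javascript
// An empty query document places no conditions, so it matches everything.
const matchAllQuery = {};

// `db` exists only inside the mongo shell, so the calls are guarded.
if (typeof db !== "undefined") {
  // find() without parameters returns a cursor over every document.
  printjson(db.posts.find().toArray());
  // Equivalent, with the empty query spelled out.
  printjson(db.posts.find(matchAllQuery).toArray());
}
```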
If you look closer at the results, you will find that MongoDB has generated a unique identifier for each of your documents and stored it in the _id field.
Find with query
To search for documents matching specific criteria, we can pass a query document as a parameter:
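A minimal sketch, assuming the posts carry a `title` field (a hypothetical name). The query document lists the field values a post must have in order to match.

```javascript
// Match posts whose title is exactly "First post".
const titleQuery = { title: "First post" };

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  printjson(db.posts.find(titleQuery).toArray());
}
```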
Query embedded data
How about a case when we would like to find all documents where “Bob” has left a comment? Suppose that our posts collection looks like this:
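For illustration, assume documents shaped like the ones below. The titles, the second commenter, and the `comments` array with its `author` and `text` fields are all hypothetical; only the commenter named Bob comes from the text above.

```javascript
// Hypothetical sample data: each post embeds its comments as an array.
const samplePosts = [
  {
    title: "First post",
    comments: [
      { author: "Bob", text: "Nice read!" },
      { author: "Alice", text: "Thanks for sharing." },
    ],
  },
  {
    title: "Second post",
    comments: [{ author: "Alice", text: "Looking forward to more." }],
  },
];

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  // insert() also accepts an array and stores each element as a document.
  db.posts.insert(samplePosts);
}
```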
To write a query matching an embedded document we can use dot-notation:
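A sketch of such a query, assuming each post embeds a `comments` array whose elements have an `author` field (both names are assumptions).

```javascript
// The dotted path reaches into every element of the embedded comments array.
// The key must be quoted because it contains a dot.
const bobCommentedQuery = { "comments.author": "Bob" };

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  printjson(db.posts.find(bobCommentedQuery).toArray());
}
```

A post matches as soon as at least one of its comments was written by Bob.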
Embedding vs referencing
The obvious advantage of embedding is that you can fetch embedded documents in one query. Otherwise, you would have to fetch referenced documents in multiple queries. Additionally, with embedded documents you get atomic writes, whereas with referenced documents you have to do some extra work to ensure consistency.
To embed or not to embed?
You have to ask yourself a series of questions:
- how often is the data used together (often -> go for embedded)
- how many documents are expected to be embedded (more than a few hundred -> you should consider referencing)
- how often will the embedded document change (often -> go for referencing)
The general rule of thumb is to start with embedding and move to referencing when you need to access documents independently or you start to get very large sets of embedded documents.
Remove
To remove documents you can simply use the remove command:
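A minimal sketch, again assuming a hypothetical `title` field. Note that `remove()` deletes every document that matches the query; newer shells offer `deleteOne()` and `deleteMany()` for the same job.

```javascript
// Select the posts to delete by title.
const removalQuery = { title: "First post" };

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  // Deletes all documents matching the query.
  db.posts.remove(removalQuery);
}
```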
Update
As expected, to update a document you can use the update command:
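A sketch of the naive form, assuming hypothetical `title` and `content` fields and the legacy `mongo` shell used throughout this article. Newer tooling may reject an update document without operators and offers `replaceOne()` for deliberate replacements.

```javascript
// Which document to update.
const postSelector = { title: "First post" };

// A plain document with no update operators such as $set.
const plainUpdate = { content: "Updated content" };

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  // In the legacy shell this treats plainUpdate as the new document body.
  db.posts.update(postSelector, plainUpdate);
}
```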
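Watch out here! Because the update document contains no update operators, it replaces the whole matching document: only _id is kept, and every field you did not list is gone.
What you probably expected is that only the listed field changes and the rest of the document stays intact.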
If what you really wanted is to only replace the content property then you should use this operation instead:
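A minimal sketch using the `$set` operator, with the same hypothetical `title` and `content` fields.

```javascript
// $set changes only the listed fields and leaves the others untouched.
const setContentUpdate = { $set: { content: "Updated content" } };

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  db.posts.update({ title: "First post" }, setContentUpdate);
}
```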
Another caveat is that this command changes only the first matching document. If you want to update all matching documents, use the multi option:
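A sketch of a multi-document update. The `author` and `reviewed` fields are hypothetical; newer shells express the same intent with `updateMany()`.

```javascript
// Match every post written by a given (hypothetical) author.
const authorQuery = { author: "Sam" };
const markReviewed = { $set: { reviewed: true } };

// The third argument carries the options; multi: true updates all matches.
const multiOptions = { multi: true };

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  db.posts.update(authorQuery, markReviewed, multiOptions);
}
```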
Update operators
There are other useful operators you can use to modify your data. To name just a few:
Operator | Before | Operation | After |
---|---|---|---|
$inc | { "quantity": 2 } | { "$inc": { "quantity": 3 }} | { "quantity": 5 } |
$mul | { "price": 2.2 } | { "$mul": { "price": 2.5 }} | { "price": 5.5 } |
$max | { "highest": 6 } | { "$max": { "highest": 9 }} | { "highest": 9 } |
$max | { "highest": 6 } | { "$max": { "highest": 5 }} | { "highest": 6 } |
$addToSet | { "letters": ["a"] } | { "$addToSet": { "letters": "b" }} | { "letters": ["a", "b"] } |
$addToSet | { "letters": ["a"] } | { "$addToSet": { "letters": "a" }} | { "letters": ["a"] } |
$unset | { "a": 1, "b": 2 } | { "$unset": { "b": "" }} | { "a": 1 } |
$rename | { "oldy": "val" } | { "$rename": { "oldy": "newy" }} | { "newy": "val" } |
$pop | { "arr": [1, 2, 3] } | { "$pop": { "arr": 1 }} | { "arr": [1, 2] } |
$pop | { "arr": [1, 2, 3] } | { "$pop": { "arr": -1 }} | { "arr": [2, 3] } |
$push | { "letters": ["a"] } | { "$push": { "letters": "a" }} | { "letters": ["a", "a"] } |
$pull | { "arr": [2, 2, 1, 3, 1] } | { "$pull": { "arr": 1 }} | { "arr": [2, 2, 3] } |
Upsert
Be aware that when you try to update a document that does not yet exist, nothing is modified and your update is silently lost.
To fix this you can use the upsert option:
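A minimal sketch of an upsert, assuming hypothetical `title` and `content` fields. The last constant only illustrates which fields the inserted document would carry when nothing matched.

```javascript
const upsertQuery = { title: "Second post" };
const upsertChange = { $set: { content: "Created by an upsert" } };

// upsert: true turns "update nothing" into "insert a new document".
const upsertOptions = { upsert: true };

// Illustration only: with an equality query, the inserted document carries
// the query fields plus the $set fields (and a generated _id).
const fieldsAfterInsert = Object.assign({}, upsertQuery, upsertChange.$set);

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  db.posts.update(upsertQuery, upsertChange, upsertOptions);
}
```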
This way, if the query does not match any document, MongoDB inserts a new document built from the values in the query and in the update section, so the new document contains the fields from both.
Advanced query operators
So far we have used only the simplest operator, equality. However, there are more:
- $gt - greater than
- $lt - less than
- $gte - greater than or equal to
- $lte - less than or equal to
- $ne - not equal to
Here is a command for finding all posts with more than 3 code snippets:
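A sketch of such a query, assuming each post stores its number of code snippets in a hypothetical numeric `snippets` field.

```javascript
// Operators are nested under the field they apply to.
const manySnippetsQuery = { snippets: { $gt: 3 } };

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  printjson(db.posts.find(manySnippetsQuery).toArray());
}
```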
Did you know that you can specify multiple query operators?
Suppose that each post can be rated by our readers, e.g. with a list of numeric scores stored on the post.
Let's find the posts that were rated well at least once:
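A sketch under stated assumptions: each post has a hypothetical `ratings` array of scores from 1 to 5, and "well rated" means a score of 4 or 5. The two operators are combined inside `$elemMatch` so that a single rating has to satisfy both bounds.

```javascript
// Hypothetical document shape: { title: "First post", ratings: [2, 5, 3] }

// Two operators on the same field; $elemMatch requires one array element
// to satisfy both of them.
const wellRatedQuery = {
  ratings: { $elemMatch: { $gte: 4, $lte: 5 } },
};

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  printjson(db.posts.find(wellRatedQuery).toArray());
}
```

Without `$elemMatch`, the two conditions could be satisfied by different elements of the array.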
Projections
Sometimes, for performance's sake, you would like to leave out the fields you do not need in your result. To do that you have two options:
- you either specify the fields you want to exclude
- …or you provide the fields you only want to be included
Here is an example returning only titles of the blog posts:
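A minimal sketch, assuming a hypothetical `title` field. The projection is the second argument of `find()`; the first one is the (here empty) query.

```javascript
// 1 marks a field for inclusion in the result.
const titleProjection = { title: 1 };

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  printjson(db.posts.find({}, titleProjection).toArray());
}
```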
This query returns only the title and _id fields, because _id is included by default.
Excluding _id is the only case where inclusion and exclusion can be mixed in one projection:
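A sketch of a projection that includes a hypothetical `title` field and drops `_id`.

```javascript
// 1 includes a field, 0 excludes it; _id may be excluded alongside inclusions.
const titleOnlyProjection = { title: 1, _id: 0 };

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  printjson(db.posts.find({}, titleOnlyProjection).toArray());
}
```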
Count
How do you count the total number of blog posts? You can use the count() method:
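A minimal sketch. Newer shells recommend `countDocuments()` instead, but `count()` matches the method named above. The query constant is hypothetical and only shows that a count can be narrowed down.

```javascript
// Optional: count only the posts matching a query.
const countQuery = { title: "First post" };

// `db` exists only inside the mongo shell, so the calls are guarded.
if (typeof db !== "undefined") {
  // Total number of documents in the collection.
  print(db.posts.count());
  // Number of documents matching the query.
  print(db.posts.count(countQuery));
}
```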
Sort
What about sorting the posts alphabetically by title? The sort() method may come in handy:
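A sketch of an ascending sort, assuming a hypothetical `title` field.

```javascript
// 1 requests ascending order on the given field.
const ascendingByTitle = { title: 1 };

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  printjson(db.posts.find().sort(ascendingByTitle).toArray());
}
```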
To sort in descending order instead, replace 1 with -1:
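The same sketch with the direction flipped, still assuming a hypothetical `title` field.

```javascript
// -1 requests descending order on the given field.
const descendingByTitle = { title: -1 };

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  printjson(db.posts.find().sort(descendingByTitle).toArray());
}
```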
Pagination
We can page results using the limit() and skip() methods. In order to fetch the first 10 posts we can execute this command:
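A minimal sketch of the first page, with the page size kept in a constant.

```javascript
const firstPageSize = 10;

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  // limit() caps the number of documents the cursor returns.
  printjson(db.posts.find().limit(firstPageSize).toArray());
}
```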
For the next page you should use skip():
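A sketch of fetching the second page. `skipForPage` is a hypothetical helper that turns a 1-based page number into the number of documents to skip.

```javascript
// Number of documents to skip before a 1-based page starts.
function skipForPage(pageNumber, pageSize) {
  return (pageNumber - 1) * pageSize;
}

const pageSize = 10;

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  // Second page: skip the first 10 posts, then return the next 10.
  printjson(
    db.posts.find().skip(skipForPage(2, pageSize)).limit(pageSize).toArray()
  );
}
```

Adding a `sort()` before paging keeps the order of the pages predictable.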
Aggregation
In relational databases we have this nice concept of aggregation using GROUP BY together with HAVING and a number of aggregation functions like AVG, SUM, etc. MongoDB implements this concept with the aggregate() method.
As an example, let's fetch the average blog post rating per year to see if our blog is getting better:
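A sketch under simplifying assumptions: each post has a numeric `year` field and a single numeric `rating` field. Both names, as well as `averageRating`, are hypothetical.

```javascript
// One $group stage: the group key goes into _id, and $avg computes the
// mean rating of the posts in each group.
const averageRatingPerYear = [
  { $group: { _id: "$year", averageRating: { $avg: "$rating" } } },
];

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  printjson(db.posts.aggregate(averageRatingPerYear).toArray());
}
```

The `"$year"` and `"$rating"` strings are field references: the leading dollar sign tells MongoDB to read the value from each document.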
For a given set of posts, the result contains one document per year, with the year in the _id field and the average rating in a computed field.
Aggregation pipeline
What is cool about MongoDB is that we can build an aggregation pipeline, splitting our query into a sequence of stages where each stage works on the output of the previous one.
Each stage can be one of:
- $group - as shown above for grouping documents
- $match - for filtering documents (similar to WHERE and HAVING in SQL)
- $sort - for sorting
- $limit - for narrowing the number of documents
- $project - for narrowing the number of fields in documents
As our final example, let's assume we are publishing sponsored posts on our blog. Let us create an aggregation pipeline showing the top 5 sponsors with the best-rated blog posts about Java:
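One possible sketch of such a pipeline. It assumes each post has a `tags` array, a `sponsor` field on sponsored posts, and a numeric `rating`; all of these names are hypothetical.

```javascript
const topJavaSponsors = [
  // Keep only sponsored posts tagged with "Java".
  { $match: { tags: "Java", sponsor: { $exists: true } } },
  // Average the ratings of each sponsor's posts.
  { $group: { _id: "$sponsor", averageRating: { $avg: "$rating" } } },
  // Best average first.
  { $sort: { averageRating: -1 } },
  // Keep the top 5 sponsors.
  { $limit: 5 },
];

// `db` exists only inside the mongo shell, so the call is guarded.
if (typeof db !== "undefined") {
  printjson(db.posts.aggregate(topJavaSponsors).toArray());
}
```

Placing `$match` first keeps the later stages from processing posts that would be discarded anyway.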
What is MongoDB good for?
According to NoSQL Distilled by Pramod J. Sadalage and Martin Fowler there are some common use cases for MongoDB:
- event logging - as a central database for events in the whole app (storing unstructured data and scalability are MongoDB’s advantages)
- CMS and blogging - a blog together with comments can be retrieved in one query
- e-commerce - products in an online shop have varying attributes, which makes a relational database not very suitable for this purpose (the lack of a fixed schema works in MongoDB's favor)
- website monitoring - you can query the collected data in real time
Where to go next?
If you want to get your hands dirty, Code School has a very nice course on MongoDB.
However, if reading is your preferred way of learning, you also have multiple good options. As introductory material, the previously mentioned NoSQL Distilled is a great starting point. If you want to dig deeper, MongoDB: The Definitive Guide by Kristina Chodorow would be your best bet.
Final words
I hope you enjoyed this article and that you now have a grasp of what MongoDB is and how it can be used.