Getting Started with PyMODM¶
Warning
MongoDB has paused the development of PyMODM. If there are any users who want to take over and maintain this project, or if you just have questions, please respond to this forum post.
This document provides a gentle introduction to pymodm and goes over everything you’ll need to write your first application.
Installation¶
You can install pymodm with pip:
pip install pymodm
Of course, you’ll probably want to have a copy of MongoDB itself running, so you can test your app. You can download it for free from www.mongodb.com.
Connecting to MongoDB¶
Now that we have all the components, let’s connect them together. In pymodm, you connect to MongoDB by calling the connect() function:
from pymodm.connection import connect
# Connect to MongoDB and call the connection "my-app".
connect("mongodb://localhost:27017/myDatabase", alias="my-app")
Let’s go through what we just did above. First, we imported the connect() function from the connection module. Then, we established a connection using a MongoDB connection string. A MongoDB connection string always starts with mongodb:// and can include a multitude of connection options. It’s important to note that the connection string provided to connect() must include a database name (“myDatabase” above). This is the database where all data will reside within your PyMODM application by default.
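To make the "database name" part of the connection string concrete, here is a minimal sketch that uses only the Python standard library to pull that segment out of a URI. The helper name database_name is hypothetical and is not part of pymodm; it only illustrates where the name sits in the string.

```python
from urllib.parse import urlsplit


def database_name(uri):
    """Return the database segment of a MongoDB URI, or None if it is missing.

    Hypothetical helper for illustration only; pymodm does its own parsing.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError("not a MongoDB connection string: %r" % uri)
    # The path looks like "/myDatabase"; strip the leading slash.
    name = parts.path.lstrip("/")
    return name or None


print(database_name("mongodb://localhost:27017/myDatabase"))
print(database_name("mongodb://localhost:27017"))
```

A URI for which this helper returns None has no database segment, which is the case the paragraph above warns about.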
When we called connect(), we also provided an alias for the connection (“my-app”). Although providing an alias is optional, doing so may come in handy later, if we ever need to refer to the connection by name. This is useful if we want to have models that use different connection options, or if we ever want to switch what connection a model is using.
Defining Models¶
Now that we have at least one connection to MongoDB open, we’re ready to define our model classes. MongoModel is the base class for all top-level models, which represent the data we have stored in MongoDB in a convenient object-oriented way.
Basic Models¶
Typically, the definition of a MongoModel class will include one or more fields and optionally some metadata, encapsulated in an inner class called Meta. Take this example:
from pymongo.write_concern import WriteConcern
from pymodm import MongoModel, fields
class User(MongoModel):
    email = fields.EmailField(primary_key=True)
    first_name = fields.CharField()
    last_name = fields.CharField()

    class Meta:
        write_concern = WriteConcern(j=True)
        connection_alias = 'my-app'
Our model, User, represents documents in the myDatabase.user collection in MongoDB. A few things to notice here:
- Our User model extends MongoModel. This means that it will get its own collection in the database. Any class that inherits directly from MongoModel always gets its own collection.
- We gave User three fields: first_name, last_name, and email. CharField and EmailField always store their values as unicode strings. The email field will also validate its contents as an email address.
- In the Meta class, we defined a couple of pieces of metadata. First, we defined the write_concern attribute, which tells the model what write concern to use by default. We also set the connection_alias, which tells the model what connection to use. In this case, we’re using the connection that we defined earlier, which we named my-app. Note that we have to call connect() with the my-app alias before using this model, since it relies on the my-app connection.
- We set primary_key=True in the email field. This means that this field will be used as the id for documents of this MongoModel class. Note that this field will actually be called _id in the database.
See also
The fields module.
See also
The list of available metadata attributes.
Models that Reference Other Models¶
Sometimes, our models will need to reference other models. In MongoDB, there are a couple approaches to this:
- We can store the _id of the document we want to reference. When we later need the actual document, we can look it up based on this id. If we need to reference multiple documents, we can store these ids in a list.
- If we don’t need to query the referenced documents outside of our reference structure, we might just embed such documents directly inside the documents that reference them. Similarly, if we have multiple documents we need to reference, we can just have a list of these embedded documents.
Let’s take a look at a couple of examples of models that reference the User model we wrote earlier:
from pymodm import EmbeddedMongoModel, MongoModel, fields
class Comment(EmbeddedMongoModel):
    author = fields.ReferenceField(User)
    content = fields.CharField()

class Post(MongoModel):
    title = fields.CharField()
    author = fields.ReferenceField(User)
    revised_on = fields.DateTimeField()
    content = fields.CharField()
    comments = fields.EmbeddedModelListField(Comment)
Here we’ve defined two additional model types: Comment and Post. These two models demonstrate the two approaches discussed earlier: both Comment and Post have an author, which is a User model. The User that represents the author in each case is stored among all the other Users in the myDatabase.user collection. In Comment and Post models, we’re just storing the _id of the User in the author field. This is actually the same as the User’s email field, since we set primary_key=True for that field earlier.
Post gets a little more interesting. In order to support commenting on a Post, we’ve added a comments field, which is an EmbeddedModelListField. This represents the second approach we discussed, where Comment objects are embedded directly into our Post object. The downside to doing this is that it is difficult to query for individual Comment objects. The upside is that we won’t have to make an additional query to retrieve all the comments associated with a given Post.
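To picture the difference between the two approaches, the following sketch writes out roughly what the stored documents could look like, using plain Python dictionaries. The sample values are invented, and the layout is an assumption for illustration: bookkeeping fields such as the Post’s own _id are left out, and this is not output captured from pymodm.

```python
import datetime

# Hypothetical sample data, shaped like the models defined above.
user_doc = {"_id": "user@email.com", "first_name": "Bob", "last_name": "Ross"}

post_doc = {
    "title": "Happy little trees",
    "author": user_doc["_id"],  # reference: only the User's _id is stored
    "revised_on": datetime.datetime(2016, 7, 1),
    "content": "This is the first post!",
    "comments": [  # embedded: whole Comment documents live inside the Post
        {"author": "anotheruser@email.com", "content": "Lovely."},
    ],
}

# Following a reference takes a second lookup, keyed by the stored id.
users_by_id = {user_doc["_id"]: user_doc}
author = users_by_id[post_doc["author"]]

# Embedded comments need no extra lookup; they arrive with the post.
print(author["first_name"], len(post_doc["comments"]))
```

The trade-off described above shows up directly: the author requires a lookup in another collection, while the comments are only reachable through the Post that contains them.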
Deleted References¶
Now that we’ve defined models that reference other model types, we face another challenge: what happens if a User object is deleted? If one of our beloved authors decides to quit the commenting/posting scene, what is to become of their comments and posts? pymodm gives us a few options:
- Do nothing (this is the default behavior).
- Change fields that reference the deleted object to None.
- Cascade the deletes: when a referenced object is deleted, recursively delete all objects that were referencing it.
- Don’t allow deleting objects that still have references to them.
- If the deleted object was just one among potentially many other references stored in a list, remove the reference from this list.
In our case for the Comment and Post objects, let’s delete any comments and posts associated with a User after they’re gone. This would be the changed definition of the author field in each case:
author = fields.ReferenceField(User, on_delete=fields.ReferenceField.CASCADE)
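The delete options listed above can be compared with a small in-memory simulation. This is a sketch in plain Python, not pymodm’s implementation: the function delete_user and the policy labels ("deny", "cascade", "nullify", "do_nothing") are hypothetical names chosen for this example, the list-removal option is left out, and the cascade here is a single pass rather than a recursive delete.

```python
def delete_user(user_id, posts, policy):
    """Return the posts that remain after deleting user_id under a policy."""
    referencing = [p for p in posts if p["author"] == user_id]
    if policy == "deny" and referencing:
        # Refuse to delete while references still exist.
        raise ValueError(
            "%d post(s) still reference %s" % (len(referencing), user_id))
    if policy == "cascade":
        # Delete every post that pointed at the deleted user.
        return [p for p in posts if p["author"] != user_id]
    if policy == "nullify":
        # Keep the posts but clear the dangling reference.
        return [dict(p, author=None) if p["author"] == user_id else p
                for p in posts]
    # Any other policy, such as "do_nothing": posts keep an author id
    # that no longer resolves to a user.
    return list(posts)


posts = [
    {"title": "Clouds", "author": "user@email.com"},
    {"title": "Rivers", "author": "anotheruser@email.com"},
]
print(delete_user("user@email.com", posts, "cascade"))
print(delete_user("user@email.com", posts, "nullify"))
```

With "cascade", the post by the deleted user disappears along with the user; with "nullify", it survives with no author.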
See also
The ReferenceField class.
Creating Data¶
Alright, now that we’ve defined models for each MongoDB collection our app will use, let’s create some documents!
Saving a Single Instance¶
Here’s one way to set up our first User:
User('user@email.com', 'Bob', 'Ross').save()
Above, we used positional arguments to construct an instance of User. Positional arguments are assigned to fields in the order they were defined in the User class. We can also use keyword arguments or a mix of positional/keyword arguments to create MongoModel instances, so this would be equivalent:
User('user@email.com', last_name='Ross', first_name='Bob').save()
Finally, calling save() on the instance persists it to the database.
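The rule that positional arguments follow field definition order can be sketched with a toy class. TinyModel and TinyUser below are hypothetical stand-ins written for this illustration; they do not use pymodm and make no claim about how MongoModel is implemented internally.

```python
class TinyModel:
    """Toy stand-in for a model: field names listed in definition order."""
    field_names = ()

    def __init__(self, *args, **kwargs):
        if len(args) > len(self.field_names):
            raise TypeError("too many positional arguments")
        # Positional values pair up with fields in definition order.
        values = dict(zip(self.field_names, args))
        for name, value in kwargs.items():
            if name not in self.field_names:
                raise TypeError("unknown field: %s" % name)
            if name in values:
                raise TypeError("got multiple values for field: %s" % name)
            values[name] = value
        # Fields that were not supplied default to None in this toy.
        for name in self.field_names:
            setattr(self, name, values.get(name))


class TinyUser(TinyModel):
    field_names = ("email", "first_name", "last_name")


a = TinyUser("user@email.com", "Bob", "Ross")
b = TinyUser("user@email.com", last_name="Ross", first_name="Bob")
print(vars(a) == vars(b))
```

Both constructions end up with the same attribute values, which mirrors the equivalence of the two User(...) calls above.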
Saving Instances in Bulk¶
We can also save documents to the database in bulk:
users = [
    User('user@email.com', 'Bob', 'Ross'),
    User('anotheruser@email.com', 'David', 'Attenborough')
]
User.objects.bulk_create(users)
Updating Documents¶
There are two ways to update documents in MongoDB with pymodm:
1. Change instance attributes to be the way we like, then call save() on the instance.
2. Use the update() method on the MongoModel’s QuerySet.
Let’s say that we have an instance that looks like this:
post = Post(author=some_author, content='This is the first post!').save()
Now we realize that we forgot to set the revised_on date on the post… oops. Let’s fix that by setting the attribute directly per option (1) above:
import datetime
# Set the revised_on attribute of our Post from earlier.
post.revised_on = datetime.datetime.now()
# Save the revised document.
post.save()
Note that we have to call save() in order to save any changes we’ve made to a MongoModel. Setting the attribute just changes its value on our local copy of the document.
The above update strategy works well if we just want to change this single document. But what if we wanted to update documents in bulk or take advantage of a particular MongoDB update operator? The second option grants us more flexibility: we can use the update() method on the MongoModel’s QuerySet:
Post.objects.raw({'revised_on': {'$exists': False}}).update(
    {'$set': {'revised_on': datetime.datetime.now()}})
We’ll discuss QuerySet objects in more detail in the Accessing Data section.
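To see what the bulk update above asks MongoDB to do, here is a plain-Python emulation over a list of dictionaries. It is only a sketch of the semantics of the $exists filter and the $set operator; the function name set_missing_revised_on is hypothetical, and the real work happens on the server.

```python
import datetime


def set_missing_revised_on(docs, when):
    """Emulate a $set of revised_on on documents that lack the field.

    Mirrors the filter {'revised_on': {'$exists': False}}: a document whose
    revised_on is present but None still has the field, so it is skipped.
    """
    modified = 0
    for doc in docs:
        if "revised_on" not in doc:
            doc["revised_on"] = when
            modified += 1
    return modified


now = datetime.datetime(2016, 7, 1, 12, 0)
docs = [
    {"title": "first"},
    {"title": "second", "revised_on": datetime.datetime(2016, 6, 1)},
    {"title": "third", "revised_on": None},
]
print(set_missing_revised_on(docs, now))
```

Only the first document is changed; the other two already carry a revised_on field, so the filter does not match them.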
Accessing Data¶
We’ve seen how to model the data in our database and how to create some documents, so now it’s time to query some of this data. Our primary way of getting to our data happens through the QuerySet class, which can be accessed through the objects attribute on our Model class. Here’s how we could list all the Users we have, for example:
for user in User.objects.all():
    print(user.first_name + ' ' + user.last_name)
We can do the same thing with Post objects. Let’s narrow our search to posts that were revised within the last month:
import datetime
month_ago = datetime.datetime.now() - datetime.timedelta(days=30)
for post in Post.objects.raw({'revised_on': {'$gte': month_ago}}):
    print(post.title + ' by ' + post.author.first_name)
See what we did there? We accessed the first_name attribute on the User object, even though only the id of the User is technically stored in the author field on a Post. When we access the data stored in a ReferenceField, it is dereferenced automatically. This makes a separate query to the database. If we didn’t want that to happen, we would need to use the no_auto_dereference() context manager:
from pymodm.context_managers import no_auto_dereference
# Turn off automatic dereferencing for fields defined on "Post".
with no_auto_dereference(Post):
    for post in Post.objects.raw({'revised_on': {'$gte': month_ago}}):
        print(post.title + ' by author with id ' + post.author)
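The idea behind automatic dereferencing, and behind switching it off, can be sketched without a database. Everything in the block below (TinyPost, fetch_user, no_deref, the lookups list) is a hypothetical stand-in written for this illustration; it is not how pymodm implements ReferenceField or no_auto_dereference().

```python
from contextlib import contextmanager

USERS = {"user@email.com": {"first_name": "Bob"}}  # stand-in for the user collection
lookups = []  # records each simulated round trip to the database


def fetch_user(user_id):
    """Simulate the extra query that dereferencing performs."""
    lookups.append(user_id)
    return USERS[user_id]


class TinyPost:
    auto_dereference = True

    def __init__(self, title, author_id):
        self.title = title
        self._author_id = author_id

    @property
    def author(self):
        # With dereferencing on, reading the field triggers a lookup;
        # with it off, the stored id is returned as-is.
        if self.auto_dereference:
            return fetch_user(self._author_id)
        return self._author_id


@contextmanager
def no_deref(model):
    """Temporarily turn off dereferencing for a toy model class."""
    previous = model.auto_dereference
    model.auto_dereference = False
    try:
        yield
    finally:
        model.auto_dereference = previous


post = TinyPost("Happy little trees", "user@email.com")
print(post.author["first_name"])  # performs one simulated lookup
with no_deref(TinyPost):
    print(post.author)  # no lookup, just the stored id
print(len(lookups))
```

Inside the with block, reading post.author costs nothing extra, which is the reason to reach for the context manager when only the ids are needed.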
Querying Model Subclasses¶
Earlier, we mentioned that every class that inherits directly from MongoModel gets its own collection in the database. But what about classes that inherit from some other model class?
class ImagePost(Post):
    image = fields.ImageField()
The above model subclasses the Post model we wrote earlier. Because it does not inherit directly from MongoModel, it does not have its own collection. Instead, it shares a collection among all the other Post objects. However, we are still able to distinguish between different types when querying the database:
for image_post in ImagePost.objects.all():
    assert isinstance(image_post, ImagePost)

for post in Post.objects.all():
    if isinstance(post, ImagePost):
        print('image: ' + repr(post.image))
    print('post content: ' + post.content)
How does this work? For every model class that allows inheritance, pymodm creates another, hidden field called _cls that stores the class of the model that the document refers to. This way, models of different types can be collocated in the same collection while preserving type information.
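A type-discriminator field like _cls can be illustrated with a short, database-free sketch. The class names, the REGISTRY mapping, and the plain class-name labels are assumptions made for this example; the exact label format that pymodm writes into _cls may differ.

```python
class TinyPost:
    def __init__(self, doc):
        self.doc = doc


class TinyImagePost(TinyPost):
    pass


# Maps the stored label back to a Python class (hypothetical labels).
REGISTRY = {"TinyPost": TinyPost, "TinyImagePost": TinyImagePost}

# One shared "collection" holding documents of both types.
collection = [
    {"_cls": "TinyPost", "content": "plain text"},
    {"_cls": "TinyImagePost", "content": "with picture", "image": "tree.png"},
]


def find(model):
    """Return instances of model and its subclasses, typed by their _cls."""
    wanted = {name for name, cls in REGISTRY.items() if issubclass(cls, model)}
    return [REGISTRY[doc["_cls"]](doc)
            for doc in collection if doc["_cls"] in wanted]


print([type(p).__name__ for p in find(TinyPost)])
print([type(p).__name__ for p in find(TinyImagePost)])
```

Querying through the parent class returns both kinds of document, each restored to its own class, while querying through the subclass returns only its own documents.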
What if we don’t want this _cls field to be stored in our documents? This is possible by declaring the model to be final, which means that it has to inherit directly from MongoModel and cannot be extended:
class PageTheme(MongoModel):
    theme_name = fields.CharField()
    background_color = fields.CharField()
    foreground_color = fields.CharField()

    class Meta:
        final = True
Advanced: Managers and Custom QuerySets¶
We can do a lot with just the tools the default QuerySet object provides, but sometimes we may find the need for specialized collection-level functionality, or we might want to write a shortcut for a very common query that we’re performing on one or more models.
Let’s revisit our Post model and add a field called published. This will tell us whether the Post has been published or not. Most of the time, we’ll probably just want to work with those Post objects that have already been published, but it’s going to get annoying fast if we have to include {"published": True} with every query.
class Post(MongoModel):
    title = fields.CharField()
    author = fields.ReferenceField(User)
    revised_on = fields.DateTimeField()
    content = fields.CharField()
    comments = fields.EmbeddedModelListField(Comment)
    published = fields.BooleanField(default=False)
There are two ways we can easily access only those Posts which aren’t drafts:
- Create a new QuerySet class that has a method published that filters Post objects for ones that have been published.
- Create a new Manager class that always creates instances of QuerySet that have the filter {"published": True} already applied. This would be handy if we only ever cared about Posts that have been published.
We’ll discuss each approach in turn.
Custom QuerySets¶
Let’s take a look at the first approach, using a custom QuerySet class:
from pymodm.queryset import QuerySet
class PublishedPostQuerySet(QuerySet):
    def published(self):
        '''Return all published Posts.'''
        return self.raw({"published": True})
Now that we’ve defined a QuerySet that has the published method, we need to hook it up with a Manager class so that we can easily use this QuerySet type from our model:
from pymodm.manager import Manager
# Create the new Manager class.
PublishedPostManager = Manager.from_queryset(PublishedPostQuerySet)
class Post(MongoModel):
    title = fields.CharField()
    author = fields.ReferenceField(User)
    revised_on = fields.DateTimeField()
    content = fields.CharField()
    comments = fields.EmbeddedModelListField(Comment)
    published = fields.BooleanField(default=False)
    # Change the "objects" manager to use our own Manager, which returns
    # instances of PublishedPostQuerySet:
    objects = PublishedPostManager()

# Get all published Posts.
published_posts = Post.objects.published()
Custom Managers¶
Now let’s examine the second approach, where all QuerySet instances already have their {"published": True} query applied.
When we call a QuerySet method from a Manager, as in Post.objects.all(), the all() method is proxied through the objects Manager. The first thing the Manager does in this case is get a QuerySet instance by calling its own get_queryset() method, then it applies whatever operation was called on the Manager.
What this means for us is that we can override get_queryset() to do anything we want to this QuerySet instance before it’s returned. Any future operations we do with that QuerySet will have these operations already applied.
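The proxying described above can be sketched with two toy classes. TinyQuerySet, TinyManager, and PublishedManager are hypothetical names for this illustration, and the filter logic (simple equality on dictionary keys, with later filters merged into earlier ones) is an assumption that is much simpler than what pymodm and MongoDB do.

```python
class TinyQuerySet:
    def __init__(self, docs, query=None):
        self.docs = docs
        self.query = dict(query or {})

    def raw(self, query):
        # Return a new queryset whose filter adds the given conditions.
        merged = dict(self.query)
        merged.update(query)
        return TinyQuerySet(self.docs, merged)

    def all(self):
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in self.query.items())]


class TinyManager:
    def __init__(self, docs):
        self.docs = docs

    def get_queryset(self):
        return TinyQuerySet(self.docs)

    def __getattr__(self, name):
        # Only called for attributes the manager lacks: build a fresh
        # queryset first, then hand the call over to it.
        return getattr(self.get_queryset(), name)


class PublishedManager(TinyManager):
    def get_queryset(self):
        # Every queryset handed out already carries the published filter.
        return super().get_queryset().raw({"published": True})


docs = [
    {"title": "draft", "published": False},
    {"title": "live", "published": True},
]
print(len(TinyManager(docs).all()), len(PublishedManager(docs).all()))
```

Because every call goes through get_queryset() first, overriding that one method is enough to change what all(), raw(), and any other proxied method operate on.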
The first thing we need to do is subclass Manager:
class PostManager(Manager):
    def get_queryset(self):
        # Override get_queryset() to apply our filter, so that any
        # QuerySet method we call through the Manager already has our query
        # applied.
        return super(PostManager, self).get_queryset().raw(
            {"published": True})
Then, as before, we add this Manager to our MongoModel:
class Post(MongoModel):
    title = fields.CharField()
    author = fields.ReferenceField(User)
    revised_on = fields.DateTimeField()
    content = fields.CharField()
    comments = fields.EmbeddedModelListField(Comment)
    published = fields.BooleanField(default=False)
    # Change the "objects" manager to use our own PostManager.
    objects = PostManager()

# Get all published Posts.
published_posts = Post.objects.all()
Of course, we can add whatever other methods we wish to our custom Manager, and they don’t all have to return QuerySets. For example, we might define a Manager method to do some complex aggregation:
from collections import OrderedDict
class PostManager(Manager):
    def get_queryset(self):
        # Override get_queryset() to apply our filter, so that any
        # QuerySet method we call through the Manager already has our query
        # applied.
        return super(PostManager, self).get_queryset().raw(
            {"published": True})

    def comment_counts(self):
        '''Get a map of title -> # comments for each Post.'''
        aggregates = self.model.objects.aggregate(
            {'$project': {'title': 1, 'comments': {'$size': '$comments'}}},
            {'$sort': {'comments': -1}}
        )
        return OrderedDict((agg['title'], agg['comments'])
                           for agg in aggregates)
Now we can see easily what Posts have the most comments:
>>> comment_counts = Post.objects.comment_counts()
>>> print(comment_counts)
OrderedDict([
    ('Getting Started with PyMODM', 9237),
    ('Custom QuerySets and Managers', 423)
])
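For readers less familiar with aggregation pipelines, the two stages used in comment_counts() can be mirrored in plain Python. The function local_comment_counts below is a hypothetical, in-memory equivalent written for this illustration: the list comprehension plays the role of $project with $size, and the sort plays the role of $sort with -1 (descending).

```python
from collections import OrderedDict


def local_comment_counts(posts):
    """Map title -> number of comments, most-commented first."""
    # Equivalent of $project: keep the title, replace comments by its size.
    projected = [{"title": p["title"], "comments": len(p["comments"])}
                 for p in posts]
    # Equivalent of $sort with {'comments': -1}: descending by count.
    projected.sort(key=lambda d: d["comments"], reverse=True)
    return OrderedDict((d["title"], d["comments"]) for d in projected)


sample_posts = [
    {"title": "Quiet post", "comments": []},
    {"title": "Busy post", "comments": [{"content": "a"}, {"content": "b"}]},
]
print(local_comment_counts(sample_posts))
```

The real pipeline runs inside MongoDB, so only the projected titles and counts travel back to the application instead of every comment.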
What’s Next?¶
Congratulations! You’ve read through the Getting Started guide and understand the basics of writing an application using PyMODM. For a more detailed reference of tools that come with PyMODM, check out the API documentation.