Streaming Objects from One Django Instance to Another
Django Object Sync - This article is part of a series.
We were working on an online-offline product that required changes made to a local server to be synced to a global central server. Changes could only be made on the local server, and a complete view of the database had to be available on the global central server. All of this in Django and Python 3.
This article, the first in a series of three, describes how to create an operational log of the additions, changes, and deletions to a database using Django. We begin by describing the task.
TL;DR Summary #
We created a repo for this code on GitHub. If you are short on time, head over there and hack on it.
Introduction #
Task #
Imagine you have a local Django application that saves and stores a lot of information. This information is constantly changing and growing in size. It must now be sent to a central server where the same information can be viewed.
This solves two problems:
- What if a LAN-installed Django app requires a view-only client on the internet?
- What if you need a real-time, operation-by-operation log that can be used as a backup or streamed to a separate machine? This can serve as a data availability solution too.
Ideas #
- Django Database Backup provides an easy way to back up the database, but it does not provide the ability to stream live changes.
- Synchronising the underlying databases. The databases could be PostgreSQL, MySQL, or any other that Django supports, but this limits interoperability. What if the other instance uses SQLite, which is the default anyway?
- Make a custom Operational Log of the changes being made on the database through Django. Stream this Log. This seems interesting.
- We can use Django Serializers for translating Django Objects to JSON.
Django Signals #
At the time of creating this solution the current major version of Django is 4.2. This version of Django includes a “signal dispatcher” which helps decoupled applications get notified when actions occur elsewhere in the framework. In a nutshell, signals allow certain senders to notify a set of receivers that some action has taken place. They’re especially useful when many pieces of code may be interested in the same events. 1
Django Serializers #
Django’s serialization framework provides a mechanism for “translating” Django models into other formats. Usually these other formats will be text-based and used for sending Django data over a wire, but it’s possible for a serializer to handle any format (text-based or not). 2
So, we can serialize our objects to JSON and ship them through streams or store them in a text file.
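To make the format concrete: `serializers.serialize('json', [obj])` produces a JSON list of `{"model", "pk", "fields"}` dicts, which any consumer can read with the standard `json` module. A minimal sketch, using an illustrative payload (the values here are made up, not from a real database):

```python
import json

# A sample string in the shape that serializers.serialize('json', [obj])
# produces: a JSON list of {"model", "pk", "fields"} dicts.
payload = '''[{"model": "polls.question", "pk": 1,
  "fields": {"id": "a3f4729c-b3c2-4264-82a6-8ef7abeaec96",
             "question_text": "What's new?",
             "pub_date": "2023-10-27T08:31:40.182Z"}}]'''

records = json.loads(payload)
for record in records:
    # Each record carries the app.model label, the primary key,
    # and all serialized field values.
    print(record["model"], record["pk"], record["fields"]["question_text"])
```

Because the format is plain JSON, the receiving side does not even need the same database backend to make sense of it.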
Deep Dive #
Let us start by writing the code for creating the said operational log. We assume that you know enough about Django to create a project and an application. Let's create a project named `signalsTest`, and create an application named `polls` in it.
Special UUID Key #
This applies when multiple instances ship to a single instance. It is not necessary for streaming from a single writable instance to a single readable instance.
Consider relational references between objects in the Django ORM. Django creates an auto-incrementing integer primary key by default, and all relationships between objects, by default, use this primary key. If we ship this primary key in the operational log as-is and it is ingested by the single global instance, relationships can become invalid when primary keys from different local instances overlap. With auto-incrementing keys, such overlaps are inevitable.
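The collision is easy to see with plain Python, no Django required. A minimal sketch, modeling two local instances as dicts:

```python
import uuid

# Two independent local instances each start their auto-increment
# counter at 1, so their first rows share the same integer pk.
instance_a_row = {"pkid": 1, "id": str(uuid.uuid4()), "text": "from A"}
instance_b_row = {"pkid": 1, "id": str(uuid.uuid4()), "text": "from B"}

# The integer primary keys collide on the central server...
assert instance_a_row["pkid"] == instance_b_row["pkid"]

# ...but the UUID ids remain distinct, so relationships keyed
# on them stay valid after both rows are ingested.
assert instance_a_row["id"] != instance_b_row["id"]
```

This is exactly why the foreign keys below point at the UUID column rather than the integer one.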
To counter this we created a special `UUIDModel` which must be inherited by all models that are to be shipped to a single instance from multiple instances. In this model we create a special `id` column typed as a `UUIDField`, as can be seen in the code below. This allows your global application instance to act as a single tenant receiving from multiple local source application instances.
```python
import uuid

from django.db import models


class UUIDModel(models.Model):
    pkid = models.BigAutoField(primary_key=True, editable=False)
    id = models.UUIDField(default=uuid.uuid4, editable=False, unique=True)

    class Meta:
        abstract = True
```
Example usage of this model is given in the Django Objects Sync repo:
```python
import datetime

from django.db import models
from django.utils import timezone


class Question(UUIDModel):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField("date published")

    def was_published_recently(self):
        return self.pub_date >= timezone.now() - datetime.timedelta(days=1)

    def __str__(self):
        return self.question_text


class Choice(UUIDModel):
    question = models.ForeignKey(Question, to_field='id', on_delete=models.CASCADE)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)

    def __str__(self):
        return self.choice_text
```
As you can see in the code above, the foreign key to the `Question` model is made using the UUID-typed `id` from `UUIDModel`.
Django Logging #
We can use the built-in Django logging framework to create a logging mechanism for the operational log. An example is given in Django Objects Sync:
```python
LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'standard': {
            'format': '%(asctime)s [%(levelname)s] %(name)s: %(message)s'
        },
        'oplog_formatter': {
            'format': '%(created)f %(message)s'
        },
    },
    'handlers': {
        'default': {
            'level': 'DEBUG',
            'class': 'logging.handlers.RotatingFileHandler',
            'filename': './logs/app.log',
            'maxBytes': 1024 * 1024 * 5,  # 5 MB
            'backupCount': 5,
            'formatter': 'standard',
        },
        'oplog_handler': {
            'level': 'DEBUG',
            'class': 'logging.handlers.RotatingFileHandler',
            'filename': './logs/oplog.log',
            'maxBytes': 1024 * 1024 * 10,  # 10 MB
            'backupCount': 30,
            'formatter': 'oplog_formatter',
        },
    },
    'loggers': {
        '': {
            'handlers': ['default'],
            'level': 'DEBUG',
            'propagate': True,
        },
        'oplogger': {
            'handlers': ['oplog_handler'],
            'level': 'DEBUG',
            'propagate': False,
        },
    },
}
```
Here we have configured the `oplog_formatter` to prefix each entry with an epoch timestamp (`%(created)f`). This is for keeping track of the last time we ingested the log.
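That epoch prefix makes each oplog line trivially machine-readable. A minimal sketch of how an ingester might parse one line and use the timestamp as a checkpoint; `parse_oplog_line` is a hypothetical helper, not part of the repo:

```python
import json

def parse_oplog_line(line):
    # Each line has the form "<epoch> <OP> <json payload>",
    # as produced by the oplog_formatter above.
    timestamp, op, payload = line.split(" ", 2)
    return float(timestamp), op, json.loads(payload)

sample = ('1698395500.971553 SAVE [{"model": "polls.question", "pk": 1, '
          '"fields": {"question_text": "What\'s new?"}}]')

ts, op, records = parse_oplog_line(sample)

# The epoch prefix lets an ingester skip entries it has already seen.
last_ingested = 1698395000.0
if ts > last_ingested:
    print(op, records[0]["model"])
```

Storing the highest timestamp seen so far is enough to resume ingestion after a restart without replaying the whole log.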
Operational Log #
This is fairly simple. We need to listen to all `post_save` and `post_delete` events, and write to the `oplogger` logger. In addition, we must treat the SAVE and DELETE events separately, as they must be ingested differently. To do this we prefix the SAVE oplog entries with the string `SAVE`, and the DELETE oplog entries with the string `DELETE`.
We create a new file named `signals.py` in the `polls` app for this. View the code for this at Django Objects Sync:
```python
import logging

from django.core import serializers
from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver

import polls.models

oplogger = logging.getLogger('oplogger')


@receiver(post_save)
def model_saved(sender, instance, **kwargs):
    if instance.__class__.__name__ in ('ProcessedFile', 'Migration'):
        return
    # serializers explained next
    data = serializers.serialize('json', [instance])
    oplogger.info(msg=f'SAVE {data}')


@receiver(post_delete)
def model_deleted(sender, instance, **kwargs):
    if instance.__class__.__name__ in ('ProcessedFile', 'Migration'):
        return
    # serializers explained next
    data = serializers.serialize('json', [instance])
    oplogger.info(msg=f'DELETE {data}')
```
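One gotcha worth noting: the `@receiver` decorators only connect the handlers if `signals.py` is actually imported somewhere. A common way to do that is in the app's `AppConfig.ready()`; the sketch below assumes that wiring (the repo may do it differently):

```python
# polls/apps.py -- a sketch; check the repo for the actual wiring.
from django.apps import AppConfig


class PollsConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    name = "polls"

    def ready(self):
        # Importing the module runs the @receiver decorators,
        # which connects the handlers to post_save/post_delete.
        import polls.signals  # noqa: F401
```

Without this import, the operational log stays silently empty.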
Demo #
Download or clone the code at Django Objects Sync.
- Switch to the `blog` branch.
- Install the pip requirements from the `REQUIREMENTS.TXT` file.
- Run the migrations for Django.
Run the following commands in the Django shell after all migrations have been made to verify functionality.

```python
from polls.models import Choice, Question
from django.utils import timezone

q = Question(question_text="What's new?", pub_date=timezone.now())
q.save()
q.choice_set.create(choice_text='Nothing', votes=0)
c = q.choice_set.create(choice_text='sky', votes=0)
c.delete()
c.save()
```
The following output will be produced, and similar log entries will be appended to the file at `logs/oplog.log`.

```
1698395500.971553 SAVE [{"model": "polls.question", "pk": 1, "fields": {"id": "a3f4729c-b3c2-4264-82a6-8ef7abeaec96", "question_text": "What's new?", "pub_date": "2023-10-27T08:31:40.182Z"}}]
1698395542.826431 SAVE [{"model": "polls.choice", "pk": 1, "fields": {"id": "d6d2c0a0-6dfa-4515-875e-f1edfd3dbf01", "question": "a3f4729c-b3c2-4264-82a6-8ef7abeaec96", "choice_text": "Nothing", "votes": 0}}]
1698395568.857164 SAVE [{"model": "polls.choice", "pk": 2, "fields": {"id": "d8e48b8d-9ccb-4728-b90c-2833592b6fd2", "question": "a3f4729c-b3c2-4264-82a6-8ef7abeaec96", "choice_text": "sky", "votes": 0}}]
1698395573.036682 DELETE [{"model": "polls.choice", "pk": 2, "fields": {"id": "d8e48b8d-9ccb-4728-b90c-2833592b6fd2", "question": "a3f4729c-b3c2-4264-82a6-8ef7abeaec96", "choice_text": "sky", "votes": 0}}]
1698395577.627465 SAVE [{"model": "polls.choice", "pk": 3, "fields": {"id": "d8e48b8d-9ccb-4728-b90c-2833592b6fd2", "question": "a3f4729c-b3c2-4264-82a6-8ef7abeaec96", "choice_text": "sky", "votes": 0}}]
```
Conclusion #
Ingestion of these logs is fairly simple, and will be dealt with in the next blog in this series.