
Streaming Objects from One Django Instance to Another

mrinank
django python signals django orm orm object relational model python3 operational log oplog
Author
Mrinank Verma
7 years of total experience, 4 years in Cyber Security, Penetration Testing, and CI/CD. A certified OSCP-PWK (OS-101-51922) Penetration Tester and a certified Practical DevSecOps CDP (CDP91995EE5), who has led Purple Teams, taken part in multiple Incident Responses/Security Audits, and architected and developed a CI/CD pipeline for 2 software frameworks. Built major modules for an Automated Malware Analysis System and an Endpoint Detection and Response System.
Django Object Sync - This article is part of a series.
Part 1: This Article

We were working on an online-offline product that required changes made on a local server to be synced to a global central server. Changes could only be made on the local server, and a complete view of the database had to be available on the global central server. All of this had to work in Django and Python 3.

This article, the first in a series of three, describes how to create an operational log of the additions, changes, and deletions made to a database through Django. We begin by describing the task.

TL;DR Summary #

We created a repo for this code on GitHub. If you are short on time, head over there and hack on it.

Django Objects Sync

Introduction #

Task #

Imagine you have a local Django application that stores a lot of information. This information is constantly changing and growing. It must now be sent to a central server where the same information can be viewed.

This solves two problems:

  1. What if a Django app installed on a LAN requires a view-only client on the internet?
  2. What if you need a real-time, operation-by-operation log that can be used as a backup or streamed to a separate machine? This can double as a data availability solution.

Ideas #

  1. Django Database Backup provides an easy way to back up the database, but it does not provide the ability to stream live changes.
  2. Synchronise the underlying databases. The databases could be PostgreSQL, MySQL, or any other that Django supports, but this limits interoperability. What if the other instance uses SQLite, which is the default anyway?
  3. Make a custom operational log of the changes made to the database through Django, and stream this log. This seems interesting.
  4. Use Django serializers to translate Django objects to JSON.

Django Signals #

At the time of creating this solution, the current release of Django was 4.2. This version of Django includes a “signal dispatcher” which helps decoupled applications get notified when actions occur elsewhere in the framework. In a nutshell, signals allow certain senders to notify a set of receivers that some action has taken place. They’re especially useful when many pieces of code may be interested in the same events. [^1]
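The sender/receiver idea can be sketched without Django at all. The snippet below is a toy dispatcher written only for illustration; `MiniSignal`, `post_save_demo`, and the receiver names are hypothetical, and this is not how Django implements its dispatcher. It only mirrors the shape of the pattern: receivers connect to a signal, and every connected receiver is called when the signal is sent.

```python
class MiniSignal:
    """Toy signal: keeps a list of receivers and calls each one on send()."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        # Returning the receiver lets connect() be used as a decorator.
        self._receivers.append(receiver)
        return receiver

    def send(self, sender, **kwargs):
        # Call every receiver and collect (receiver, response) pairs.
        return [(r, r(sender=sender, **kwargs)) for r in self._receivers]


post_save_demo = MiniSignal()  # hypothetical stand-in for a "saved" signal
events = []


@post_save_demo.connect
def record_save(sender, instance, **kwargs):
    # First interested party: remember what was saved.
    events.append(f"SAVE {sender} {instance}")


@post_save_demo.connect
def count_saves(sender, **kwargs):
    # Second, independent party: report how many saves were seen so far.
    return len(events)


responses = post_save_demo.send(sender="Question", instance="What's new?")
print(events)
```

The two receivers know nothing about each other or about the code that calls `send()`, which is the decoupling the quoted documentation describes.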

Django Serializers #

Django’s serialization framework provides a mechanism for “translating” Django models into other formats. Usually these other formats will be text-based and used for sending Django data over a wire, but it’s possible for a serializer to handle any format (text-based or not). [^2]

So, we can serialize our objects to JSON and ship them through streams or store them in a text file.

Deep Dive #

Let us start by writing code for creating the said Operational Log.

We assume that you know enough about Django to create a project and an application. Let’s create a project named signalsTest, and create an application named polls in it.

Special UUID Key #

This key is needed when multiple instances ship to a single instance. It is not necessary when streaming from a single writable instance to a single readable instance.

Consider relational references between objects in the Django ORM. By default, the Django ORM creates an auto-incrementing integer primary key, and all relationships between objects use this primary key. If we ship this primary key in the operational log as is and it is ingested by the single global instance, relationships can become invalid when primary keys from different local instances overlap. With several independent local instances, such overlaps are inevitable.

To counter this, we created a special UUIDModel which must be inherited by all models that are shipped from multiple instances to a single instance. In this model we create a special id column typed as a UUIDField, as can be seen in the code below.

This allows your global application instance to act as a single tenant fed by multiple local source application instances.

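import uuid
from django.db import models


class UUIDModel(models.Model):
    # Local auto-incrementing primary key; only meaningful inside one instance.
    pkid = models.BigAutoField(primary_key=True, editable=False)
    # Globally unique identifier, used for references across instances.
    id = models.UUIDField(default=uuid.uuid4, editable=False, unique=True)

    class Meta:
        abstract = True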

An example usage of this model is given in the Django Objects Sync repo:

import datetime

from django.db import models
from django.utils import timezone


class Question(UUIDModel):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField("date published")

    def __str__(self):
        return self.question_text

    def was_published_recently(self):
        return self.pub_date >= timezone.now() - datetime.timedelta(days=1)


class Choice(UUIDModel):
    # to_field='id' points the foreign key at the UUID column instead of pkid.
    question = models.ForeignKey(Question, to_field='id', on_delete=models.CASCADE)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)

    def __str__(self):
        return self.choice_text

As you can see in the code above, the foreign key to the Question model is made using the UUID-typed id from UUIDModel.
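To make the primary key overlap concrete, here is a small framework-free sketch. `LocalInstance` and the site names are hypothetical stand-ins for two local servers; no Django is involved. Each instance hands out its own incrementing `pk`, so both start at 1, while the UUID stays unique across instances.

```python
import itertools
import uuid


class LocalInstance:
    """Toy stand-in for one local server with an auto-incrementing primary key."""

    def __init__(self, name):
        self.name = name
        self._next_pk = itertools.count(1)

    def create_row(self, text):
        # Every row gets a local integer pk and a globally unique UUID.
        return {
            "pk": next(self._next_pk),
            "id": str(uuid.uuid4()),
            "text": text,
            "source": self.name,
        }


site_a = LocalInstance("site-a")
site_b = LocalInstance("site-b")
rows = [site_a.create_row("What's new?"), site_b.create_row("What's up?")]

# Central store keyed by integer pk: both rows have pk 1, so one is overwritten.
central_by_pk = {row["pk"]: row for row in rows}

# Central store keyed by UUID: both rows survive.
central_by_uuid = {row["id"]: row for row in rows}

print(len(central_by_pk), len(central_by_uuid))
```

Keyed by `pk`, the central store ends up with a single row, and anything that referenced the overwritten row would now point at the wrong object. Keyed by UUID, both rows and their references remain distinct.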

Django Logging #

We can use Django's built-in logging support to create a logging mechanism for the operational log. An example is given in the Django Objects Sync repo:

# The ./logs directory must exist before Django starts.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'standard': {
            'format': '%(asctime)s [%(levelname)s] %(name)s: %(message)s'
        },
        # Oplog entries: epoch timestamp followed by the raw message.
        'oplog_formatter': {
            'format': '%(created)f %(message)s'
        },
    },
    'handlers': {
        'default': {
            'level': 'DEBUG',
            'class': 'logging.handlers.RotatingFileHandler',
            'filename': './logs/app.log',
            'maxBytes': 1024 * 1024 * 5,  # 5 MB
            'backupCount': 5,
            'formatter': 'standard',
        },
        'oplog_handler': {
            'level': 'DEBUG',
            'class': 'logging.handlers.RotatingFileHandler',
            'filename': './logs/oplog.log',
            'maxBytes': 1024 * 1024 * 10,  # 10 MB
            'backupCount': 30,
            'formatter': 'oplog_formatter',
        },
    },
    'loggers': {
        # Root logger: general application log.
        '': {
            'handlers': ['default'],
            'level': 'DEBUG',
            'propagate': True
        },
        # Dedicated logger for the operational log; kept out of app.log.
        'oplogger': {
            'handlers': ['oplog_handler'],
            'level': 'DEBUG',
            'propagate': False
        },
    },
}

Here we have forced the logging system to prefix each oplog entry with an epoch timestamp (the %(created)f field in oplog_formatter). This lets us keep track of the last time we ingested the log.
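The effect of that format string can be checked with the standard library alone. The sketch below is an illustration, not part of the project: the logger name `oplogger_demo` and the in-memory stream are assumptions made so the example runs without Django or a log file.

```python
import io
import logging

# Write to an in-memory buffer instead of ./logs/oplog.log for this demo.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(created)f %(message)s"))

demo_logger = logging.getLogger("oplogger_demo")  # hypothetical logger name
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False
demo_logger.addHandler(handler)

demo_logger.info('SAVE [{"model": "polls.question"}]')

# Each line starts with the record's creation time as seconds since the epoch.
line = stream.getvalue().strip()
timestamp_text, message = line.split(" ", 1)
timestamp = float(timestamp_text)
print(timestamp, message)
```

Because the timestamp is a plain float at the start of the line, a consumer can compare it numerically against the last timestamp it processed.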

Operational Log #

This is fairly simple. We need to listen to all post_save and post_delete events, and write to the oplogger logger. In addition we must treat the SAVE and DELETE events separately as they must be ingested differently. To do this we add the string SAVE before the SAVE oplogs, and the string DELETE before the DELETE oplogs.

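We create a new file named signals.py in the polls app for this. For the receivers to be registered, this module must be imported when the app loads, for example from the ready() method of the app's AppConfig. View the code at Django Objects Sync: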

import logging

from django.core import serializers
from django.db.models.signals import post_delete, post_save
from django.dispatch import receiver

oplogger = logging.getLogger('oplogger')

# Models whose changes are not written to the operational log.
EXCLUDED_MODELS = ('ProcessedFile', 'Migration')


@receiver(post_save)
def model_saved(sender, instance, **kwargs):
    if instance.__class__.__name__ in EXCLUDED_MODELS:
        return
    # Serialize the saved object to JSON with Django's serializers.
    data = serializers.serialize('json', [instance])
    log = f'SAVE {data}'
    print(log)  # echo to the console for the demo
    oplogger.info(log)


@receiver(post_delete)
def model_deleted(sender, instance, **kwargs):
    if instance.__class__.__name__ in EXCLUDED_MODELS:
        return
    # Serialize the deleted object so the receiving side knows what to remove.
    data = serializers.serialize('json', [instance])
    log = f'DELETE {data}'
    print(log)  # echo to the console for the demo
    oplogger.info(log)

Demo #

  1. Download or clone the code at Django Objects Sync.

  2. Switch to the blog branch.

  3. Install the pip requirements from the REQUIREMENTS.TXT file.

  4. Run the Django migrations.

  5. Run the following commands in the Django shell after all migrations have been applied to verify functionality.

    from polls.models import Choice, Question
    from django.utils import timezone
    q = Question(question_text="What's new?", pub_date=timezone.now())
    q.save()
    q.choice_set.create(choice_text='Nothing', votes=0)
    c = q.choice_set.create(choice_text='sky', votes=0)
    c.delete()
    c.save()
    
  6. Entries similar to the following will be appended to the file at logs/oplog.log. Note that the final SAVE re-creates the deleted choice with a new pk (3) but the same UUID.

    1698395500.971553 SAVE [{"model": "polls.question", "pk": 1, "fields": {"id": "a3f4729c-b3c2-4264-82a6-8ef7abeaec96", "question_text": "What's new?", "pub_date": "2023-10-27T08:31:40.182Z"}}]
    1698395542.826431 SAVE [{"model": "polls.choice", "pk": 1, "fields": {"id": "d6d2c0a0-6dfa-4515-875e-f1edfd3dbf01", "question": "a3f4729c-b3c2-4264-82a6-8ef7abeaec96", "choice_text": "Nothing", "votes": 0}}]
    1698395568.857164 SAVE [{"model": "polls.choice", "pk": 2, "fields": {"id": "d8e48b8d-9ccb-4728-b90c-2833592b6fd2", "question": "a3f4729c-b3c2-4264-82a6-8ef7abeaec96", "choice_text": "sky", "votes": 0}}]
    1698395573.036682 DELETE [{"model": "polls.choice", "pk": 2, "fields": {"id": "d8e48b8d-9ccb-4728-b90c-2833592b6fd2", "question": "a3f4729c-b3c2-4264-82a6-8ef7abeaec96", "choice_text": "sky", "votes": 0}}]
    1698395577.627465 SAVE [{"model": "polls.choice", "pk": 3, "fields": {"id": "d8e48b8d-9ccb-4728-b90c-2833592b6fd2", "question": "a3f4729c-b3c2-4264-82a6-8ef7abeaec96", "choice_text": "sky", "votes": 0}}]
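
As a rough illustration of how such lines can be read back, the sketch below splits each entry into its timestamp, operation, and JSON payload, and skips everything at or before a last-seen timestamp. The names `OplogEntry`, `parse_oplog_line`, and `entries_after` are hypothetical, and this is not the ingestion approach of this series, only one plausible way to parse the format shown above. The sample lines are the last three entries of the demo output.

```python
import json
from typing import Iterable, Iterator, NamedTuple


class OplogEntry(NamedTuple):
    timestamp: float
    operation: str  # "SAVE" or "DELETE"
    objects: list   # decoded JSON payload: a list of serialized objects


def parse_oplog_line(line: str) -> OplogEntry:
    # Format: "<epoch seconds> <SAVE|DELETE> <JSON payload>".
    # maxsplit=2 keeps spaces inside the JSON payload intact.
    timestamp_text, operation, payload = line.strip().split(" ", 2)
    if operation not in ("SAVE", "DELETE"):
        raise ValueError(f"unknown operation: {operation!r}")
    return OplogEntry(float(timestamp_text), operation, json.loads(payload))


def entries_after(lines: Iterable[str], last_seen: float) -> Iterator[OplogEntry]:
    # Yield only the entries that are newer than the last ingested timestamp.
    for line in lines:
        if not line.strip():
            continue
        entry = parse_oplog_line(line)
        if entry.timestamp > last_seen:
            yield entry


sample = [
    '1698395568.857164 SAVE [{"model": "polls.choice", "pk": 2, "fields": {"id": "d8e48b8d-9ccb-4728-b90c-2833592b6fd2", "question": "a3f4729c-b3c2-4264-82a6-8ef7abeaec96", "choice_text": "sky", "votes": 0}}]',
    '1698395573.036682 DELETE [{"model": "polls.choice", "pk": 2, "fields": {"id": "d8e48b8d-9ccb-4728-b90c-2833592b6fd2", "question": "a3f4729c-b3c2-4264-82a6-8ef7abeaec96", "choice_text": "sky", "votes": 0}}]',
    '1698395577.627465 SAVE [{"model": "polls.choice", "pk": 3, "fields": {"id": "d8e48b8d-9ccb-4728-b90c-2833592b6fd2", "question": "a3f4729c-b3c2-4264-82a6-8ef7abeaec96", "choice_text": "sky", "votes": 0}}]',
]

new_entries = list(entries_after(sample, last_seen=1698395568.857164))
for entry in new_entries:
    print(entry.operation, entry.objects[0]["model"], entry.objects[0]["pk"])
```

With the first entry's timestamp as `last_seen`, only the DELETE and the final SAVE are yielded, which is the behaviour the epoch prefix is meant to enable.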
    

Conclusion #

Ingestion of these logs is fairly simple and will be covered in the next post in this series.
