Conrad Fox

django wagtail 

Coarse Programmers guide to faking new Wagtail rich text entities

Dec. 3, 2022

I have begun creating online courses using the Django based CMS Wagtail. Like Django and the Python language in which they are written, I find it easy to understand. This is ideal for a coarse programmer who doesn't have time or interest to be staying on top of constantly changing targets.

Unfortunately, the backend admin site of Wagtail is becoming more and more JS-heavyas I discovered when I tried to extend the rich text editor for one of my courses. My need was simple enough. I wanted a version of the hyperlink button, but with the ability to add a few extra attributes, such as the title of the page and publisher.


The attributes could be used to create a sidebar with bibliographical references. You can see it in action on the course I'm co-producing on Coastal Resilience for Journalists. (Click the link icons in the text to see the sidebar).


This seems like a simple need. It took me an hour to do something like this with CKEditor. But the Wagtail docs are pretty clear that this is not to be taken lightly:

"This is an advanced feature. Please carefully consider whether you really need this...."
"You will most likely need to write a hefty dose of JavaScript, some of it with React."
"...consider implementing your UI through StreamField instead, which has a battle-tested API meant for Django developers."

Well, I considered carefully and was pretty sure I needed it. But there was no way I was going to venture into React-land after that warning. I just don't have the time. Here's how I faked it.

The models and fields

I call my page model Module (because each page is a module in an online course). The main content of each module is stored in a Wagtail StreamField called content.

# home/

from wagtail.models import Page
from wagtail.fields import StreamField
from wagtail import blocks
from wagtail.images.blocks import ImageChooserBlock

class Module(Page):

	content = StreamField([
		("heading", blocks.CharBlock(template= "blocks/heading.html")),
		("paragraph", ModuleRichTextBlock(editor="full")),
		("quiz", blocks.CharBlock(template="blocks/quiz.html")),
		("image", ImageChooserBlock()),
	], use_json_field=True)

The content field is composed of various Wagtail StreamBlocks. The one we're concerned with is paragraph. Paragraph is a subclass of a Wagtail RichTextBlock. That is where I wanted to add the custom editor button. We'll return to it shortly.

# home/

from wagtail import blocks

class ModuleRichTextBlock(blocks.RichTextBlock):

	def render(self, value, context=None):
                   # to be revealed

I also have a snippet model to contain the url and extra attributes for each link.

# home/

class Link(models.Model):

	# Fields created automatically by sync_links management command
	module = models.ForeignKey("Module", on_delete=models.CASCADE, blank=True, null=True)
	lid = models.IntegerField(blank=True, null=True) #lid = link id
	url = models.URLField(unique=True)

	# Extra info added later by hand
	page_title = models.CharField(max_length=150, blank=True, null=True)
	source = models.CharField(max_length=150, blank=True, null=True)
	description = models.CharField(max_length=300, blank=True, null=True)

	def __str__(self):
		return self.source or self.url

	class Meta:
		ordering = ["lid", "id"]

Each link has a ForeignKey to the module it appears in. The lid (link id) field is used for sorting the links in the order they appear in the page.

Management command

Then I created a django management command called This goes through the paragraph field of each page searching for links. It creates a new Link object for any new URLs it finds, with a ForeignKey to the Module (ie. page) it was found on.

# management/commands/

from json import dumps
import re
import requests

from bs4 import BeautifulSoup as bs

from import BaseCommand, CommandError

from wagtail.core.fields import StreamField

from home.models import Module, Link

class Command(BaseCommand):
    help = 'Creates Link objects from new entries in module content.'

    def add_arguments(self, parser):
        parser.add_argument('mids', nargs='*', type=int)

    def handle(self, *args, **options):
        mids = options["mids"]

        # optinally indicate which module you want to sync, 
        # otherwise go through them all
        if not mids:
            modules = Module.objects.all()
            modules = Module.objects.filter(id__in=mids)

        for module in modules:

           # module.content is a StreamField field that contains all the text for the module
           # get_prep_value converts it to a list of dicts, containing the content as html
           # plus other bits and bobs
            content = module.content.get_prep_value()

            urls = []

            # Loop through each block searching for links
            for block in content:
                if block["type"] == "paragraph":
                    value = block["value"] # the content as html
                    soup = bs(value)
                    link_tags = soup.find_all("a")
                    for tag in link_tags:
                        hrefs = [tag["href"] for tag in link_tags if tag.get("href", None)]

            for i, url in enumerate(urls):

                link, flag = Link.objects.get_or_create(url=url)
                # If the link already exists, the url does not 
                # need to be updated. Neither do the lid or module
                # if the link has not changed its position in the 
                # text or the module it appears in. But its simpler just
                # to set them anyway. 
                link.url = url 
                link.lid = i
                link.module = module

I'm using BeautifulSoup to find the link tags. It's probably overkill. I could have just used a regex, but my head hurts every time I consult the python regex guide.

I can run any time I add links to a module or change their position in a page (for example if I change the order of paragraphs containing links). Any prexisting links will be updated and duplicates will not be created. There are many loopholes in this implementation. For example, it won't delete Link objects from the database if the corresponding hyperlinks are deleted from the text. But for my limited usage, this suffices.

After I run this command, I get a list of link snippets in the admin.

Screenshot from 2022-12-04 11-41-22

I can then add page titles and publisher at my leisure. I realized that this is actually a bit more comfortable than trying to add the info when I'm in the throes of writing, so maybe I actually didn't need to create that Rich Text Editor button. I should have considered harder.

Rendering the references

To create the sidebar with the list of references for the page, I override the render method for the ModuleRichTextBlock.

# home/ 

from bs4 import BeautifulSoup as bs
from wagtail import blocks

class ModuleRichTextBlock(blocks.RichTextBlock):

	def render(self, value, context=None):

		# Loop through the <a> tags, find them in Link.objects, and assign them
		# an id so they can be matched with the sidebar links
		source = value.source

		soup = bs(source)
		link_tags = soup.find_all("a")
		if link_tags:
			for tag in link_tags:
				url = tag.get("href", None)
				if url:
					link, flag = Link.objects.get_or_create(url=url, module=context["page"])
					# tag.unwrap()
					marker = soup.new_tag("i")
					marker["class"] ="link fa-solid fa-link"
					marker["id"] = "link-" + str(

		value.source = str(soup)
		return super().render(value, context)

This intercepts the rendering, removing the link from the enclosed text, adding a font-awesome icon after the text and an id to the icon that corresponds to the link object in the database. Once again I use BeautifulSoup, where probably a regex would do.

I then override the get_context method on the Module page itself to include the Link objects for that Module.

# home/ 
from wagtail.models import Page

class Module(Page):


	def get_context(self, request, *args, **kwargs):
		context = super().get_context(request, *args, **kwargs)
		context['links'] = Link.objects.filter(module=self)
		return context

The links are rendered in the template in a sidebar that slides in and out when the link icon is clicked. A very simple bit of jquery checks the id on the link and scrolls to the corresponding reference in the list.

I doubt this is the best way to achieve what I need, but it was the fastest way I could think of without getting distracted from my real task, which is creating the actual content. I do know that instead of using a custom render method to modify the links in the html every time the page is loaded, I could save those changes to the database in the same command that creates the Link objects. This page has a good example of how to do this.

However, doing this entails messing around with Wagtail's revisions models and the distinction between drafts and published pages. I don't understand these models, so for now I'm sticking with what I've got.

More information
  • A demonstration of how to create a simple Wagtail Rich Text entity the proper way by a Wagtail contributor.
  • Advice on creating a complex inline entity... basically, don't. Previous posts have good tutorials on creating custom blocks.

I'm a journalist, producer, instructor and developer. I've reported from a minefield and a dugout canoe, taught radio journalism, robotics, and anthropology in board rooms, lecture halls and mud huts, led award-winning media projects, and mentored dozens of journalists in Canada and Latin America. Today, I write about climate change and develop an app to learn English while you run.

Digital storytelling

(Conrad's) vision, his journalistic nose, and his tireless search to go one more step for the new, helped us find really impactful stories...

Manuel Ureste

Journalist and co-author of La Estafa Maestra, Animal Politico, Mexico