wagtail: Migrating a Block within a StreamField

Wagtail includes a StreamField for freeform page content. This content is composed of blocks, which are serialized to JSON in the database. However, it’s not obvious how to migrate a block within a StreamField. If you change a block’s definition and then migrate, the migration simply replaces the old block type with the new block type. This is understandable, because Wagtail doesn’t know how to map instances of the old block to instances of the new block.

I’ve taken the following approach to solving this problem, but would appreciate guidance on how it can be improved. In particular, I’d like to know whether it’s possible to instantiate and serialize a block directly, rather than using the mapping functions I’ve defined below. The operation of StreamValue is also opaque, so I’d appreciate some guidance on this class.

I posted my original question to the Wagtail Developers group.

Consider V1 of the CountryRiskReportPage model:

from wagtail.wagtailcore.blocks import CharBlock, RichTextBlock
from wagtail.wagtailcore.fields import StreamField
from wagtail.wagtailcore.models import Page

class CountryRiskReportPage(Page):
    body = StreamField([
        ('heading', CharBlock()),
        ('paragraph', RichTextBlock()),
        ('focusbox', RichTextBlock()),
    ])

Now consider V2 of the same model:

from wagtail.wagtailcore.blocks import (
    CharBlock, RichTextBlock, StreamBlock, StructBlock,
)

class FocusBoxBlock(StructBlock):
    heading = CharBlock()
    body = StreamBlock([
        ('paragraph', RichTextBlock()),
    ])

class CountryRiskReportPage(Page):
    body = StreamField([
        ('heading', CharBlock()),
        ('paragraph', RichTextBlock()),
        ('focusbox', FocusBoxBlock()),
    ])

Notice that in V2, we change focusbox from a RichTextBlock to a FocusBoxBlock. How should we migrate this change?

We seem to need a data migration, not a schema migration. The schema hasn’t changed because the field hasn’t changed: body was a StreamField before and it will be a StreamField after. Consequently, we need two functions that define how to map:

  • a serialized rich text block to a serialized focus box block, which will be used by the forwards migration;
  • a serialized focus box block to a serialized rich text block, which will be used by the backwards migration.

def richtextblock_to_focusboxblock(block):
    return {
        'type': 'focusbox',
        'value': {
            'heading': 'Focus Box',
            'body': [{'type': 'paragraph', 'value': block['value']}]
        }
    }

def focusboxblock_to_richtextblock(block):
    heading = '<h1>' + block['value']['heading'] + '</h1>'
    body = ''.join([subblock['value'] for subblock in block['value']['body']])

    return {
        'type': 'focusbox',
        'value': heading + body
    }
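As a quick sanity check, the two mappers can be exercised on a sample serialized block. This is a standalone sketch: the functions are repeated so it runs without Wagtail or Django, and the sample rich text value is made up.

```python
# Round-trip sanity check for the two mappers above (repeated here so
# the snippet runs on its own, without Wagtail or Django).

def richtextblock_to_focusboxblock(block):
    return {
        'type': 'focusbox',
        'value': {
            'heading': 'Focus Box',
            'body': [{'type': 'paragraph', 'value': block['value']}],
        }
    }

def focusboxblock_to_richtextblock(block):
    heading = '<h1>' + block['value']['heading'] + '</h1>'
    body = ''.join(subblock['value'] for subblock in block['value']['body'])
    return {'type': 'focusbox', 'value': heading + body}

old = {'type': 'focusbox', 'value': '<p>Some rich text.</p>'}
new = richtextblock_to_focusboxblock(old)
assert new['value']['body'][0]['value'] == '<p>Some rich text.</p>'

# Note the round trip is lossy: the forwards mapper invents a default
# heading, so mapping back prepends that heading to the rich text.
back = focusboxblock_to_richtextblock(new)
assert back['value'] == '<h1>Focus Box</h1><p>Some rich text.</p>'
```

Worth noting: forwards-then-backwards is not a perfect inverse, because the forwards mapper has to invent a heading that the original rich text block never had.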

We use the mapping functions as we iterate over a page’s serialized blocks. When we encounter a focusbox, we use the appropriate function to map from one block to the other. To avoid writing the same code for both the forwards and backwards migrations, we define a function that accepts a page and a mapping function. It returns a list of serialized blocks and a boolean that indicates whether it encountered a focusbox.

def get_stream_data(page, mapper):
    stream_data = []
    mapped = False

    for block in page.body.stream_data:
        if block['type'] == 'focusbox':
            focusboxblock = mapper(block)
            stream_data.append(focusboxblock)
            mapped = True

        else:
            stream_data.append(block)

    return stream_data, mapped
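To see the function’s behaviour in isolation, here is a minimal demo against a stub object standing in for a real Page (the stub and the sample blocks are made up for illustration; the mapper and function are repeated so the snippet runs without Wagtail):

```python
from types import SimpleNamespace

def richtextblock_to_focusboxblock(block):
    # Same mapper as above, repeated so this snippet runs standalone.
    return {
        'type': 'focusbox',
        'value': {
            'heading': 'Focus Box',
            'body': [{'type': 'paragraph', 'value': block['value']}],
        }
    }

def get_stream_data(page, mapper):
    # Same function as above, repeated so this snippet runs standalone.
    stream_data = []
    mapped = False
    for block in page.body.stream_data:
        if block['type'] == 'focusbox':
            stream_data.append(mapper(block))
            mapped = True
        else:
            stream_data.append(block)
    return stream_data, mapped

# A stub exposing the page.body.stream_data attribute the function reads.
page = SimpleNamespace(body=SimpleNamespace(stream_data=[
    {'type': 'heading', 'value': 'Overview'},
    {'type': 'focusbox', 'value': '<p>Key risk.</p>'},
]))

stream_data, mapped = get_stream_data(page, richtextblock_to_focusboxblock)
assert mapped is True                          # a focusbox was mapped
assert stream_data[0]['value'] == 'Overview'   # other blocks pass through

# A page with no focusbox comes back unchanged with mapped False,
# so migrate() skips the save.
page_without = SimpleNamespace(body=SimpleNamespace(stream_data=[
    {'type': 'paragraph', 'value': '<p>No focus box here.</p>'},
]))
_, mapped = get_stream_data(page_without, richtextblock_to_focusboxblock)
assert mapped is False
```

The boolean is what makes the migration cheap to re-run: pages without a focusbox are never saved.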

We use the list to create a new StreamValue to replace CountryRiskReportPage.body, which is itself a StreamValue, and the boolean to decide whether to save the page.

def migrate(apps, mapper):
    CountryRiskReportPage = apps.get_model('products', 'CountryRiskReportPage')

    for page in CountryRiskReportPage.objects.all():
        stream_data, mapped = get_stream_data(page, mapper)

        if mapped:
            stream_block = page.body.stream_block
            page.body = StreamValue(stream_block, stream_data, is_lazy=True)
            page.save()

All that remains is to define the forwards and backwards migration functions, as well as the Migration class.

def forwards(apps, schema_editor):
    migrate(apps, richtextblock_to_focusboxblock)

def backwards(apps, schema_editor):
    migrate(apps, focusboxblock_to_richtextblock)

class Migration(migrations.Migration):
    dependencies = [
        ...
    ]

    operations = [
        migrations.RunPython(forwards, backwards),
    ]

I’ve tested this approach and it works. Nevertheless, I’d appreciate guidance on how it can be improved.

Thanks!

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 12
  • Comments: 16 (6 by maintainers)

Most upvoted comments

Okay, so the idea in my last comment got me going down the right track. After reading through the source again, it appears that you can save the JSON directly using the raw_text kwarg. So the line in your function above that reads page.body = StreamValue(stream_block, stream_data, is_lazy=True) can work independently of the block’s migration state by using raw_text and supplying the JSON string directly.

For example:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from itertools import chain
from django.db import migrations, models
from django.core.serializers.json import DjangoJSONEncoder
from wagtail.wagtailcore.blocks.stream_block import StreamValue
from json import dumps

def charblock_to_headingblock(block):
    return {
        'type': 'heading',
        'value': {
            'align': 'center',
            'content': block['value'],
        }
    }

def headingblock_to_charblock(block):
    return {
        'type': 'heading',
        'value': block['value']['content'],
    }

def get_stream_data(obj, mapper):
    stream_data = []
    mapped = False

    for block in obj.free_form_content.stream_data:
        if block['type'] == 'heading':
            headingblock = mapper(block)
            stream_data.append(headingblock)
            mapped = True

        else:
            stream_data.append(block)

    return stream_data, mapped

def handle_object(obj, mapper):
    stream_data, mapped = get_stream_data(obj, mapper)

    if mapped:
        raw_text = dumps(stream_data, cls=DjangoJSONEncoder)
        stream_block = obj.free_form_content.stream_block
        obj.free_form_content = StreamValue(stream_block, [], is_lazy=True, raw_text=raw_text)
        obj.save()

def migrate(apps, mapper):
    FreeFormPage = apps.get_model('cms', 'FreeFormPage')
    Sidebar = apps.get_model('cms', 'Sidebar')
    SidebarSnippet = apps.get_model('cms', 'SidebarSnippet')

    pages = FreeFormPage.objects.all()
    sidebars = Sidebar.objects.all()
    snippets = SidebarSnippet.objects.all()
    for obj in chain(pages, sidebars, snippets):
        handle_object(obj, mapper)

def forwards(apps, schema_editor):
    migrate(apps, charblock_to_headingblock)

def backwards(apps, schema_editor):
    migrate(apps, headingblock_to_charblock)

class Migration(migrations.Migration):

    dependencies = [
        ('cms', '0004_use_heading_block'),
    ]

    operations = [
        migrations.RunPython(forwards, backwards),
    ]

This migration now works forwards and backwards for me.
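The raw_text approach works because that kwarg expects exactly the JSON string Wagtail would otherwise read from the database column, which the lazy StreamValue can then parse on first access. Here is the serialization step in isolation, sketched with the standard library’s json module (DjangoJSONEncoder differs only in how it handles values such as dates and decimals; the sample blocks are made up):

```python
import json

# The stream_data list as it exists mid-migration (sample values).
stream_data = [
    {'type': 'heading', 'value': {'align': 'center', 'content': 'Hello'}},
    {'type': 'paragraph', 'value': '<p>Body text.</p>'},
]

# This string is what the database column holds, and what the raw_text
# kwarg hands back to Wagtail to parse lazily.
raw_text = json.dumps(stream_data)

# Round-tripping through json.loads recovers the original structure.
assert json.loads(raw_text) == stream_data
```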

Yes, that’s correct. It’s highly unlikely that we’ll ever change the schema, except for adding new (optional) properties to the dictionary, to be handled by the StreamField. (In particular, we’re considering adding an ‘id’ property, to assist with tracking changes between revisions.)

Thank you Iain,

I just thought it was worth mentioning because it caught me by surprise.

I think I’ll probably design the structures in future to anticipate some possible changes… e.g. always nesting stream blocks inside a StructBlock so I can add properties at each level without having to change the nesting.

On 18/11/2016 7:37 PM, Iain Dillingham wrote:

I don’t see an easy solution, unfortunately @moaxey. The existing revisions will expect the old blocks, but clearly the new revisions will expect the new blocks. You could possibly iterate over the existing revisions and transform them to the new blocks. However, my experiences of manipulating instances of Page and PageRevision programmatically haven’t been happy! Maybe you could settle for only transforming the draft page revisions (i.e. only those that were saved after the page was last published), as these should reflect the new blocks?


I’ve run into this issue as well and created a little migrations.RunPython factory function, which I use in my project now. In case anyone is interested, you’ll find it in this gist.

Is this still the recommended way to do data migrations?

Hi @iaindillingham, this is certainly the most thorough treatment of StreamField migrations I’ve seen! I’d like to use this as the starting point for a new section of the docs, if that’s OK…?

There’s not a whole lot I can add here, as the migration logic here is more in-depth than anything I’ve had to deal with. You’re right that StreamValue is a bit lacking in documentation at the moment - in principle it’s an internal detail that most users wouldn’t have to work with, but in practice the low-level nature of migrations means that users will encounter it when making these kinds of changes. https://github.com/torchbox/wagtail/wiki/StreamField-blocks-API might fill in some of the gaps, although it was written as reference documentation for developing StreamField and might be a bit out of date.

It should be possible to use the block definition objects (CharBlock(), RichTextBlock() et al) to unpack the JSON-ish raw data from stream_data into friendlier values (so, for example, ImageChooserBlock would convert an image ID into an image object) - but I can’t see that offering much benefit over manipulating the JSON-ish data directly. Also, by working with the low-level data, you avoid the minefield of having to juggle between ‘before’ and ‘after’ versions of the StreamField definition in order to unpack the old data and re-pack the new data respectively.