wagtail: Migrating a Block within a StreamField
Wagtail includes a StreamField for freeform page content. This content is stored in blocks, which are JSON-serialized and stored in the database. However, it’s not clear how to migrate a block within a StreamField. If you change a block and then migrate, the migration simply replaces the old block type with the new block type. This is understandable, because Wagtail doesn’t know how to map from instances of the old block to instances of the new block.
I’ve taken the following approach to solving this problem, but would appreciate guidance on how it can be improved. In particular, I’d like to know whether it’s possible to instantiate and serialize a block directly, rather than using the mapping functions I’ve defined below. The operation of StreamValue is also opaque, so I’d appreciate some guidance on this class.
I posted my original question to the Wagtail Developers group.
Consider V1 of the CountryRiskReportPage model:
class CountryRiskReportPage(Page):
    body = StreamField([
        ('heading', CharBlock()),
        ('paragraph', RichTextBlock()),
        ('focusbox', RichTextBlock()),
    ])
Now consider V2 of the same model:
class FocusBoxBlock(StructBlock):
    # A StructBlock serializes to a dict with 'heading' and 'body' keys,
    # which is the shape the mapping functions below produce.
    heading = CharBlock()
    body = StreamBlock([
        ('paragraph', RichTextBlock()),
    ])

class CountryRiskReportPage(Page):
    body = StreamField([
        ('heading', CharBlock()),
        ('paragraph', RichTextBlock()),
        ('focusbox', FocusBoxBlock()),
    ])
Notice that in V2, we change focusbox from a RichTextBlock to a FocusBoxBlock. How should we migrate this change?
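To make the change concrete, here is a minimal sketch of what the stored JSON for body might look like before and after, assuming the serialized shapes used by the mapping functions defined further down. The sample headings and text are invented for illustration.

```python
import json

# Hypothetical stored JSON under V1: the focusbox value is a rich text string.
v1_raw = json.dumps([
    {"type": "heading", "value": "Overview"},
    {"type": "focusbox", "value": "<p>Key risks</p>"},
])

# Hypothetical stored JSON under V2: the focusbox value is a dict that holds
# a heading and a nested list of child blocks.
v2_raw = json.dumps([
    {"type": "heading", "value": "Overview"},
    {"type": "focusbox", "value": {
        "heading": "Focus Box",
        "body": [{"type": "paragraph", "value": "<p>Key risks</p>"}],
    }},
])

v1_focusbox = json.loads(v1_raw)[1]
v2_focusbox = json.loads(v2_raw)[1]

print(type(v1_focusbox["value"]).__name__)  # str
print(type(v2_focusbox["value"]).__name__)  # dict
```

The block type name stays the same in both versions; only the shape of the value changes, which is why the existing rows have to be rewritten.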
We seem to need a data migration, not a schema migration. The schema hasn’t changed because the field hasn’t changed: body was a StreamField before and it will be a StreamField after. Consequently, we need two functions that define how to map:
- a serialized rich text block to a serialized focus box block, which will be used by the forwards migration;
- a serialized focus box block to a serialized rich text block, which will be used by the backwards migration.
def richtextblock_to_focusboxblock(block):
    return {
        'type': 'focusbox',
        'value': {
            'heading': 'Focus Box',
            'body': [{'type': 'paragraph', 'value': block['value']}]
        }
    }

def focusboxblock_to_richtextblock(block):
    heading = '<h1>' + block['value']['heading'] + '</h1>'
    body = ''.join([subblock['value'] for subblock in block['value']['body']])
    return {
        'type': 'focusbox',
        'value': heading + body
    }
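As a quick sanity check, the two mapping functions can be exercised on plain dictionaries without a database. The sketch below repeats them and runs one block forwards and then backwards; the sample rich text is made up.

```python
def richtextblock_to_focusboxblock(block):
    # Wrap the old rich text value in the new heading/body structure.
    return {
        'type': 'focusbox',
        'value': {
            'heading': 'Focus Box',
            'body': [{'type': 'paragraph', 'value': block['value']}],
        },
    }


def focusboxblock_to_richtextblock(block):
    # Flatten the heading and the child paragraphs back into one rich text string.
    heading = '<h1>' + block['value']['heading'] + '</h1>'
    body = ''.join(subblock['value'] for subblock in block['value']['body'])
    return {'type': 'focusbox', 'value': heading + body}


original = {'type': 'focusbox', 'value': '<p>Key risks</p>'}
migrated = richtextblock_to_focusboxblock(original)
restored = focusboxblock_to_richtextblock(migrated)

print(restored['value'])  # <h1>Focus Box</h1><p>Key risks</p>
```

Note that the round trip is not an identity: the placeholder heading added by the forwards mapping comes back as an h1 element in the rich text, so migrating forwards and then backwards changes the content.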
We use the mapping functions when we iterate over a page’s serialized blocks. When we encounter a focusbox, we use the appropriate function to map from one block to the other. To save us from writing the same code for both the forwards and backwards migrations, we write a function that accepts a page and a mapping function. It returns a list of serialized blocks and a boolean that indicates whether it encountered a focusbox.
def get_stream_data(page, mapper):
    stream_data = []
    mapped = False
    for block in page.body.stream_data:
        if block['type'] == 'focusbox':
            stream_data.append(mapper(block))
            mapped = True
        else:
            stream_data.append(block)
    return stream_data, mapped
We will use this list to create a new StreamValue to replace CountryRiskReportPage.body, which is also a StreamValue. We will use this boolean to determine whether or not to save the CountryRiskReportPage.
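# StreamValue is imported from Wagtail's blocks module; the exact import
# path depends on the Wagtail version in use.
def migrate(apps, mapper):
    # Use the historical model so the migration does not depend on current code.
    CountryRiskReportPage = apps.get_model('products', 'CountryRiskReportPage')
    for page in CountryRiskReportPage.objects.all():
        stream_data, mapped = get_stream_data(page, mapper)
        if mapped:
            # Only rebuild and save pages that contained a focusbox.
            stream_block = page.body.stream_block
            page.body = StreamValue(stream_block, stream_data, is_lazy=True)
            page.save()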
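One thing the loop in get_stream_data does not check is whether a focusbox has already been converted, so applying the forwards mapping to data that is already in the new shape would wrap the converted value a second time. A possible guard, sketched here on plain lists so it can be tried outside a migration, is to map only blocks whose value still has the old shape. The names map_blocks and is_unmigrated are hypothetical helpers, not part of Wagtail.

```python
def map_blocks(stream_data, block_type, mapper, should_map=None):
    """Return (new_stream_data, mapped) for a list of serialized blocks."""
    new_data = []
    mapped = False
    for block in stream_data:
        wanted = block['type'] == block_type
        if wanted and (should_map is None or should_map(block)):
            new_data.append(mapper(block))
            mapped = True
        else:
            new_data.append(block)
    return new_data, mapped


def richtext_to_focusbox(block):
    # Same shape as the forwards mapping in this post.
    return {
        'type': 'focusbox',
        'value': {
            'heading': 'Focus Box',
            'body': [{'type': 'paragraph', 'value': block['value']}],
        },
    }


def is_unmigrated(block):
    # Old-style focusbox values are rich text strings; new ones are dicts.
    return isinstance(block['value'], str)


data = [
    {'type': 'heading', 'value': 'Overview'},
    {'type': 'focusbox', 'value': '<p>Key risks</p>'},
]
once, changed_first = map_blocks(data, 'focusbox', richtext_to_focusbox, is_unmigrated)
twice, changed_second = map_blocks(once, 'focusbox', richtext_to_focusbox, is_unmigrated)
```

With the guard in place, a second pass leaves the data untouched and reports that nothing was mapped, so no page would be saved.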
All that remains is to define the forwards and backwards migration, as well as the Migration class.
from django.db import migrations

def forwards(apps, schema_editor):
    migrate(apps, richtextblock_to_focusboxblock)

def backwards(apps, schema_editor):
    migrate(apps, focusboxblock_to_richtextblock)

class Migration(migrations.Migration):
    dependencies = [
        ...
    ]
    operations = [
        migrations.RunPython(forwards, backwards),
    ]
I’ve tested this approach and it works. Nevertheless, I’d appreciate guidance on how it can be improved.
Thanks!
About this issue
- State: closed
- Created 8 years ago
- Reactions: 12
- Comments: 16 (6 by maintainers)
Okay so the idea in my last comment got me going down the right track. After reading through the source again, it appears that you can save the JSON directly using the raw_text kwarg. So the line in your function above that reads
page.body = StreamValue(stream_block, stream_data, is_lazy=True)
can work independently of the block’s migration state by using raw_text and supplying the JSON string directly. For example:
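The idea could be sketched as follows, assuming StreamValue accepts a raw_text keyword as described in this comment: serialize the list of block dictionaries with json.dumps and pass the resulting string. The helper name to_raw_text and the sample data are hypothetical, and the StreamValue call is left as a comment because it needs a Wagtail project to run.

```python
import json


def to_raw_text(stream_data):
    """Serialize a list of block dicts to a JSON string."""
    return json.dumps(stream_data)


stream_data = [
    {'type': 'heading', 'value': 'Overview'},
    {'type': 'focusbox', 'value': {
        'heading': 'Focus Box',
        'body': [{'type': 'paragraph', 'value': '<p>Key risks</p>'}],
    }},
]
raw_text = to_raw_text(stream_data)

# Inside the migration, the assignment might then become something like:
# page.body = StreamValue(stream_block, [], is_lazy=True, raw_text=raw_text)
```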
This migration now works forwards and backwards for me.
Yes, that’s correct. It’s highly unlikely that we’ll ever change the schema, except for adding new (optional) properties to the dictionary, to be handled by the StreamField. (In particular, we’re considering adding an ‘id’ property, to assist with tracking changes between revisions.)
Thank you Iain,
I just thought it was worth mentioning because it caught me by surprise.
I think I’ll probably design the structures in future to anticipate some possible changes… e.g. always nesting stream blocks inside a StructBlock so I can add properties at each level without having to change the nesting.
On 18/11/2016 7:37 PM, Iain Dillingham wrote:
I’ve run into this issue as well and created a little migrations.RunPython factory function, which I use in my project now. In case anyone is interested, you’ll find it in this gist.
Is this still the recommended way to do data migrations?
Hi @iaindillingham, This is certainly the most thorough treatment of StreamField migrations I’ve seen! I’d like to use this as the starting point for a new section of the docs, if that’s OK…?
There’s not a whole lot I can add here, as the migration logic here is more in-depth than anything I’ve had to deal with. You’re right that StreamValue is a bit lacking in documentation at the moment - in principle it’s an internal detail that most users wouldn’t have to work with, but in practice the low-level nature of migrations means that users will encounter it when making these kinds of changes. https://github.com/torchbox/wagtail/wiki/StreamField-blocks-API might fill in some of the gaps, although it was written as reference documentation for developing StreamField and might be a bit out of date.
It should be possible to use the block definition objects (CharBlock(), RichTextBlock() et al) to unpack the JSON-ish raw data from stream_data into friendlier values (so, for example, ImageChooserBlock would convert an image ID into an image object) - but I can’t see that offering much benefit over manipulating the JSON-ish data directly. Also, by working with the low-level data, you avoid the minefield of having to juggle between ‘before’ and ‘after’ versions of the StreamField definition in order to unpack the old data and re-pack the new data respectively.