neuralcoref: Can't serialize document

I can save a spaCy document to disk, but not one produced by neuralcoref. For example, the following snippet raises TypeError: can't serialize My sister: [My sister, She].

import spacy

# Plain spaCy model: serialization works.
nlp0 = spacy.load('en_core_web_sm')
doc0 = nlp0(u'My sister has a dog. She loves him.')
with open('output/test0.pkl', 'wb') as f:
    f.write(doc0.to_bytes())

# neuralcoref model: to_bytes() raises the TypeError.
nlp = spacy.load('en_coref_sm')
doc = nlp(u'My sister has a dog. She loves him.')
with open('output/test.pkl', 'wb') as f:
    f.write(doc.to_bytes())

The resulting files are listed below; the one written from the neuralcoref document is empty:

$ ls -lh output/*.pkl
-rw-r--r--  1 cumeo  staff     0B Aug 11 21:16 output/test.pkl
-rw-r--r--  1 cumeo  staff    16K Aug 11 21:16 output/test0.pkl
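The 0B test.pkl is consistent with the order of operations in the snippet: open(..., 'wb') creates (or truncates) the file before doc.to_bytes() runs, so when to_bytes() raises, an empty file is left behind. Below is a minimal sketch of one way to avoid the stray file, assuming it is acceptable to build the bytes first and open the file only afterwards. The names write_bytes_if_serializable and make_bytes are hypothetical, introduced here for illustration; they are not part of spaCy or neuralcoref.

```python
from pathlib import Path
from typing import Callable


def write_bytes_if_serializable(path: str, make_bytes: Callable[[], bytes]) -> int:
    """Serialize first, then write; no file is created if serialization fails."""
    # If make_bytes() raises (e.g. TypeError from doc.to_bytes),
    # we never reach the code that touches the filesystem.
    data = make_bytes()
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Path.write_bytes returns the number of bytes written.
    return target.write_bytes(data)
```

With the snippet above, this would be called as write_bytes_if_serializable('output/test.pkl', doc.to_bytes). The TypeError still propagates, but output/test.pkl is not created.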

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 6
  • Comments: 15 (1 by maintainers)

Most upvoted comments

I ran into the same issue when running nlp.pipe with multiple processes:

for doc in nlp.pipe(df.text, batch_size=5, n_process=4):
    print(doc)

Since this is actively blocking me, I found a temporary workaround:

def remove_unserializable_results(doc):
    # Drop everything stored in user_data, then reset every custom
    # extension attribute on the doc and on each token.
    doc.user_data = {}
    for x in dir(doc._):
        if x in ['get', 'set', 'has']:
            continue
        setattr(doc._, x, None)
    for token in doc:
        for x in dir(token._):
            if x in ['get', 'set', 'has']:
                continue
            setattr(token._, x, None)
    return doc

nlp.add_pipe(remove_unserializable_results, last=True)

I added this component after the last component in my pipeline (i.e. after='coreference_resolver'), which converts the coreferences into entities. After that step I no longer need the coref metadata, which is the part that cannot be serialized.
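If some of the coref information is still needed after the cleanup, one option is to copy it into plain Python data before the extension attributes are cleared. Below is a minimal sketch, assuming the clusters come from doc._.coref_clusters and that each cluster exposes main and mentions as spans with start, end and text. The helper clusters_to_plain is hypothetical, and the SimpleNamespace objects are stand-ins so the example runs without a model.

```python
import json
from types import SimpleNamespace


def clusters_to_plain(clusters):
    """Copy coref clusters into dicts of strings and ints, which serialize cleanly."""
    plain = []
    for cluster in clusters or []:
        plain.append({
            "main": cluster.main.text,
            "mentions": [
                {"start": m.start, "end": m.end, "text": m.text}
                for m in cluster.mentions
            ],
        })
    return plain


# Stand-ins for the spans expected for
# 'My sister has a dog. She loves him.' (token offsets, end exclusive).
my_sister = SimpleNamespace(start=0, end=2, text="My sister")
she = SimpleNamespace(start=6, end=7, text="She")
clusters = [SimpleNamespace(main=my_sister, mentions=[my_sister, she])]

plain = clusters_to_plain(clusters)
print(json.dumps(plain))
```

In a real pipeline, the result could presumably be kept on the document, for example as doc.user_data['coref_plain'] = clusters_to_plain(doc._.coref_clusters), as long as it is computed before the cleanup and assigned after doc.user_data has been reset. The key name is an arbitrary choice.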

doc.user_data = {}

Can you please provide a more complete example? I used your code snippet, but unfortunately I no longer have access to the coref data.

I’ve had this issue too when calling doc_bytes = doc.to_bytes().