hydroshare: invalid XML characters present in abstracts
@martinseul @aphelionz @dtarb
When running check_bag on all resources, bag creation crashed for the resource below because its abstract contains characters that are not valid in XML, so resourcemetadata.xml could not be generated and the bag for this resource cannot be created. The only way this could happen is if the UI and/or the REST API allowed the resource abstract to contain invalid characters. This was tested on cuahsi-dev-2.hydroshare.org using an image of the production system.
bag bags/1a7dc5d6b9fa4253bca442341eef500d.zip NOT FOUND
metadata_dirty is True
bag_modified is True
Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 354, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 346, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python2.7/site-packages/django/core/management/base.py", line 394, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python2.7/site-packages/django/core/management/base.py", line 445, in execute
    output = self.handle(*args, **options)
  File "/hydroshare/hs_core/management/commands/check_bag.py", line 194, in handle
    check_bag(r.short_id, options)
  File "/hydroshare/hs_core/management/commands/check_bag.py", line 48, in check_bag
    create_bag_files(resource)
  File "/hydroshare/hs_core/hydroshare/hs_bagit.py", line 92, in create_bag_files
    out.write(resource.get_metadata_xml())
  File "/hydroshare/hs_core/models.py", line 1949, in get_metadata_xml
    include_format_elements=include_format_elements)
  File "/hydroshare/hs_core/models.py", line 3935, in get_xml
    dcterms_abstract.text = self.description.abstract
  File "src/lxml/lxml.etree.pyx", line 1031, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:51697)
  File "src/lxml/apihelpers.pxi", line 711, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:23071)
  File "src/lxml/apihelpers.pxi", line 699, in lxml.etree._createTextNode (src/lxml/lxml.etree.c:22931)
  File "src/lxml/apihelpers.pxi", line 1439, in lxml.etree._utf8 (src/lxml/lxml.etree.c:30219)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
I have modified check_bag to watch for this error and am running it again on all resources to check the extent of the bug.
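For illustration, here is a minimal sketch of how an abstract could be checked or cleaned before it is assigned to an XML element. It assumes the offending characters are those outside the XML 1.0 character range (the traceback does not show which character triggered the error). The helper names `find_invalid_xml_chars` and `strip_invalid_xml_chars` are hypothetical and are not part of HydroShare or lxml.

```python
import re

# Characters not permitted by the XML 1.0 "Char" production: C0 control
# characters other than tab, newline and carriage return, plus lone
# surrogates and U+FFFE / U+FFFF.
_INVALID_XML_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def find_invalid_xml_chars(text):
    """Return (index, repr) pairs for each XML-incompatible character."""
    return [(m.start(), repr(m.group()))
            for m in _INVALID_XML_CHARS.finditer(text)]


def strip_invalid_xml_chars(text):
    """Return text with XML-incompatible characters removed."""
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    # Hypothetical abstract containing a vertical tab and a NULL byte.
    abstract = "Stream\x0bflow data\x00"
    print(find_invalid_xml_chars(abstract))
    print(strip_invalid_xml_chars(abstract))
```

A check like `find_invalid_xml_chars` could report affected resources without altering them, while `strip_invalid_xml_chars` could be applied where the abstract is saved through the UI or REST API. Whether to reject or silently clean such input is a design decision this sketch does not settle.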
About this issue
- State: closed
- Created 6 years ago
- Comments: 20 (20 by maintainers)
@horsburgh @dtarb @aphelionz @martinseul Try http://cuahsi-dev-2.hydroshare.org now.
This was in the spirit of the discussion on the call yesterday.
`\n` or `\n\n` as appropriate. `\n\n` to paragraph break. `\n` to `<br>`.